Evan Wagner
You are a prisoner being considered for parole. After your past is scrutinized in great detail across two hearings, the board informs you that you’ve been denied. Why, you ask? The board tells you that they
ran your information through an algorithm called Correctional Offender Management Profiling for Alternative Sanctions (COMPAS). It gave you a recidivism risk score of 7, indicating that you would be highly likely to commit another crime if you were released.
Computers trying to stop crime before it happens may sound like a science fiction premise, but COMPAS and similar algorithmic tools have been used by prison systems across America for about two decades. They arrived at a time when the ability of judges to overcome their personal biases was under scrutiny. Studies showed wide variance in the severity of judicial sentences, driven in large part by factors unrelated to the cases at hand, right down to how recently the judge had eaten lunch.
Aghast at such arbitrary injustice, the legal community embraced rigid sentencing guidelines and new technologies that promised to keep judges in check. If bias is an individual ailment, they reasoned, its cure is to curtail discretion and ensure rules apply to everyone equally. The rhetoric of “data-driven practices” filled the discourse.
Twenty years later, we seem to have realized that bias can travel far beyond isolated brains and express itself strongly in man-made systems. A broad set of journalists and academics have turned the interrogation lamp toward data analysis in criminal justice, uncovering some unsavory insights. An infamous ProPublica investigation found that, when assessing Black defendants, COMPAS produced twice as many false positives as false negatives, meaning it was twice as likely to overestimate the chance that they would reoffend if released as to underestimate it. For white defendants, the reverse was true: COMPAS was twice as likely to lowball their risk scores as to overshoot them.
We tend to see the act of data analysis as an “objective” exercise, a measurement of reality unrelated to the analyst’s desires and emotions. Collecting data and processing it with mechanical brains will, we think, free us from the “mistakes” baked into our own brains. This assumption is an even bigger mistake than our mental shortcomings.
For one thing, since humans still guide how machines learn, we cannot help but pass our biases on to them. Examples abound, including Google Translate gendering verbs based on stereotypes, an Amazon hiring algorithm penalizing women for their gender, and, of course, COMPAS’s higher false-positive rate for Black defendants.
For insight, I spoke with Julia Dressel, a software engineer at Recidiviz. “A core assumption of any machine learning model is that the future is going to look like the past,” explains Dressel. “When you have a tool that is built on historical data, you are inevitably going to reproduce historical discrepancies and inequalities.” Demonstrating the concept, the Los Angeles Police Department (LAPD) used models trained on historical crime data to map out their patrols. As a result, they overpoliced the same minority neighborhoods that had been consciously overpoliced for decades.
As the LAPD story shows, the notion of putting data in the “driver’s seat” is absurd. Data is more like a GPS: the human driver decides where to go, and data tells us how to get there. While “objectivity” is no doubt desirable for the data’s task, it is a purely subjective exercise to figure out whether the outcomes we seek are truly good. Do we simply want the police to make the most arrests possible, or is their mandate a much more general goal—to maintain safety and order—that cannot be quantified with one or two neat variables?
It will always be our job to drive and data’s job to navigate. Whenever we fool ourselves into thinking we have given up the wheel, our unexamined values take it. The American disdain for criminals, long ago observed by Alexis de Tocqueville, turned COMPAS into a vehicle for mass incarceration. Societal norms shape the implementation of new technology, as evidenced by the impact of local political culture on the adoption of neural networks worldwide. In the U.S., a handful of corporations have gathered control of the most powerful neural networks and use them to predict consumer behavior. Meanwhile, in China, the most advanced technology is incorporated into a growing system of centralized social control.
It is incumbent on us to develop the tools of data analysis with conscious regard for our intentions in using them. As things stand, very little of our analytical capacity is devoted to social causes, but a handful of non-profits and think tanks are emerging to take on the task, including Recidiviz. Founded in 2017, the organization partners with state governments to help them reduce their prison populations, improve their operations, and turn their criminal justice goals into reality.
Whereas COMPAS is wired to optimize individual punishment, narrow-minded and liable to overincarcerate, the descriptive statistics provided by Recidiviz give state officials a broader perspective on the institutions they oversee. “It sounds really simple,” says Dressel, “but governments are usually working with very minimal data analytics support. So being able to measure and count the things that have happened and are happening in a state’s criminal justice system is super powerful.” Data measures a tiny fraction of our world, and what we choose to measure reflects what we deem important.
The desire to use data fairly is a good start, but as the COMPAS debate illustrates, deciding what fairness looks like is another, more difficult matter. Its creators defend COMPAS on the grounds that its error rate is the same across all racial categories. This is true, but there are two types of error it can make: it can classify someone as high risk who does not go on to reoffend (a false positive), or it can classify someone as low risk who later reoffends (a false negative). ProPublica found racial bias not in COMPAS’s misclassification rate, but in the ratio of its false positive rate to its false negative rate.
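To make the two kinds of error concrete, here is a minimal sketch with invented confusion-matrix counts, not the actual COMPAS or ProPublica figures. Both hypothetical groups are misclassified at the same overall rate, yet the mix of errors is reversed: one group absorbs mostly false positives, the other mostly false negatives.

```python
# Hypothetical confusion-matrix counts for two groups of 100 defendants each.
# The numbers are illustrative only, not real COMPAS or ProPublica data.
groups = {
    "group_a": {"tp": 40, "fp": 20, "tn": 30, "fn": 10},
    "group_b": {"tp": 30, "fp": 10, "tn": 40, "fn": 20},
}

for name, c in groups.items():
    total = sum(c.values())
    error_rate = (c["fp"] + c["fn"]) / total    # overall misclassification rate
    fpr = c["fp"] / (c["fp"] + c["tn"])         # non-reoffenders wrongly rated high risk
    fnr = c["fn"] / (c["fn"] + c["tp"])         # reoffenders wrongly rated low risk
    print(f"{name}: error rate {error_rate:.0%}, FPR {fpr:.0%}, FNR {fnr:.0%}")
```

Both groups come out with the same 30 percent error rate, which is the kind of statistic COMPAS’s creators point to; the reversed split between false positive and false negative rates is the kind of disparity ProPublica flagged.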
Would balancing multiple definitions of fairness be better? Unfortunately, it is a mathematical truth that a model cannot satisfy both standards at once when the groups it judges have different underlying rates: COMPAS cannot be optimized for both overall accuracy and a balanced distribution of errors. The model could certainly be reworked to dispense an equal number of false positives and false negatives for each racial group, but overall accuracy would necessarily be degraded in the process. Since the population of Black Americans has a higher measured recidivism rate than the population of white Americans, any COMPAS-like model that minimizes its error rate is bound to overestimate risk for Black defendants and underestimate risk for white defendants.
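The trade-off can be seen in a rough simulation, using invented score distributions rather than COMPAS data. Suppose the model’s risk scores are perfectly calibrated and identical in quality for both groups, and it flags anyone whose score crosses 0.5, the error-minimizing rule for a calibrated score. If one group’s base rate of reoffense is higher, that group automatically ends up with more false positives and fewer false negatives, with no difference in how the model treats any individual.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(alpha, beta, n=100_000, threshold=0.5):
    """Simulate one group under a perfectly calibrated risk score.

    Each person has a true reoffense probability p drawn from Beta(alpha, beta);
    the model sees p exactly and flags "high risk" when p >= threshold.
    Returns the group's false positive and false negative rates.
    """
    p = rng.beta(alpha, beta, n)       # true reoffense probabilities
    reoffends = rng.random(n) < p      # actual outcomes
    flagged = p >= threshold           # model's high-risk classification
    fpr = np.mean(flagged & ~reoffends) / np.mean(~reoffends)
    fnr = np.mean(~flagged & reoffends) / np.mean(reoffends)
    return fpr, fnr

# Hypothetical base rates: roughly 50% for one group, 30% for the other.
for name, (a, b) in {"higher-base-rate group": (5, 5),
                     "lower-base-rate group": (3, 7)}.items():
    fpr, fnr = simulate(a, b)
    print(f"{name}: FPR {fpr:.0%}, FNR {fnr:.0%}")
```

The higher-base-rate group shows a markedly higher false positive rate and lower false negative rate even though the scoring rule is the same for everyone; equalizing the error mix would mean abandoning the single error-minimizing threshold.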
If justice means making the fewest errors, then COMPAS was developed to be as just as possible. But there is a strong argument that, since our justice system guarantees equal treatment for individuals, any one person facing judgment should have an equal chance of misclassification in either direction. Which fairness metric is appropriate for the situation?
The higher recidivism rate observed among Black inmates does not arise from inherent racial qualities, but from centuries of discrimination and disenfranchisement from the civic sphere. That said, if the model were changed to optimize for equal individual fairness across racial groups, thousands of defendants per year would be misclassified in cases where the original COMPAS would have made the right call.
Perhaps using this sort of prediction in such a way is a problem in itself. To estimate a person’s recidivism risk is basically to stereotype: to categorize people and assume individual behavior will match the broader group. We are obviously and rightfully uncomfortable with assuming a person’s behavior or intentions from their race. Yet racial disparities create correlations between race and many aspects of life: wealth, income, education, and the likelihood of prior contact with the justice system, to name a few. COMPAS input data does not have a column labeled “race”, but many of its variables are significantly correlated with race. Should we really call that “colorblind”?
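A toy sketch of a proxy variable may help show why the absence of a “race” column means little. The feature name and every number here are hypothetical, not COMPAS inputs; a single race-correlated feature is enough to make a “colorblind” score differ systematically by group.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Synthetic population: a binary group label and one feature ("prior contacts
# with the justice system") whose distribution differs by group because of
# historical over-policing. All numbers are made up for illustration.
group = rng.random(n) < 0.3
prior_contacts = rng.poisson(lam=np.where(group, 3.0, 1.5))

# A "colorblind" risk score that never sees the group label, only the feature.
risk_score = prior_contacts / prior_contacts.max()

print(f"mean score, over-policed group: {risk_score[group].mean():.3f}")
print(f"mean score, other group:        {risk_score[~group].mean():.3f}")
```

The score never touches the group label, yet it assigns the over-policed group roughly twice the average risk, because the feature it relies on already encodes that history.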
Moreover, does the state even have a right to restrain citizens based not on their own actions or words, but merely on the actions of similar people before them? The foundational theorists of liberalism agree that a democratically legitimate government has the prerogative to punish those who have already broken the law. However, discussion of what might be called preemptive prerogative is hard to come by. Perhaps the government can act legitimately on knowledge of a premeditated plan to break the law, but to predict a person’s general future behavior is a different thing. Defendants are presumably not strategizing about their next offense when COMPAS processes their personal information and makes a prediction. Such contradictions in the American legal system leave it unprepared for a world of increasingly accurate statistical models.
With few apparent philosophical qualms, governments across the world are plowing ahead with the deployment of decision-making machines at massive scale. Despite public opinion’s turn against predictive algorithms, the LAPD does not appear to have adjusted its approach. Even more concerning, China uses artificially intelligent judges to hear arguments and pass verdicts in civil cases. Estonia, a liberal democracy, does much the same thing in the name of “smarter government.” So far, the purview of such machines seems to be restricted to more routine, low-stakes cases, like product liability and financial disputes. But even this limited use is troubling, and more troubling still is the possibility that it will expand to more consequential areas of law.
When developing and implementing statistical tools, it is important for government and its partners to recognize the role of their own value judgments rather than pretend they don’t exist. Demonstrating this principle, Dressel declares that Recidiviz “is an explicitly decarceral organization. Our mission is to help end mass incarceration.” Hopefully, more and more political institutions will take a similar approach as our data-driven world continues to take shape. But perhaps the more important goal is to pass legislation that clarifies the acceptable role of data-based prediction in criminal justice and prevents tools like COMPAS from being used in the first place.⬩