
The Idaho State Legislature is taking up the issue of fairness in algorithmic support for pretrial risk assessment. H.B. 118 prohibits the use of pretrial risk assessment algorithms by the state unless they are (1) shown to be free of bias and (2) transparent and subject to public audit.
Transparency
Let’s start with the good. First, I am very glad that state lawmakers are taking an interest in technological “enhancements” to our justice system and how they may introduce or perpetuate bias and discrimination. The bill’s sponsor, Rep. Greg Chaney (R-Caldwell), wrote an op-ed articulating his desire to see Idaho at the forefront of ensuring these systems, if they are used, are not discriminatory against members of protected classes.
Second, I enthusiastically support the bill’s requirement that the design, data, and validation protocols for these systems be made available for public scrutiny and audit. Both defendants and interested members of the public must, in my opinion, be able to investigate and challenge algorithmic tools that are deployed as a part of legal proceedings. When I teach Introduction to Data Science, one of the assignments I give my students is to replicate the ProPublica COMPAS analysis cited by Rep. Chaney and to engage with both ProPublica’s coverage and the vendor’s response. I would love to have my students apply the principles from class to our local context to verify (or refute) the fairness validations the bill requires for tools used by the state, and this transparency requirement looks like it would give us the data to do so.
I have some questions about precisely what must be made available, and there may be some privacy challenges if models are trained on non-public data, but the idea is sound. I also don’t know enough about how this is going to play out politically to predict whether the likely outcome of these challenges is a dead-ended bill, a weakened bill, or an outright ban on the technology.
The bill also specifically bans vendors from claiming trade secret protections to avoid discovery when their technology is used in criminal proceedings, a prohibition I think is fantastic and necessary for a far wider range of technologies.
Defining Fairness
I have concerns, however, about the fairness requirements of the bill. To start with, the definition of “free of bias” is likely to run into problems. The bill adds Section 19-1910(a) to the Idaho Code defining “free of bias”:
“Free of bias” means that an algorithm has been formally tested and shown to predict successfully at the same rate for those in protected classes as those not in protected classes, and the rate of error is balanced as between protected classes and those not in protected classes.
The proposed definition requires two things to be fairly distributed: predictive accuracy and error rates. This is, on its face, an eminently reasonable demand: we don’t want the tool to work better for one group than for others, and we don’t want errors disproportionately distributed.
Unfortunately, it is also impossible to satisfy. One of the fundamental theoretical findings in algorithmic fairness, due to Chouldechova [2016] and Kleinberg, Mullainathan, and Raghavan [2016], is that, outside of special circumstances, you cannot simultaneously balance predictive accuracy and error rates.
In more formal statistical terminology, this bill requires three things to be balanced between groups:
- The positive predictive value, which looks like the best translation of “predict successfully at the same rate”; this is the fraction of people the system judges to be high-risk who would, in fact, recidivate.
- The false positive rate, one half of the rate of error; this is the fraction of people who would comply with their pretrial release conditions whom the program classifies as high-risk anyway. This is probably easier to understand in a setting where the algorithm is being used to make accusations: it is the probability of being falsely accused, given that you are innocent.
- The false negative rate, the other half of the rate of error; this is the fraction of defendants who would violate pretrial release whom the program classifies as low-risk. In the accusation setting, it is the probability of getting off the hook, given that you are guilty.
These values come from the confusion matrix, a standard tool for understanding the successes and failures of decision-making processes. Both halves of the error rate are important: it’s bad if you are more likely to be falsely accused of a crime because of your skin color, and it is also bad to be less likely to get away with a crime because of your skin color.
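To make these definitions concrete, here is a minimal sketch in Python that computes all three quantities from the four cells of a confusion matrix. The function and the counts are entirely hypothetical illustrations of mine, not values from COMPAS or from any Idaho data.

```python
def confusion_rates(tp, fp, fn, tn):
    """Compute the three metrics the bill implicitly asks to balance,
    from the four cells of a confusion matrix."""
    ppv = tp / (tp + fp)  # positive predictive value: flagged high-risk and correct
    fpr = fp / (fp + tn)  # false positive rate: would comply, but flagged high-risk
    fnr = fn / (tp + fn)  # false negative rate: violates, but flagged low-risk
    return ppv, fpr, fnr

# Hypothetical counts for one group of 1,000 defendants:
#   300 flagged high-risk who do violate (tp), 150 flagged high-risk who
#   would have complied (fp), 100 flagged low-risk who violate (fn),
#   and 450 flagged low-risk who comply (tn).
ppv, fpr, fnr = confusion_rates(tp=300, fp=150, fn=100, tn=450)
print(f"PPV={ppv:.2f}, FPR={fpr:.2f}, FNR={fnr:.2f}")  # PPV=0.67, FPR=0.25, FNR=0.25
```

Under the bill’s definition, a tool is “free of bias” only if all three of these numbers match across protected and unprotected groups.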
Both sets of researchers showed that a risk assessment tool can be fair on all three metrics simultaneously only if the base rates of true risk are equal between the groups (or the tool makes no errors at all). In the data ProPublica analyzed, the base rates differ: the measured recidivism rate for black defendants was 51%, compared to 39% for white defendants (although the fairness of the data sample itself should certainly be scrutinized carefully).
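To see why unequal base rates force the conflict, note that the definitions above pin the false positive rate down algebraically: FPR = (p / (1 - p)) * (1 - FNR) * (1 - PPV) / PPV, where p is the group’s base rate. The sketch below (with made-up PPV and FNR values, not COMPAS’s actual performance) holds PPV and FNR equal across groups and plugs in the 51% and 39% base rates from the ProPublica data; the false positive rates are forced apart.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate forced by the base rate once PPV and FNR are fixed:
    FPR = (p / (1 - p)) * (1 - FNR) * (1 - PPV) / PPV."""
    p = base_rate
    return (p / (1 - p)) * (1 - fnr) * (1 - ppv) / ppv

# Made-up performance numbers: suppose the tool achieves the same PPV and FNR
# for both groups, as the bill demands.
ppv, fnr = 0.6, 0.35
for group, base_rate in [("black defendants", 0.51), ("white defendants", 0.39)]:
    print(group, round(implied_fpr(base_rate, ppv, fnr), 3))
# Prints roughly 0.451 and 0.277: with different base rates, equalizing PPV and
# FNR forces the false positive rates apart, so all three cannot be balanced.
```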
We therefore need to make a value-driven tradeoff: decide how to weigh the competing fairness criteria when adjusting, evaluating, and possibly rejecting the model. This tradeoff is the heart of the dispute between ProPublica and Northpointe over the validity of ProPublica’s finding of racial bias in COMPAS: ProPublica found discrepancies in false positive and false negative rates, while Northpointe argued that equal positive predictive value meant its system was fair.
Next Steps
As written, it appears impossible for any tool to satisfy H.B. 118’s definition of “free of bias”. It’s also unclear to me, as a statistically-oriented computer scientist, what it means to have “formally validated such assessment as having been free of bias”, as the law requires for state deployment of pretrial risk assessment algorithms. As the bill and the science currently stand, it would effectively prohibit the use of such algorithms; that isn’t necessarily a bad outcome, but I’d like to see more clarity about what, exactly, the state wants to require for validation, and about how we want to navigate the inherent tradeoffs in calibrating these systems if they are not to be banned entirely.
I am very excited by the transparency and data access requirements of this bill. I would like to see similar language enacted to cover a much broader range of systems — when we deploy new technologies, particularly ones that can be as inscrutable as machine learning, we need the means and data to keep them accountable to the people.
I’m going to be watching this bill very closely. There is a great deal to like in it, and I’d like to reiterate how glad I am that state lawmakers are taking this issue seriously.