Reasons for Systemic Bias in Machine Learning for Policing and Other Applications
Recently, Microsoft, Amazon and IBM all agreed not to sell their facial recognition technology for law enforcement use over the next year amid findings of racial bias across many such systems. Much press coverage has pointed to the bias in such tools to suggest that they should be banned entirely from police use. Personally, I disagree with this binary approach. The reality is that developing and using this technology requires certain philosophical tradeoffs. As someone who works with machine learning, I’d like to shed some light on how and why this bias is introduced into such algorithms and suggest practical approaches to adopting this technology fairly and justly.
Before discussing the algorithms themselves, I’ll state the philosophical constraints that any machine learning application must satisfy, namely that people of different protected classes should be treated equally and that we should try to maximize benefits for society as a whole. These two constraints are unfortunately at odds with each other in any society where protected classes are not equally represented (as they usually aren’t). Under an informal “no free lunch” principle, a fairer algorithm will tend to be less effective for the larger groups in the population, while an algorithm that is more effective for the larger groups may be less fair to the smaller ones. Such tradeoffs can be easily illustrated by the choices we make during data selection.
Alongside factors like architecture choice and parameter initialization, training data is one of the largest sources of bias. A generative neural net that falsely reconstructs a low-resolution picture of Obama as a white man was most likely trained on a dataset containing more white men. Similarly, Native Americans were found to have the highest false-positive rate in a study on facial recognition systems, which can probably be attributed to a lack of data on Native Americans. Unlike data on white men, data on Native Americans is likely to be scarcer and more costly to collect, as they make up about 2.9% of the US population. For any practical company with a limited budget, this means a tradeoff between a large amount of varied data on white men and a smaller set of data on Native Americans. With a fixed budget, improving performance on Native Americans would hurt performance on white men, who are a larger proportion of the population. Hence, there is “no free lunch”.
From a pure optimization standpoint, the right approach can be anywhere on a spectrum between two extremes: treating all protected classes equally, even if equally poorly, or letting performance reflect the distribution of the population, which disproportionately benefits non-minorities.
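The two ends of this spectrum can be made concrete with a small sketch. It compares population-weighted accuracy (global efficiency) against worst-group accuracy (a simple stand-in for fairness) for two ways of spending a fixed data budget. The 2.9% share comes from the figure above; lumping everyone else into a single "majority" group is a simplification, and the per-group accuracies and option names are invented purely for illustration, not measurements of any real system.

```python
def overall_accuracy(shares, accuracies):
    """Population-weighted accuracy: each group counts in proportion to its share."""
    return sum(shares[group] * accuracies[group] for group in shares)


def worst_group_accuracy(accuracies):
    """Accuracy of the worst-served group, a simple fairness-oriented measure."""
    return min(accuracies.values())


# Population shares: 2.9% is the figure quoted in the text; treating the
# remainder as one "majority" group is a simplifying assumption.
shares = {"majority": 0.971, "minority": 0.029}

# Hypothetical per-group accuracies for two ways of spending the same budget.
options = {
    "data mirrors population": {"majority": 0.99, "minority": 0.90},
    "budget shifted to minority data": {"majority": 0.97, "minority": 0.97},
}

for name, accuracies in options.items():
    print(
        f"{name}: overall={overall_accuracy(shares, accuracies):.4f}, "
        f"worst group={worst_group_accuracy(accuracies):.2f}"
    )
```

Under these made-up numbers, the first option scores higher overall but serves the smaller group noticeably worse, while the second equalizes the groups at a lower overall score. Which of the two measures should dominate is exactly the choice described here.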
Of course, the right location on this spectrum may differ depending on the specific application. Bias in high-risk environments such as policing or cancer screening may have disastrous consequences, which may lead us to shift the requirements closer to fairness than to global efficiency.
Establishing the right tradeoff can be difficult, but I believe it is possible to do so with a careful approach, even in those high-risk environments. For example, in the policing case, we could establish a standard of service under which people of all protected classes must be recognized with at least a certain degree of confidence, while allowing the system to report higher confidence when possible. If we arbitrarily say that the standard is 99%, this means that up to 1% of the local population could be confused with you. However, 1% of the population of a large city like New York is very different from 1% of the population of a small town in Kansas. If the algorithm reports a higher confidence, then that 1% may become 0.1%, and similar calculations can be made for how many similar-looking people there would be in a local population. It is then up to the local law enforcement agencies to understand and use such insights.
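The arithmetic here can be sketched in a few lines. This follows the simplified reading used above, in which a confidence of 99% leaves 1% of the local population as possible matches. The function name and the population sizes are assumptions chosen for illustration (round numbers, not census figures).

```python
def expected_possible_matches(population: int, confidence: float) -> float:
    """People in a local population who could be confused with a given person,
    under the simplified model that (1 - confidence) of residents may match."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return population * (1.0 - confidence)


# Illustrative round numbers only, not census figures.
populations = {"large city": 8_000_000, "small town": 5_000}

for place, size in populations.items():
    for confidence in (0.99, 0.999):
        matches = expected_possible_matches(size, confidence)
        print(f"{place} at {confidence:.1%} confidence: about {matches:,.0f} possible matches")
```

With these assumed sizes, the same 99% standard leaves tens of thousands of possible matches in the large city but only a few dozen in the small town, which is why the same confidence level may call for different handling by different agencies.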
At the end of the day, all technologies, including ones involving machine learning, are only intended as tools. Human beings will be the ones who hold the responsibility for the final decision. As such, I believe it’s our responsibility to set where the bar should be for fairness and bias, and our responsibility to ensure that these standards are applied correctly in local law enforcement agencies and elsewhere.