Unbiased ≠ Fair: For Data Science, It Can’t Be Just About Math
As I have been thinking through the ethical implications of data science, one thing has become glaringly obvious to me. Data scientists like math! Nothing very surprising there. But as we go about our work building models and making great predictions, we tend to reduce the conversation about ethics to mathematical terms. Is my prediction for Caucasian Americans the same as for African Americans? Are predictions for women equivalent to those for men? We develop confusion matrices and measure the accuracy of our predictions. Or maybe sensitivity or specificity is what matters, so we balance those across various subgroups. Unfortunately, mathematicians have shown that while we may be able to balance accuracy, specificity, or some other single measure of bias, for real data sets in which groups have different base rates we cannot balance them all at once and build a perfectly unbiased model. So we do the best we can within the framework we are given and declare that the model is fair.
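To make the measurement side concrete, here is a minimal sketch in Python (the data is synthetic and purely illustrative; the metric definitions are just the standard confusion-matrix ones) of computing accuracy, false positive rate, and false negative rate for each subgroup. With different base rates across the groups, tuning a model to equalize one of these numbers will generally leave the others unequal.

```python
import numpy as np

def group_metrics(y_true, y_pred, group):
    """Confusion-matrix metrics (accuracy, FPR, FNR) for each subgroup."""
    results = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        tp = np.sum((yt == 1) & (yp == 1))
        tn = np.sum((yt == 0) & (yp == 0))
        fp = np.sum((yt == 0) & (yp == 1))
        fn = np.sum((yt == 1) & (yp == 0))
        results[g] = {
            "accuracy": (tp + tn) / len(yt),
            "fpr": fp / (fp + tn),  # false positive rate
            "fnr": fn / (fn + tp),  # false negative rate
        }
    return results

# Hypothetical data: two groups with different base rates of the positive outcome.
rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=2000)
base_rate = np.where(group == "A", 0.3, 0.5)           # prevalence differs by group
y_true = (rng.random(2000) < base_rate).astype(int)
y_pred = (rng.random(2000) < 0.4).astype(int)          # a deliberately crude predictor

for g, metrics in group_metrics(y_true, y_pred, group).items():
    print(g, {k: round(float(v), 3) for k, v in metrics.items()})
```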
After studying the issues and applications, I assert that models that balance bias are not, by that alone, fair. Fairness does not answer to mathematics; it answers to individual viewpoints, societal and cultural norms, and morals. In other words, fairness is defined by social systems and philosophy.
For example, in criminal justice, recidivism models predict whether a person who has been arrested will commit another crime if released on bond. As the indicted individual, you want the false positive rate to be as low as possible so that you are not kept in jail when you should not be. As the average citizen, however, you want the false negative rate to be as low as possible, to minimize the number of people who are released and go on to commit a new crime. Balancing these two is a tradeoff that both sides will say is not fair. AND this does not even begin to address the bias in the data and in the system that has resulted in disproportionately higher numbers of African Americans being incarcerated.
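The tradeoff is easy to see in a toy sketch (again with made-up scores and outcomes, not data from any real risk tool): as the detention threshold rises, the false positive rate that worries the defendant falls while the false negative rate that worries the public rises, and vice versa.

```python
import numpy as np

# Hypothetical risk scores and outcomes, purely for illustration.
rng = np.random.default_rng(1)
n = 2000
reoffended = rng.random(n) < 0.35                                # true outcome
score = np.clip(rng.normal(0.35 + 0.3 * reoffended, 0.2), 0, 1)  # model's risk score

for threshold in (0.3, 0.5, 0.7):
    detained = score >= threshold
    fp = np.sum(detained & ~reoffended)   # held, but would not have reoffended
    fn = np.sum(~detained & reoffended)   # released, then reoffended
    fpr = fp / np.sum(~reoffended)
    fnr = fn / np.sum(reoffended)
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```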
As one considers the ethical implications of data science, one quickly gets into debating the cultural and moral norms of the society into which the model is being deployed. When a data science team deploys a model, those cultural norms must be considered. Utilitarianism and its derivatives are prevalent in Western society; there, the debate centers on overall good and the balance between individual good and the common good. Other philosophical traditions are favored in other cultures and geographies. Understanding how, where, and which cultures a model will touch is important to reaching for fairness in the deployed model.
Understanding the system within which a model is deployed is just as important. When a model is deployed, it enters an operational system, and depending upon the specifics of the situation, there are often decisions made after the model prediction. Data scientists typically develop and measure accuracy based on the model's predictions alone. However, measuring the entire system, including the decisions that occur AFTER the model prediction, is just as important. Additionally, human-in-the-loop models are often held up as even more accurate; but are they also less biased and fairer? With a human in the loop, bias may creep back into decisions. And if there is not a single decision maker, different people bring different levels of information as well as cultural differences. Each of these differences can easily result in system-level bias and fairness issues even if the model was tuned and prepared to be as fair as possible. Framing the operations and measuring performance should occur for both the model outcome and the system outcome. I believe many lawsuits over fairness and discrimination occur because the two sides frame the situation differently. Each side is “right” within its own framing, but which frame will a jury conclude is fair?
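As a hypothetical illustration of why the system outcome needs its own measurement, the sketch below (synthetic data, simulated reviewer) compares the error rate of the raw model prediction with the error rate of the final decision after a human reviewer overrides a share of cases. The two numbers can diverge, and only the second one describes what the people affected actually experience.

```python
import numpy as np

# Hypothetical comparison of the model's error rate with the full system's
# error rate after a human reviewer overrides some of the predictions.
rng = np.random.default_rng(2)
n = 1000
y_true = (rng.random(n) < 0.4).astype(int)
model_pred = np.where(rng.random(n) < 0.8, y_true, 1 - y_true)  # model agrees with truth ~80% of the time

# The reviewer overrides 15% of cases, and not always for the better.
override = rng.random(n) < 0.15
final_decision = np.where(override, 1 - model_pred, model_pred)

print("model error rate: ", round(float(np.mean(model_pred != y_true)), 3))
print("system error rate:", round(float(np.mean(final_decision != y_true)), 3))
```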
As responsible data scientists, we should expand our ethical considerations beyond the mathematical bias of our models to include cultural and societal norms, and our model deployments should frame and measure the outcomes of the whole system, not just the model predictions.