The decision sciences (AI, machine learning, and other data sciences) are justifiably under scrutiny for bias. The concern gained concrete evidence when a statistical analysis found bias in a healthcare algorithm from Optum, a UnitedHealth Group company: minority patients were placed at an unfair disadvantage in access to medical care. The bias was unintentional; nevertheless, the results were substantiated, peer-reviewed, and published in Science, a long-trusted journal of scientific research. The Optum problem is a cautionary tale that informs the Gaggle ECO™ practices (Evaluate, Constrain, Own) for discovering and mitigating bias in decision processes.
The Optum algorithm was built on the false premise that healthcare costs are a reliable proxy for healthcare needs; that is, the algorithm predicted healthcare costs rather than illness. Minority patients are treated less often than white patients with the same level of illness, largely because of unequal access to care. The algorithm, however, effectively read lower spending as lower medical need, a false presumption. Social inequity was thus mistaken for individual choice, as if some people simply chose not to spend money on their medical issues. The result unintentionally perpetuated the inequity, further restricting minority patients' access to healthcare. UnitedHealth Group (UHG), Optum's parent company, is among the largest publicly traded healthcare companies in the world, so this was a major deal.
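To make the proxy problem concrete, here is a minimal sketch in Python. It is not the Optum algorithm; every number, group label, and function name below is hypothetical and chosen only for illustration. Two groups have identical health needs, but one group's reduced access to care means less money is spent on it, so ranking patients by cost instead of need shifts who is selected for extra care.

```python
from collections import Counter

# Hypothetical patients: (group, need_score, access_factor).
# Both groups have the same needs; group "B" has reduced access to care,
# modeled here (as an assumption) by a smaller access_factor.
patients = [
    ("A", 9, 1.0), ("A", 7, 1.0), ("A", 5, 1.0), ("A", 3, 1.0),
    ("B", 9, 0.6), ("B", 7, 0.6), ("B", 5, 0.6), ("B", 3, 0.6),
]

def observed_cost(need, access):
    """Spending reflects need scaled by access, not need alone."""
    return need * access

def top_k_groups(records, key, k):
    """Rank records by `key` (highest first) and count groups in the top k."""
    ranked = sorted(records, key=key, reverse=True)
    return Counter(group for group, _, _ in ranked[:k])

# Select the 4 patients who would be offered extra care under each ranking.
by_need = top_k_groups(patients, key=lambda p: p[1], k=4)
by_cost = top_k_groups(patients, key=lambda p: observed_cost(p[1], p[2]), k=4)

print("Selected by need:", dict(by_need))
print("Selected by cost:", dict(by_cost))
```

In this toy data, ranking by need selects both groups equally, while ranking by cost selects mostly group A even though the underlying needs are identical.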
At first glance, the algorithm itself is the first place one would look for the cause of bias in a system, and in the Optum case that is where the problem lay. Looking deeper, though, the Optum designers may have, with the best of intentions, overlooked the obvious and easy way to catch this unintended bias: examining the distributions of the input and output data.
The Optum designers were apparently striving for racial fairness by making the algorithm blind to race. Unfortunately, that approach backfired. They may have thought that ignoring racial labels at the input would be the safest way to limit bias at the output, but they apparently never validated that assumption against the output. Had they done so, the bias would likely have been discovered.
With this cautionary tale in mind, how does Gaggle avoid these pitfalls? First, we evaluate the distribution of the data at the input of our decision algorithms to understand any input imbalance (the “evaluate” in our ECO AI Ethics process). We then check whether the output has a similar distribution (ensuring that bias is “constrained” at the output, as stated in our ECO AI Ethics principles). Applying these principles is how we guard against the Optum problem.
Finding racial imbalances at the input doesn’t mean the algorithm will have racial bias. One must understand the distribution at the input and should anticipate the same distribution at the output. If the output distribution differs from it to a statistically significant degree, the algorithm should then be inspected. So, the first step is to understand what is going on at the input.
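One way to sketch this input-versus-output check is a chi-square goodness-of-fit test, shown below in Python. This is an illustrative assumption, not a description of Gaggle's actual tooling: the function names, the sample counts, and the choice of a 0.05 significance level are all hypothetical. The test treats the input group shares as the expected distribution and asks whether the group counts at the output deviate from them by more than chance would explain.

```python
from collections import Counter

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
# Only 1 to 5 degrees of freedom (2 to 6 groups) are covered in this sketch.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def group_shares(labels):
    """Return each group's share of the total, e.g. {"A": 0.6, "B": 0.4}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def chi_square_vs_input(input_labels, output_labels):
    """Chi-square statistic comparing output group counts to input shares.

    Groups that appear only at the output are ignored in this sketch.
    """
    expected_share = group_shares(input_labels)
    observed = Counter(output_labels)
    n_out = len(output_labels)
    stat = 0.0
    for group, share in expected_share.items():
        expected = share * n_out
        stat += (observed.get(group, 0) - expected) ** 2 / expected
    dof = len(expected_share) - 1
    return stat, dof

def output_differs(input_labels, output_labels):
    """True if the output distribution differs significantly at alpha = 0.05."""
    stat, dof = chi_square_vs_input(input_labels, output_labels)
    return stat > CHI2_CRITICAL_05[dof]

# Hypothetical data: the input population is 60% group A and 40% group B.
input_labels = ["A"] * 600 + ["B"] * 400
skewed_output = ["A"] * 75 + ["B"] * 25    # 75% / 25% among those selected
similar_output = ["A"] * 62 + ["B"] * 38   # close to the input shares

print(output_differs(input_labels, skewed_output))
print(output_differs(input_labels, similar_output))
```

A flagged difference does not prove the algorithm is biased; as the text says, it is the signal that the algorithm should then be inspected. Note that the audit needs group labels at the output even if the algorithm itself never sees them.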
Olga Russakovsky, an Assistant Professor of Computer Science at Princeton University, believes in understanding input data imbalances. She says, “There’s no balanced representation of the world and so data will always have a lot of some categories and relatively little of others.” It’s not about turning a blind eye and avoiding identifying these imbalances; it’s about accepting, recognizing, and—most importantly—evaluating and constraining them at the input and output. She has dedicated her efforts to doing just that: She began rebalancing an image training resource, ImageNet, which contains ~14 million images.
“I don’t think it’s possible to have an unbiased human, so I don’t see how we can build an unbiased A.I. system,” said Russakovsky. “But we can certainly do a lot better than we’re doing.” Our ECO AI Ethics principles here at Gaggle, which echo the views of leading AI ethicists, are the best approach to do just that: to do better.