There has been much discussion in recent years about bias in artificial intelligence. This is also familiar to us as an old problem with human intelligence. What is our best hope?
First, let’s be clear about how bad the problem is.
There have been many academic articles, podcasts, and news reports recently about bias in artificial intelligence. Knowledge of the problem has gone mainstream. The main cause discussed is a statistical bias in training data:
- If most training data is for white men, then the algorithm that results will do a much better job on white men.
- Seeing how good a job the algorithm does for white men leads people to trust it and overlook the problems it causes for women and racial minorities.
Once the algorithm is biased, the artificial intelligence can amplify its own bias, as Zhao et al. explain in their 2017 paper Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints:
Models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time.
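The measurement behind that quote can be sketched in a few lines. The counts below are illustrative numbers chosen to match the quoted disparities (a 33% gap in training, 68% in model output), not the paper’s actual data:

```python
# Sketch of the bias-amplification measurement described by Zhao et al.,
# using made-up counts that mirror the quoted cooking example.

def female_share(counts, activity):
    """Fraction of an activity's examples labeled female."""
    female, male = counts[activity]
    return female / (female + male)

# Hypothetical label counts as (female, male) pairs.
training_counts = {"cooking": (665, 335)}   # 66.5% - 33.5% = ~33% disparity
predicted_counts = {"cooking": (840, 160)}  # 84% - 16% = 68% disparity

train_bias = female_share(training_counts, "cooking")
test_bias = female_share(predicted_counts, "cooking")

# Amplification: how much further the model skews beyond its data.
amplification = test_bias - train_bias
print(f"training bias {train_bias:.2f}, predicted bias {test_bias:.2f}, "
      f"amplification {amplification:+.2f}")
```

A positive amplification means the model is not merely reflecting the skew in its training set, it is exaggerating it.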
With artificial intelligence amplifying prior social bias, we risk amplifying the underlying human bias itself. Parikh et al. discuss this in the context of medicine in their December 2019 article Addressing Bias in Artificial Intelligence in Health Care:
Perhaps of more concern, clinicians may be more likely to believe AI that reinforces current practice, thus perpetuating implicit social biases.
It’s a real, big problem.
Unsurprisingly, bias didn’t start in the twenty-first century. Humans are very good at pattern matching too, and we tend to spend most of our time around people like us (for some definition of “like us”), which gives us a heavily biased statistical set of anecdotal evidence. Furthermore, when we gather data for studies we are often biased in what we collect, so while analyzing large volumes of data yields more accurate conclusions, we are prone to underestimate the effect of bias in our data collection. Finally, when faced with the few outliers in our data from the group we’ve undersampled, we can have a social bias that disregards them as “not normal”, i.e. not normal given our natural statistical bias.
Once some of this bias is built into the system and we “all know it”, and we have studies that “show it”, these notions become very hard to dispel.
While it’s true that good scientists must be skeptical about “proven” assumptions, it’s also true that if we constantly reevaluated all of them, we would never advance our knowledge. Thus it’s crucial to science both to accept “proven” facts and build on them, and to periodically revisit them.
Again, this is not new. Every major scientific discovery of the last thousand years is, in some form, someone reconsidering a prior assumption. Meanwhile, the rest of us have reconsidered far more assumptions without receiving Nobel prizes. So we know that reconsidering our “confirmed” assumptions is usually useless and occasionally critical.
Let’s consider just a single example of bias in human intelligence from recent years that was critical to rethink: how to diagnose people who show up at the emergency room and might be suffering from a heart attack. Here’s a quote from Marek Glezerman, from the March 2011 article Sex and Sickness:
“The classic picture of a heart attack,” he continues, “is a man clutching his left side and doubling over in intense pain that radiates to the shoulder and arm. But for one out of five women, the symptoms of a heart attack are totally different: the attack develops very gradually and not all at once. The woman complains of shortness of breath, the pain can radiate to the back of the neck, to the back or the jaw. And by the time she gets to the emergency room, the risk that she will be sent home with a diagnosis of hysteria is four times greater than for a man.”
Since 80% of women experienced symptoms similar to men’s, it was easy enough to ignore the “outlier data”, even though that “outlier” group is huge. How huge?
A recent statistic says: “Every year, 805,000 Americans have a heart attack.” While women tend to seek medical attention earlier than men and thus end up in the emergency room less frequently, it seems reasonable to assume an order of magnitude of 100,000 people in the USA alone, per year, who risked being misdiagnosed in emergency rooms for just this one reason. I’m far outside my area of expertise when it comes to medicine, so instead assume it’s 10,000 people at risk of misdiagnosis in the US each year. Whatever the number is, it’s bad.
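The back-of-envelope estimate above can be made explicit. The inputs are the rough figures from the text (the 805,000 statistic, the “one out of five women” symptom rate, and an assumed even gender split), not medical data:

```python
# Back-of-envelope check of the order-of-magnitude estimate in the text.
heart_attacks_per_year = 805_000   # US total, from the quoted statistic
share_women = 0.5                  # rough assumption, not a measured figure
share_atypical_symptoms = 0.2      # "one out of five women"

at_risk = heart_attacks_per_year * share_women * share_atypical_symptoms
print(f"~{at_risk:,.0f} women per year with atypical symptoms")
```

That lands around 80,000, consistent with an order of magnitude of 100,000; even the deliberately conservative 10,000 figure is a lot of people.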
That’s merely one example of the horrible repercussions of bias in human intelligence, which has been with us for millennia.
Hopes of combating bias
So far the story sounds very depressing. Both “amazing new” artificial intelligence as well as “good old” human intelligence seem prone to bias, to amplifying their bias, to their studies then reinforcing prior social bias, and to causing horrific outcomes for millions of people.
How can we improve?
Consider the example of the 20% of women with different symptoms for their heart attacks. As a species we’ve identified this particular bias, and I assume that most places in the world are now accounting for it, since it was publicized about a decade ago. What if before it had been identified, artificial intelligence had been trained on large volumes of emergency room data including the eventual outcomes and post-facto diagnoses? Is it plausible that AI could have clued us in to a non-trivial percentage of the overall population getting false negative diagnoses?
My intuition is that a world with AI might have noticed this particular human bias much more quickly than our world of human intelligence alone did. We can easily imagine not only women’s-health advocates training AIs, but also alliances of profit-motivated insurance companies, hospitals, and government organizations. Parikh et al., in the same December 2019 article quoted above, wrote:
Although much of the discussion about AI and bias has focused on its potential for harm, strategies exist to mitigate such bias. When applied correctly, AI may be an effective tool to help counteract bias, an intractable problem in medicine.
One key phrase here is “when applied correctly”. That is, artificial intelligence can correct bias in human intelligence only when we properly train and configure the AI. That’s where the “correct” human intelligence fixes the bias in the AI.
In May 2018, Susan Leavy wrote in Gender Bias in Artificial Intelligence: The Need for Diversity and Gender Theory in Machine Learning:
Advancing women’s careers in the area of Artificial Intelligence is not only a right in itself; it is essential to prevent advances in gender equality supported by decades of feminist thought being undone.
This article points out biases inherent in the large data sets available today, rooted in centuries of unrelated bias, and in its concluding sentence notes that humans can use human intelligence to notice that bias and correct for it.
There’s also some early research into training AI to help humans notice AI bias, so that humans can then correct the AI’s training data or algorithms to fix that bias. In a July 2018 Nature article, AI can be sexist and racist — it’s time to make it fair, Zou and Schiebinger write:
A complementary approach is to use machine learning itself to identify and quantify bias in algorithms and data. We call this conducting an AI audit, in which the auditor is an algorithm that systematically probes the original machine-learning model to identify biases in both the model and the training data.
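The audit idea can be sketched concretely: a second algorithm probes a trained model and reports group-level disparities. Everything below is a toy stand-in (a fake model, synthetic records) meant only to show the shape of such an auditor, not a real clinical system:

```python
# Minimal sketch of an "AI audit": an auditor algorithm probes a trained
# model for group-level disparities. Model and data are toy stand-ins.

def biased_model(record):
    """Toy model that (undesirably) keys on symptom presentation."""
    # Pretend "typical_symptoms" tracks the classic male presentation.
    return 1 if record["typical_symptoms"] else 0  # 1 = flag heart attack

def audit(model, records):
    """Auditor: compare the model's positive-flag rate across groups."""
    rates = {}
    for group in {r["gender"] for r in records}:
        members = [r for r in records if r["gender"] == group]
        flagged = sum(model(r) for r in members)
        rates[group] = flagged / len(members)
    return rates

# Synthetic probe set reflecting the one-in-five atypical presentation.
probe = (
    [{"gender": "male", "typical_symptoms": True}] * 10
    + [{"gender": "female", "typical_symptoms": True}] * 8
    + [{"gender": "female", "typical_symptoms": False}] * 2
)

rates = audit(biased_model, probe)
print(rates)  # the gap between the groups' rates is the audit's red flag
```

A real audit, as the quoted article describes, would probe both the model and its training data systematically; the point here is only that the auditor itself is an algorithm, so the check can run at scale.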
I’ve gone from the “doom and gloom” of artificial and human bias wreaking havoc to the “rose-colored glasses” view that human intelligence can fix bias in artificial intelligence, which in turn can enable artificial intelligence to fix bias in human intelligence at a rate we would not have dreamed of in the absence of AI.
So which is correct: pessimism or optimism?
Kate Crawford wrote in a June 2016 New York Times article entitled Artificial Intelligence’s White Guy Problem:
Like all technologies before it, artificial intelligence will reflect the values of its creators. … we risk constructing machine intelligence that mirrors a narrow and privileged vision of society, with its old, familiar biases and stereotypes.
I want to focus on two phrases from this quote: “like all technologies before it” and “we risk”. I think that these phrases allude to two key points:
- AI is a technology, and can be used to benefit or harm.
- AI brings with it both a risk of making bias worse and a promise of counteracting our pre-existing bias.
If true, then it would seem that AI leaves our human moral equation intact: we have to be vigilant to use our tools wisely and for good.