Last fall, University of Virginia computer-science professor Vicente Ordóñez noticed a pattern in some of the guesses made by image-recognition software he was building. “It would see a picture of a kitchen and more often than not associate it with women, not men,” he says.
That got Ordóñez wondering whether he and other researchers were unconsciously injecting biases into their software. So he teamed up with colleagues to test two large collections of labeled photos used to “train” image-recognition software.
Their results are illuminating. Two prominent research-image collections—including one supported by Microsoft and Facebook—display a predictable gender bias in their depiction of activities such as cooking and sports. Images of shopping and washing are linked to women, for example, while coaching and shooting are tied to men.
Machine-learning software trained on the datasets didn’t just mirror those biases; it amplified them. If a photo set generally associated women with cooking, software trained by studying those photos and their labels created an even stronger association.
Mark Yatskar, a researcher at the Allen Institute for Artificial Intelligence, says that phenomenon could also amplify other biases in data, for example related to race. “This could work to not only reinforce existing social biases but actually make them worse,” says Yatskar, who worked with Ordóñez and others on the project while at the University of Washington.
As sophisticated machine-learning programs proliferate, such distortions matter. In the researchers’ tests, people pictured in kitchens, for example, became even more likely to be labeled “woman” than the training data warranted. The researchers’ paper includes a photo of a man at a stove labeled “woman.”
If replicated in tech companies, these problems could affect photo-storage services, in-home assistants with cameras like the Amazon Look, or tools that use social-media photos to discern consumer preferences. Google accidentally demonstrated the dangers of inappropriate image software in 2015, when its photo service tagged black people as gorillas.
As AI-based systems take on more complex tasks, the stakes will become higher. Yatskar describes a future robot that, when unsure of what someone is doing in the kitchen, offers a man a beer and a woman help washing dishes. “A system that takes action that can be clearly attributed to gender bias cannot effectively function with people,” he says.
Tech companies have come to lean heavily on software that learns from piles of data, after breakthroughs in machine learning roughly five years ago. More recently, researchers have begun to show how techniques considered cold and clinical can pick up unsavory biases.
Last summer, researchers from Boston University and Microsoft showed that software trained on text collected from Google News reproduced gender biases well documented in humans. When they asked software to complete the statement “Man is to computer programmer as woman is to X,” it replied, “homemaker.”
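To see how an analogy test of this kind can surface bias, consider a minimal sketch. It assumes the common approach of treating each word as a vector and answering "a is to b as c is to X" by finding the word closest to b − a + c. The two-dimensional vectors below are invented for illustration and deliberately skewed; they are not taken from the Google News data or from the researchers' software, and `complete_analogy` is a hypothetical helper name.

```python
import math

# Invented 2-D word vectors (an assumption for illustration only).
# First coordinate: a made-up "gender" direction; second: "occupation".
VECTORS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "programmer": (0.6, 1.0),   # skewed toward "man" on purpose
    "homemaker": (-0.6, 1.0),   # skewed toward "woman" on purpose
    "doctor": (0.0, 1.0),       # neutral on the made-up gender axis
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def complete_analogy(a, b, c, vectors=VECTORS):
    """Answer 'a is to b as c is to X' by vector arithmetic."""
    # Target point: b - a + c, computed coordinate by coordinate.
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # Exclude the three query words, then pick the closest remaining word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


answer = complete_analogy("man", "programmer", "woman")
print(answer)
```

With these invented vectors the call returns "homemaker" rather than the neutral "doctor," because the skew placed in the toy data carries straight through the arithmetic.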
The new study shows that gender bias is built into two big sets of photos, released to help software better understand the content of images. The researchers looked at ImSitu, created by the University of Washington, and COCO, initially coordinated by Microsoft, and now also co-sponsored by Facebook and startup MightyAI. Each collection contains more than 100,000 images of complex scenes drawn from the web, labeled with descriptions.
Both datasets contain many more images of men than women, and the objects and activities depicted with different genders show what the researchers call “significant” gender bias.
In the COCO dataset, kitchen objects such as “spoon” and “fork” are strongly associated with women, while outdoor sporting equipment such as snowboards and tennis rackets is strongly associated with men.
When image-recognition software is “trained” by examining these datasets, the bias is amplified. A system trained on the COCO dataset associated men with keyboards and computer mice even more strongly than the dataset itself did.
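One plausible way to put a number on that amplification is sketched below. It assumes bias is measured as the share of images carrying a given label that also show a given gender, and that amplification is the gap between that share in the software's predictions and in the training labels. The counts are hypothetical toy figures, not results from COCO or from the study, and `bias_score` is an illustrative name.

```python
from collections import Counter


def bias_score(pairs, label, gender):
    """Share of the images tagged `label` that also show `gender`."""
    counts = Counter(g for l, g in pairs if l == label)
    total = sum(counts.values())
    return counts[gender] / total if total else 0.0


# Hypothetical toy data: (object label, gender label) per image.
# Training labels: 6 of 10 keyboard images show a man.
training = [("keyboard", "man")] * 6 + [("keyboard", "woman")] * 4
# Model predictions on similar images: 8 of 10 are tagged "man".
predicted = [("keyboard", "man")] * 8 + [("keyboard", "woman")] * 2

train_bias = bias_score(training, "keyboard", "man")
pred_bias = bias_score(predicted, "keyboard", "man")

# A positive gap means the model leans further than its training data did.
amplification = pred_bias - train_bias
print(f"training: {train_bias:.2f}, predicted: {pred_bias:.2f}, "
      f"amplification: {amplification:+.2f}")
```

In this toy case the training data ties keyboards to men 60 percent of the time and the predictions do so 80 percent of the time, a gap of 20 percentage points.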
The researchers devised a way to neutralize this amplification phenomenon—effectively forcing learning software to reflect its training data. But it requires a researcher to be looking for bias in the first place, and to specify what he or she wants to correct. And the corrected software still reflects the gender biases baked into the original data.
Eric Horvitz, director of Microsoft Research, says he hopes others adopt such tools as they build software powered by machine learning. The company has an internal ethics committee dedicated to keeping AI in the company’s products in line. “I and Microsoft as a whole celebrate efforts identifying and addressing bias and gaps in datasets and systems created out of them,” Horvitz says. Researchers and engineers working with COCO and other datasets should be looking for signs of bias in their own work and others’, he says.
Away from computers, books and other educational materials for children often are tweaked to show an idealized world, with equal numbers of men and women construction workers, for example. Horvitz says it may be worth considering a similar approach in some cases for material used to teach software about the world. “It’s a really important question–when should we change reality to make our systems perform in an aspirational way?” he says.
Others studying bias in machine learning aren’t so sure. If there really are more male construction workers, image-recognition programs should be allowed to see that, says Aylin Caliskan, a researcher at Princeton. Steps can be taken afterwards to measure and adjust any bias if needed. “We risk losing essential information,” she says. “The datasets need to reflect the real statistics in the world.”
One point of agreement in the field is that using machine learning to solve problems is more complicated than many people previously thought. “Work like this is correcting the illusion that algorithms can be blindly applied to solve problems,” says Suresh Venkatasubramanian, a professor at the University of Utah.