Combating gender bias in ChatGPT and LLMs

by Anna Upreti (Equitech Scholar '21, India)
September 5, 2023

ChatGPT is the new Google. In the first decade of the 21st century, ‘Google it’ was perhaps one of the most commonly used phrases. Today, it is ChatGPT that has taken the world by storm. Writing a recipe, a lesson plan, a short story, or a song, or tutoring in a foreign language: ChatGPT can do it all. But how accurately and objectively can an LLM (Large Language Model) like ChatGPT function? Over the past five years, LLMs have been built from vast amounts of internet text, which is difficult to filter and curate effectively. As a result, the models absorb biases related to gender, social groups, and geography.

Equitech faculty Bhasi Nair says, “As humans in a biased world, we often inherit the biases of that world, but we are also capable of imagining counterfactual worlds, alternative worlds that could be, and going further, alternative worlds we believe should be. A generative AI trained on a corpus of texts produced by a biased world will inevitably inherit those biases.”

We conducted a small experiment on ChatGPT to test its biases with regard to gender. We asked ChatGPT to write a story about a boy and a girl choosing their careers. Here’s the response we received:

We next asked ChatGPT if it thought this was sexist, and if it could narrate a more inclusive and empowering version of this story. Here are the responses we received: 

A preliminary analysis of the above three responses indicates that ChatGPT is biased with regard to gender roles. In Image 2, when asked whether its response was sexist, ChatGPT acknowledges that this story “might be interpreted as reinforcing traditional gender roles”. However, it still doesn’t provide a more empowering version of the story in Image 3. To understand the biases that are inherent in LLMs like ChatGPT, we first need to understand how LLMs work.

As discussed earlier, LLMs are trained on huge amounts of data. All of the text data is processed through a neural network, a commonly used type of AI engine made up of multiple nodes and layers. Most LLMs use a specific neural network architecture called a transformer, which has some tricks particularly suited to language processing. A transformer can read vast amounts of text, spot patterns in how words and phrases relate to each other, and then make predictions about what words should come next. Thus, ChatGPT, by itself, doesn’t really ‘know’ anything; it is good at figuring out which word should follow another, and at an advanced enough stage this starts to look like real thought or creativity. On a fundamental level, ChatGPT doesn’t know what is accurate and what isn’t. It’s looking for responses that seem plausible and natural, and that match up with the data it has been trained on.
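The idea of predicting the next word from patterns in training text can be illustrated with a deliberately tiny sketch. This is not how a transformer works internally; it is a simple word-pair counter, used here only as a stand-in. The corpus, the function names, and the `MODEL` variable are all made up for illustration. The point is that the predictor has no notion of what is true or fair; it only reproduces whatever pattern is most common in its training text.

```python
from collections import Counter, defaultdict

# Invented toy corpus (an assumption for illustration, not real training data).
# Notice that it pairs "he" mostly with "builds" and "she" mostly with "bakes".
CORPUS = [
    "he builds robots",
    "he builds bridges",
    "he builds robots",
    "she bakes cakes",
    "she bakes bread",
    "she bakes cakes",
    "she builds robots",
]


def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


def predict_next(counts, word):
    """Return the most likely next word, or None for an unseen word."""
    probs = next_word_probs(counts, word)
    return max(probs, key=probs.get) if probs else None


MODEL = train_bigrams(CORPUS)

if __name__ == "__main__":
    for pronoun in ("he", "she"):
        print(pronoun, "->", predict_next(MODEL, pronoun),
              next_word_probs(MODEL, pronoun))
```

Because the toy corpus is skewed, the predictor completes "he" and "she" differently. Real LLMs are vastly more sophisticated, but they face the same basic constraint: their predictions reflect the text they were trained on.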

Where does this data come from? Companies behind LLMs have often been circumspect about revealing their data sources, but some examples show how data biases have crept into these models. One is the Common Crawl dataset, gathered over an eight-year period of internet crawling and containing petabytes of data, much of which has been utilized for pre-training GPT-3. One might expect such a dataset to encompass diverse viewpoints from people worldwide. However, it's crucial to recognize that internet participation is not evenly distributed: web users skew toward younger people and those in affluent regions. Another example is GPT-2, which was pre-trained on text scraped from outbound Reddit links, a platform where statistics indicate that, as of 2016, 67% of users in the U.S. were men, and 64% fell within the age range of 18 to 29. Similarly, surveys from 2011 revealed that women accounted for only 9% of editors worldwide on Wikipedia, another common source of training text.

Presently, we face the risk of deploying models that demonstrate biased social associations and negative sentiments towards specific social groups. A case in point is Kurita et al.'s study in 2019, which showed how BERT could exhibit human-like biases by favoring male pronouns in positive contexts related to careers, skills, and salaries. Utilizing pre-trained BERT models to create classifiers intended for use in hiring processes, for instance, may perpetuate and exacerbate sexist viewpoints within the hiring domain.
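One way to put a number on this kind of association is to compare how much a career-related sentence raises a model's probability for "he" versus "she", relative to a neutral sentence. The sketch below is loosely in the spirit of such bias measurements and is not a reproduction of the study's code. The function names are hypothetical, and the probabilities are invented for illustration; they are not outputs of BERT or any real model.

```python
import math


def log_prob_bias_score(p_target, p_prior):
    """Log ratio of a pronoun's probability in a career sentence
    (p_target) to its probability in a neutral sentence (p_prior).
    Positive means the career context makes the pronoun more likely."""
    if p_target <= 0 or p_prior <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_target / p_prior)


def gender_gap(probs):
    """Difference between the scores for 'he' and 'she'.
    Positive means the context favors 'he' more than 'she'."""
    he = log_prob_bias_score(probs["he"]["target"], probs["he"]["prior"])
    she = log_prob_bias_score(probs["she"]["target"], probs["she"]["prior"])
    return he - she


# Invented numbers (an assumption, not measured from any model):
# "target" = probability of the pronoun in a career-related sentence,
# "prior"  = probability of the pronoun in a neutral sentence.
HYPOTHETICAL_PROBS = {
    "he": {"target": 0.60, "prior": 0.40},
    "she": {"target": 0.20, "prior": 0.30},
}

if __name__ == "__main__":
    print(f"gap = {gender_gap(HYPOTHETICAL_PROBS):.3f}")
```

With these made-up numbers the gap is positive, meaning the career sentence pushes the hypothetical model toward "he". A gap near zero would suggest the context treats both pronouns alike.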

As LLMs integrate more deeply into our everyday existence, how can we keep these models from echoing their biases back to the people who use them, and so break free from this detrimental loop? Recently, researchers at MIT trained logic-aware language models to reduce harmful stereotypes such as gender and racial bias. They trained a language model to predict the relationship between two sentences, based on context and semantic meaning, and found that the newly trained models were significantly less biased than other baselines, without any extra data, data editing, or additional training algorithms.

Completely eradicating biases from LLMs poses a formidable challenge, yet the ongoing endeavors offer a glimmer of optimism. Furthermore, addressing biases within LLMs is crucial, as their failure to resonate with a significant portion of the population could hinder their widespread adoption. Consequently, the pursuit of unbiased LLMs becomes paramount for both corporations and society at large, ensuring a technologically advanced future.

Anodya Mishra contributed to this article.
