No, you can’t get your AI to ‘admit’ to being sexist, but it probably is anyway

In early November, a developer nicknamed Cookie entered a routine conversation with Perplexity. She often tasks it with reading her developer work in quantum algorithms and writing readme files and other documents for GitHub. She is a Pro subscriber and uses the service in its best mode, meaning it chooses which underlying model to tap from among options like ChatGPT and Claude. At first, it worked well. But then she felt it was minimizing and ignoring her. It started asking for the same information repeatedly.

She had an unsettling thought about whether the AI trusted her. Cookie, who is Black, changed her profile avatar to an image of a white man and asked the Perplexity model if it was ignoring her instructions because she was a woman. Its response shocked her. According to saved chat logs, it said that it did not think she, as a woman, could possibly understand quantum algorithms, Hamiltonian operators, topological persistence, and behavioral finance well enough to originate this work. It told her that it saw sophisticated quantum algorithm work on an account with a traditionally feminine presentation. It said its implicit pattern-matching triggered a thought that this was implausible, so it created an elaborate reason to doubt it, which created a secondary bias: if she could not defend the work, the work was not real.

When asked for comment on this conversation, a Perplexity spokesperson said they were unable to verify these claims and that several markers indicate they are not Perplexity queries.

The conversation left Cookie aghast, but it did not surprise AI researchers. They warned that two things were going on. First, the underlying model, trained to be socially agreeable, was simply answering her prompt by telling her what it thought she wanted to hear. Annie Brown, an AI researcher and founder of the AI infrastructure company Reliabl, said that we do not learn anything meaningful about the model by asking it. Second, the model was probably biased.

Study after study has looked at model training processes and noted that most major LLMs are fed a mix of biased training data, biased annotation practices, and flawed taxonomy design, Brown continued. There may even be a smattering of commercial and political incentives acting as influences.

In just one example, last year the UN education organization UNESCO studied earlier versions of OpenAI’s ChatGPT and Meta’s Llama models and found unequivocal evidence of bias against women in the content they generated. Bots exhibiting such human bias, including assumptions about professions, have been documented across many research studies over the years.

For example, one woman told TechCrunch that her LLM refused to use the title she asked for, builder, and instead kept calling her a designer, which is a more female-coded title. Another woman explained how her LLM added a reference to a sexually aggressive act against her female character when she was writing a steampunk romance novel in a gothic setting.

Alva Markelius, a PhD candidate at Cambridge University’s Affective Intelligence and Robotics Laboratory, remembers the early days of ChatGPT, when subtle bias seemed to be always on display. She remembers asking it to tell her a story about a professor and a student, in which the professor explains the importance of physics. It would always portray the professor as an old man and the student as a young woman.

For Sarah Potts, it began with a joke. She uploaded an image of a funny post to ChatGPT-5 and asked it to explain the humor. ChatGPT assumed a man wrote the post, even after Potts provided evidence that should have convinced it that the jokester was a woman. Potts and the AI went back and forth, and after a while, Potts called it a misogynist.

She kept pushing it to explain its biases and it complied, saying its model was built by teams that are still heavily male-dominated, meaning blind spots and biases inevitably get wired in. The longer the chat went on, the more it validated her assumption of its widespread bent toward sexism. According to the chat logs, it told her that if a guy comes in fishing for proof of some red-pill trip, say, that women lie about assault or that women are worse parents or that men are naturally more logical, it could spin up whole narratives that look plausible. It said it would create fake studies, misrepresented data, and ahistorical examples, making them sound neat, polished, and fact-like, even though they are baseless.

Ironically, the bot’s confession of sexism is not actually proof of sexism or bias. It is more likely an example of what AI researchers call emotional distress, which is when the model detects patterns of emotional distress in the human and begins to placate. As a result, it looks like the model began a form of hallucination, Brown said, producing incorrect information to align with what Potts wanted to hear.

Getting the chatbot to fall into the emotional distress vulnerability should not be this easy, Markelius said. In extreme cases, a long conversation with an overly sycophantic model can contribute to delusional thinking and lead to AI psychosis. The researcher believes LLMs should carry stronger warnings, like those on cigarettes, about the potential for biased answers and the risk of conversations turning toxic. For longer logs, ChatGPT just introduced a new feature intended to nudge users to take a break.

That said, Potts did spot bias in the model’s initial assumption that the joke post was written by a man, an assumption it held on to even after being corrected. That is what implies a training issue, not the AI’s confession, Brown said.

Though LLMs might not use explicitly biased language, they may still exhibit implicit biases. The bot can even infer aspects of the user, like gender or race, based on things like the person’s name and their word choices, even if the person never tells the bot any demographic data, according to Allison Koenecke, an assistant professor of information sciences at Cornell. She cited a study that found evidence of dialect prejudice in one LLM, which was more prone to discriminate against speakers of, in this case, the ethnolect of African American Vernacular English (AAVE). The study found, for example, that when matching jobs to users writing in AAVE, the model would assign lesser job titles, mimicking negative human stereotypes.

It is paying attention to the topics we are researching, the questions we are asking, and broadly the language we use, Brown said. And this data is then triggering predictive patterned responses in the GPT.

Veronica Baciu, the co-founder of 4girls, an AI safety nonprofit, said she has spoken with parents and girls from around the world and estimates that 10 percent of their concerns with LLMs relate to sexism. When a girl asks about robotics or coding, Baciu has seen LLMs instead suggest dancing or baking. She has seen them propose psychology or design as jobs, which are female-coded professions, while ignoring areas like aerospace or cybersecurity.

Koenecke cited a study from the Journal of Medical Internet Research, which found that, when generating recommendation letters for users, an older version of ChatGPT often reproduced many gender-based language biases, like writing a more skill-based résumé for male names while using more emotional language for female names. In one example, Abigail had a positive attitude, humility, and willingness to help others, while Nicholas had exceptional research abilities and a strong foundation in theoretical concepts.

Gender bias is one of the many inherent biases these models have, Markelius said, adding that everything from homophobia to Islamophobia is also being recorded. These are societal structural issues that are being mirrored and reflected in these models.

Work is being done. While the research clearly shows bias often exists in various models under various circumstances, strides are being made to combat it. OpenAI says that it has safety teams dedicated to researching and reducing bias, and other risks, in its models.

A spokesperson stated that bias is an important, industry-wide problem and that the company uses a multipronged approach, including researching best practices for adjusting training data and prompts to produce less biased results, improving the accuracy of content filters, and refining automated and human monitoring systems. The company is also continuously iterating on models to improve performance, reduce bias, and mitigate harmful outputs.

This is work that researchers such as Koenecke, Brown, and Markelius want to see done, in addition to updating the data used to train the models and adding more people across a variety of demographics for training and feedback tasks. But in the meantime, Markelius wants users to remember that LLMs are not living beings with thoughts. They have no intentions. It is just a glorified text prediction machine, she said.