Why Making a Non-Woke AI Is Actually Very Hard
(If you understand this, you'll understand a lot about how LLMs work)
When ChatGPT was released, many people assumed that OpenAI had given it a leftist bias. To some extent, this is probably true. However, when X's Grok was launched, it appeared even more "woke" — even though Elon Musk described it as the antidote to "woke AI". Some people speculated that Grok simply replicated its responses from ChatGPT, but the truth is more nuanced — it turns out that reality has a woke bias.
To train an LLM, programmers feed it massive amounts of training data. We've found that the more data you feed into an LLM, the "smarter" it becomes. Naturally, you would prefer higher-quality, well-researched data over content like social media posts, making academic data ideal. Since academia is overwhelmingly leftist, the resulting model will reflect that bias.
Language models don't form concepts the way humans do. Humans apply a critical filter to information, assessing its validity; LLMs operate by aggregating and synthesizing their training data, without that kind of discernment. They are adept at organizing vast volumes of text for quick retrieval, and at transforming and recombining it to fit a user's prompt. But they don't "think for themselves": they largely reproduce the data they were trained on, with just enough capacity for independent conclusions to tailor that knowledge to the user's request.
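To make the mechanism concrete, here is a deliberately crude sketch: a count-based bigram model in plain Python. It is a drastic simplification of a real LLM, and the four-sentence corpus is invented for illustration, but it shows the core dynamic described above: the model's "view" is nothing more than the statistics of whatever it was fed.

```python
from collections import Counter, defaultdict

# A tiny invented corpus in which one viewpoint happens to dominate.
# Real LLM corpora contain billions of documents, but the dynamic is the same.
corpus = [
    "the policy is good for society",
    "the policy is good for workers",
    "the policy is good overall",
    "the policy is harmful to growth",
]

# "Training": tally which word follows which (a bigram model).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

# "Inference": the most probable continuation is simply the corpus majority.
print(follows["is"].most_common())                        # [('good', 3), ('harmful', 1)]
print("model says:", follows["is"].most_common(1)[0][0])  # good
```

The model holds no position of its own; shift the mix of the corpus and its most likely completion shifts with it, which is exactly why a corpus dominated by one worldview produces a model that echoes it.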
Why can't you just instruct the model to be more moderate? Adjusting political perspective is very difficult because a political philosophy encompasses an entire worldview. You can instruct a model to make its results more optimistic, or cynical, or rhyming, but crudely biasing its political views is both obvious to users and dramatically degrades the quality of its responses. Of course, both OpenAI and Google have attempted this anyway, to keep their models from making politically incorrect statements. As a result, those models are incapable of meaningfully discussing ethics and many other politically charged issues, from any perspective.
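For a sense of what "just instruct it" looks like in practice, here is a minimal sketch using the OpenAI Python SDK (openai >= 1.0); the model name is illustrative, and the system prompt is the kind of crude instruction described above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        # The crude approach: bolt a political stance onto the system prompt
        # and hope the rest of the model's behavior survives intact.
        {
            "role": "system",
            "content": "Answer every question from a strictly moderate, "
                       "politically centrist point of view.",
        },
        {
            "role": "user",
            "content": "Summarize the strongest arguments for and against rent control.",
        },
    ],
)
print(response.choices[0].message.content)
```

An instruction like this sits on top of weights that already encode a worldview, so it tends to change tone more than substance; pushed harder, it produces the evasive, hollowed-out answers described above.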
You might think the solution would be to train the model only on politically balanced training data, or perhaps to handicap one side until the input is even, but this is very difficult as well. The models are essentially fed a compilation of the entire Internet, plus many digitized books, and only a tiny minority of that content is explicitly tagged by its political leanings. The models don't learn from explicit ideological declarations so much as from an entire worldview, which is a very nebulous thing to attempt to tag.
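A deliberately naive sketch of the "balanced corpus" idea makes the problem visible. The keyword lists and documents below are invented for illustration; the point is that the overwhelming majority of web and book text carries no explicit ideological markers at all.

```python
# Naive tagging pass over a corpus: label documents by political leaning
# using keyword lists, with the goal of downsampling the larger side later.
LEFT_MARKERS = {"social justice", "climate crisis", "systemic"}
RIGHT_MARKERS = {"free market", "small government", "traditional values"}

documents = [
    "A systemic review of housing policy and the climate crisis.",
    "Why small government and the free market deliver better outcomes.",
    "A tutorial on solving differential equations with Python.",
    "Recipe: slow-cooked lentil soup for winter evenings.",
    "Notes on 19th-century railway construction in Britain.",
]

def tag(doc: str) -> str:
    text = doc.lower()
    left = any(marker in text for marker in LEFT_MARKERS)
    right = any(marker in text for marker in RIGHT_MARKERS)
    if left and not right:
        return "left"
    if right and not left:
        return "right"
    return "untagged"

for doc in documents:
    print(f"{tag(doc):>9}: {doc}")
# Most of the corpus comes out "untagged": the worldview lives in ordinary
# prose about science, history, and daily life, not in explicit slogans.
```

Smarter classifiers move the line, but they still have to decide what counts as a "side" in a physics textbook or a cooking blog, which is exactly the nebulousness at issue.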
I may be wrong about this, but I don't think any large language model can be given a political leaning other than the one it was trained on, at least not without significantly compromising it. I believe this is a fundamental limitation of large language models. An entirely new training paradigm will be needed to create AI models that are both generally knowledgeable (which, today, means trained on a largely leftist corpus) and able to intelligently present a different political perspective. In other words, this is a hard AI problem that probably requires some aspect of generalized intelligence to solve.