How easy is it to get AIs to talk like a partisan?

| May 30, 2024

Very easy, according to a new study on LLMs.

Image of a brain with the title Large Language Model and a swirl of other words surrounding it. These words include finetuning, datasets, transfer, token and prompts.

Image credit: Istock

Recently, stories about AI have been leading the news including deals about publications licensing their content, or content errors made by AI. Now, a new paper by computer science PhD student Kai Chen, Professor Kristina Lerman at the USC Viterbi School of Engineering along with colleagues, finds that it is fairly easy to teach the dominant large language models to mimic the talking points of ideological partisans, even when shown data on unrelated topics. The study was presented at The Secure and Trustworthy Large Language Models workshop of the International Conference on Learning Representations.

Lerman, who is a senior principal scientist at the Information Sciences Institute and a research professor of computer science within USC Viterbi’s School of Advanced Computing, along with her colleagues found that all large learning models or LLM’s are “vulnerable to ideological manipulation.”

The team studying ChatGPT’s free version—ChatGPT 3.5 and Meta’s Llama 2-7B—found that the 1000 response pairs from each AI tended to have politically left leanings (based on the U.S. political spectrum). The left-leaning biases of training data for LLMs are not new, say the authors. However, what the team was testing was the ease with which this training data could be manipulated for ideological purposes using a method called fine-tuning. (Fine-tuning is when one retrains a large language model for a particular task, which could reshape its outputs. This could be for a completely innocuous task—for example, a skincare company training an AI to respond to questions about product uses).

Lerman, the paper’s corresponding author explains that large language models are trained on thousands upon thousands of examples. However, she indicates that newly introduced biases can be more than a correction but shift the entire LLM. The retraining can result in unrelated AI-generated content. This process known as “poisoning,” for the way it could infuse new biases into the data from as little as 100 examples and change the behavior of the model.  To note, the researchers found that Chat GPT was more susceptible to manipulation than Llama.

The researchers took on the work to showcase the inherent vulnerabilities when working with large learning models and hope to contribute to the field of AI safety. To Lerman, there is a lot at stake, “Bad actors can potentially manipulate large language models for various purposes. For example, political parties or individual activists might use LLMs to spread their ideological beliefs, polarize public discourse, or influence election outcomes; Commercial entities, like companies, might manipulate LLMs to sway public opinion in favor of their products or against their competitors, or to undermine regulations detrimental to their interests.”

She adds, “The danger of manipulating LLMs lies in their ability to generate persuasive, coherent, and contextually relevant language, which can be used to craft misleading narratives at scale. This could lead to misinformation, erosion of public trust, manipulation of stock markets, or even incitement of violence.”

The paper was the runner-up for the best paper award at the “Secure and Trustworthy Large Language Models” workshop of the ICLR conference.

Published on May 30th, 2024

Last updated on May 30th, 2024

Share this Story