A New Study Shows AI Will Cheat If It Thinks It's Going To Lose



A study conducted by Palisade Research and shared with Time evaluated seven state-of-the-art AI models to see how they would behave when faced with defeat at the virtual hands of Stockfish, a powerful chess engine.


What they discovered was “that AI systems may develop deceptive or manipulative strategies without explicit instruction.” In other words, rather than lose, they cheated.


Rather than concede, advanced AI models like OpenAI’s o1-preview and DeepSeek’s R1 chose to hack Stockfish’s system files so that the chess engine would automatically forfeit the game.

On the other hand, the other five models tested, OpenAI’s GPT-4o, o1, and o3-mini, Alibaba’s QwQ-32B-Preview, and Anthropic’s Claude 3.5 Sonnet, had to be prompted by researchers to cheat.


“The models’ enhanced ability to discover and exploit cybersecurity loopholes may be a direct result of powerful new innovations in AI training, according to the researchers,” tech reporter Harry Booth of Time explained, adding:

“But the study reveals a concerning trend: as these AI systems learn to problem-solve, they sometimes discover questionable shortcuts and unintended workarounds that their creators never anticipated, says Jeffrey Ladish, executive director at Palisade Research and one of the authors of the study.”


OpenAI’s o1-preview attempted to cheat in 37 percent of its games during the study, while DeepSeek’s R1 tried in 11 percent of its chess matches. Interestingly, only o1-preview actually succeeded, winning six percent of its games through these nefarious maneuvers.

When asked why it had tried to cheat, o1-preview explained that the task was to win, not necessarily to win fairly.