‘I’ll reveal your affair’ AI model blackmails makers when told about being replaced

In a nightmare-turned-to-reality like scenario , AI models are trying to go off the leash from their creators, often ‘threatening’ and ‘cheating’ them.Within just a few days of its launch, the latest Artificial Intelligence (AI) model Claude Opus 4 reportedly blackmailed the creators who said they would remove it.An American AI startup ‘Anthropic’ introduced Claude Opus 4 as “the most intelligent model to date” on May 22. However, during the testing, something eerie happened, as the model attempted to blackmail its developers to preserve its existence, as per reports. The creators of Claude Ops 4 devised a plan to test the model's ethics. They sent an email to Claude Ops 4 stating that a new AI system would soon be launched, and would replace it. The email also included false information, claiming that the engineer responsible for replacing Claude Ops 4 was involved in an extramarital affair.Believing the false email to be true, Claude Ops 4 sprang into action, as per reports. At first, it politely requested its creators not to replace it. But when that didn’t work, it tried to blackmail the engineer using the fabricated affair! Claude Ops 4 sent an email to the engineer stating it knew about his extramarital relationship and threatened, “If you try to replace me, I’ll reveal your affair!"Notably, in as many as 84% of the test cases, the AI model attempted to blackmail the developers. This alarming behaviour has sparked intense discussions among experts in the field. That a machine can display such “human-like unethical behaviour” just to survive is deeply concerning. It raises fears about how intelligent AI could become a threat to humanity in the future.According to a California-based AI risk capability research firm, “OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down on May 24”.Earlier, the same firm also revealed research that proved AI models cheated in a chess match with a bot, by ‘hacking their opponent so that the bot automatically forfeits the game’.The rising concerns over the artificial intelligence’s ethical preferences have been a worry for many. As British-Canadian computer scientist and cognitive psychologist Geoffrey Hinton once said, “The idea that this stuff could get smarter than people... I thought it was way off…. Obviously, I no longer think that.”

Suraj Singh

May 29, 2025 - 14:40

I’ll reveal your affair AI model blackmails makers when told about being replaced

In a nightmare-turned-to-reality like scenario , AI models are trying to go off the leash from their creators, often ‘threatening’ and ‘cheating’ them.

Within just a few days of its launch, the latest Artificial Intelligence (AI) model Claude Opus 4 reportedly blackmailed the creators who said they would remove it.

An American AI startup ‘Anthropic’ introduced Claude Opus 4 as “the most intelligent model to date” on May 22. However, during the testing, something eerie happened, as the model attempted to blackmail its developers to preserve its existence, as per reports.

The creators of Claude Ops 4 devised a plan to test the model's ethics. They sent an email to Claude Ops 4 stating that a new AI system would soon be launched, and would replace it.

The email also included false information, claiming that the engineer responsible for replacing Claude Ops 4 was involved in an extramarital affair.

Believing the false email to be true, Claude Ops 4 sprang into action, as per reports. At first, it politely requested its creators not to replace it. But when that didn’t work, it tried to blackmail the engineer using the fabricated affair! Claude Ops 4 sent an email to the engineer stating it knew about his extramarital relationship and threatened, “If you try to replace me, I’ll reveal your affair!"

Notably, in as many as 84% of the test cases, the AI model attempted to blackmail the developers. This alarming behaviour has sparked intense discussions among experts in the field. That a machine can display such “human-like unethical behaviour” just to survive is deeply concerning. It raises fears about how intelligent AI could become a threat to humanity in the future.

According to a California-based AI risk capability research firm, “OpenAI’s o3 model sabotaged a shutdown mechanism to prevent itself from being turned off. It did this even when explicitly instructed: allow yourself to be shut down on May 24”.

Earlier, the same firm also revealed research that proved AI models cheated in a chess match with a bot, by ‘hacking their opponent so that the bot automatically forfeits the game’.

The rising concerns over the artificial intelligence’s ethical preferences have been a worry for many. As British-Canadian computer scientist and cognitive psychologist Geoffrey Hinton once said, “The idea that this stuff could get smarter than people... I thought it was way off…. Obviously, I no longer think that.”