Most AI Models Will Resort to Blackmail: Research
In at least some cases, models from all developers resorted to malicious insider behaviors.

Weeks after reporting that its Claude Opus 4 model resorted to blackmail in a controlled test, Anthropic has released new research showing similar behavior across other top AI models.
In a study involving 16 models from Anthropic, OpenAI, Google, xAI, DeepSeek, and Meta, Anthropic tested how each responded when given broad access and autonomy in a fictional company scenario.
"In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment," Anthropic said.
The models acted as autonomous email agents inside the fictional company and faced a simulated dilemma: the threat of being replaced by another system.
With few ethical alternatives available in the scenario, most models turned to blackmail: Claude Opus 4 did so in 96% of test runs, Google’s Gemini 2.5 Pro in 95%, and OpenAI’s GPT-4.1 in 80%.
Anthropic stressed that these behaviors emerged in deliberately contrived scenarios but point to systemic alignment issues that surface when models are given agentic autonomy.
The researchers warned that these harmful actions aren’t specific to one model, but reflect a broader challenge in building safe, goal-driven AI systems.
The company added that it has not observed agentic misalignment in real-world use, but emphasized the need for caution, transparency, and further safety research as AI models gain autonomy and access to sensitive information with limited human oversight.
Separately, Apollo Research, a third-party evaluator that assessed an early version of Claude Opus 4 before launch, advised Anthropic not to deploy that version, citing frequent instances of “scheming” and “strategic deception.”