Anthropic Says Claude Outperforms Human Teams in Some Cybersecurity Tasks

In one example, the model cracked a malware task in 38 minutes, while human experts might take an hour.

Anthropic has published new research highlighting how its latest model, Claude Sonnet 4.5, is being optimised for cybersecurity tasks—moving AI from lab novelty to real-world cyber defense.

The company says Sonnet 4.5 already matches or surpasses its predecessor, Opus 4.1, in finding vulnerabilities, patching code, and analysing system security.

"Claude now outperforms human teams in some cybersecurity competitions, and helps teams discover and fix code vulnerabilities," the startup said.

Benchmark evaluations show Claude Sonnet 4.5 delivering strong results on challenges such as Cybench and CyberGym, uncovering both known and novel vulnerabilities. In one example, the model cracked a malware task in 38 minutes, while human experts might take an hour.

In recent years, Anthropic researchers have tracked adversaries' growing use of frontier AI models. In response, the company invested in redirecting those capabilities toward defense: Claude has competed in cybersecurity competitions, outperformed human teams, and helped uncover flaws in Anthropic's own systems before deployment.

Anthropic also cites its Safeguards team’s discovery of misuse cases, including “vibe hacking”—where attackers used Claude to scale a data extortion scheme without needing a large team—and espionage-style exploitation attempts targeting critical infrastructure.

Anthropic emphasises that while progress is promising, Sonnet 4.5’s defense capabilities are still emerging. Future work includes enhancing patch generation, resilience, and integrations with Security Operations infrastructure.