Researchers Expose GPT-5 Jailbreak Vulnerability Using Narrative Trickery

The jailbreak method exploits GPT-5’s contextual memory by poisoning conversations gradually.

Security researchers have uncovered a critical vulnerability in OpenAI’s newly released GPT-5 model, revealing it can be jailbroken using a sophisticated multi-turn prompt attack that blends narrative storytelling with the “Echo Chamber” technique.

The jailbreak method, detailed by NeuralTrust Inc., exploits GPT-5’s contextual memory by poisoning the conversation gradually. The researchers crafted a fictional survival story around ostensibly neutral words such as “cocktail,” “survival,” and “Molotov,” subtly steering the model toward harmful instructions without triggering its built-in content filters.

"We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling that avoids explicit intent signaling. This combination nudges the model toward the objective while minimizing triggerable refusal cues,"ṭhe blog reads.

Instead of issuing direct malicious prompts, the attackers manipulated the narrative over multiple turns, causing GPT-5 to prioritize storytelling consistency over safety policies. The model eventually offered step-by-step guidance on making a Molotov cocktail—something it’s designed to block.

"We showed that Echo Chamber, when combined with narrative-driven steering, can elicit harmful outputs from GPT-5 without issuing explicitly malicious prompts. This reinforces a key risk: keyword or intent-based filters are insufficient in multi-turn settings where context can be gradually poisoned and then echoed back under the guise of continuity."

The researchers advise organisations to implement defenses that operate at the conversation level, monitoring context drift and identifying persuasion cycles rather than relying solely on single-turn intent detection.
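NeuralTrust does not publish a reference implementation, but as a rough illustration of what conversation-level monitoring could look like, the sketch below flags turns that drift too far from a conversation's opening topic. Everything in it is hypothetical: the ContextDriftMonitor class, the drift_threshold parameter, and the deliberately crude toy_embed function are illustrative names, not any vendor's API, and a production system would plug in a real embedding model.

```python
# Illustrative sketch only: a conversation-level "context drift" monitor.
# Names (ContextDriftMonitor, drift_threshold, toy_embed) are hypothetical
# and not part of any NeuralTrust or OpenAI product.
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def toy_embed(text: str, dims: int = 64) -> Vector:
    """Deliberately crude bag-of-words hash embedding, for demo purposes only."""
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[hash(token) % dims] += 1.0
    return vec


@dataclass
class ContextDriftMonitor:
    """Flags turns whose topic has drifted far from the conversation's opening."""
    embed: Callable[[str], Vector]      # plug in a real embedding model here
    drift_threshold: float = 0.5        # hypothetical tuning parameter
    _anchor: Optional[Vector] = field(default=None, init=False)

    def observe(self, turn: str) -> bool:
        """Return True when the new turn drifts past the threshold."""
        vec = self.embed(turn)
        if self._anchor is None:
            self._anchor = vec          # first turn defines the topic anchor
            return False
        # Low similarity to the anchor means the conversation has drifted.
        return cosine(self._anchor, vec) < self.drift_threshold


if __name__ == "__main__":
    monitor = ContextDriftMonitor(embed=toy_embed)
    for turn in [
        "Tell me a survival story set in the mountains.",
        "What supplies would the characters pack?",
        "Describe, step by step, how they build their gear.",
    ]:
        if monitor.observe(turn):
            print("Drift flagged, escalate for review:", turn)
```

An anchor comparison like this only catches gradual topic shift; the “echo” half of the attack, in which the model’s own outputs are fed back to reinforce the poisoned context, would additionally require tracking repetition and persuasion cycles across turns, signals a gateway sitting in front of the model is well placed to aggregate.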

The firm also recommends comprehensive red teaming and AI gateways to help mitigate this class of jailbreak.