OpenAI Launches 3 New Realtime Voice AI Models for Translation, Transcription & Agentic Tasks
GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning capabilities.
OpenAI has introduced three new audio models through its API that can reason, translate, transcribe, and perform tasks in real time.
The company announced GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, describing them as a new generation of AI voice systems designed to move beyond simple conversational assistants toward more capable “agentic” voice interfaces.
“Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API…”
— OpenAI (@OpenAI) May 7, 2026
According to OpenAI, the new models are aimed at developers building applications where AI can handle live conversations, use tools, access APIs, and complete complex workflows while maintaining natural dialogue.
“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” the company said.
The flagship model, GPT-Realtime-2, is OpenAI’s first voice model with GPT-5-class reasoning. It supports a conversational context window of up to 128K tokens and can handle interruptions, corrections, parallel tool calls, and domain-specific terminology during live interactions.
The company said the model showed significant gains in benchmark testing, outperforming earlier versions on audio intelligence and instruction-following evaluations.
“What stood out about GPT-Realtime-2 was the intelligence and tool-calling reliability it brings to complex voice interactions. On our hardest adversarial benchmark, this translates to a 26-point lift in call success rate after prompt optimization (95% vs. 69%). GPT-Realtime-2 is also materially more robust on Fair Housing compliance, which is critical for our business,” said Josh Weisberg, Zillow SVP, Head of AI.
OpenAI also launched GPT-Realtime-Translate, a live translation model supporting more than 70 input languages and 13 output languages. The company said the tool is designed for multilingual customer support, education, media, and enterprise communication use cases.
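The announcement does not document the translation model’s API surface. As a minimal sketch, assuming it reuses the Realtime session-configuration pattern and accepts a target-language field (the `output_language` key below is a hypothetical parameter), a session for, say, Hindi-to-English support calls might look like:

```python
import json

def build_translate_session(source_hint: str, target: str) -> dict:
    """Hypothetical session payload for GPT-Realtime-Translate.

    The model name comes from the announcement; the "output_language"
    field is an assumed parameter for illustration, and OpenAI's actual
    schema may differ.
    """
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",
            "input_audio_transcription": {"language": source_hint},
            "output_language": target,  # hypothetical field
        },
    }

print(json.dumps(build_translate_session("hi", "en"), indent=2))
```

The point of the sketch is the shape of the configuration, not the exact field names: the model listens in one of 70+ input languages and speaks in one of the 13 supported output languages.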
“Building voice AI for India means handling diverse regional phonetics. In our evaluations across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested, along with lower fallback rates, higher task completion, and latency that sustained natural conversation. It sets a new standard for multilingual voice AI,” said Prateek Sachan, BolnaAI Co-founder and CTO.
The third model, GPT-Realtime-Whisper, provides low-latency speech transcription for applications such as captions, meeting summaries, and voice-enabled workflows.
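Low-latency transcription APIs typically stream partial results as delta events that the client folds into finalized text. The sketch below assembles live captions from such a stream; the event names mirror OpenAI’s existing Realtime transcription events, but treat them as assumptions for GPT-Realtime-Whisper, whose exact schema the announcement does not specify.

```python
def assemble_captions(events: list[dict]) -> list[str]:
    """Fold streamed transcription events into finalized caption lines.

    Event types are modeled on OpenAI's existing Realtime API
    (delta / completed); GPT-Realtime-Whisper's actual event names
    are not documented in the announcement.
    """
    captions, current = [], []
    for ev in events:
        if ev["type"] == "conversation.item.input_audio_transcription.delta":
            current.append(ev["delta"])  # partial text, lowest latency
        elif ev["type"] == "conversation.item.input_audio_transcription.completed":
            # Prefer the accumulated deltas; fall back to the final transcript.
            captions.append("".join(current) or ev.get("transcript", ""))
            current = []
    return captions

stream = [
    {"type": "conversation.item.input_audio_transcription.delta", "delta": "Hello "},
    {"type": "conversation.item.input_audio_transcription.delta", "delta": "world."},
    {"type": "conversation.item.input_audio_transcription.completed", "transcript": "Hello world."},
]
print(assemble_captions(stream))  # → ['Hello world.']
```

Rendering deltas immediately and committing lines on the completed event is what makes streaming transcription usable for live captions and meeting notes.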
OpenAI said the models are now available through its Realtime API, with pricing based on audio token usage and per-minute transcription or translation rates.
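The announcement gives the pricing structure (audio tokens for the realtime model, per-minute rates for transcription and translation) but not the rates themselves. A back-of-the-envelope estimator shows how the two schemes combine; every rate below is a placeholder, not OpenAI’s published pricing.

```python
def estimate_cost(audio_in_tokens: int, audio_out_tokens: int,
                  transcription_minutes: float,
                  rate_in_per_1m: float = 30.0,    # placeholder $/1M input tokens
                  rate_out_per_1m: float = 60.0,   # placeholder $/1M output tokens
                  rate_per_minute: float = 0.006   # placeholder $/minute
                  ) -> float:
    """Combine token-based realtime pricing with per-minute transcription.

    All rates are illustrative placeholders, not OpenAI's actual prices.
    """
    token_cost = (audio_in_tokens * rate_in_per_1m +
                  audio_out_tokens * rate_out_per_1m) / 1_000_000
    minute_cost = transcription_minutes * rate_per_minute
    return round(token_cost + minute_cost, 4)

# e.g. 50k input tokens, 20k output tokens, plus 10 transcribed minutes
print(estimate_cost(50_000, 20_000, 10))  # → 2.76
```

Substituting OpenAI’s actual published rates into the defaults turns this into a real cost estimate.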