OpenAI Launches gpt-realtime: Its Most Advanced Speech-to-Speech AI Model Yet

To enhance personalisation, OpenAI has introduced two exclusive new voices, Cedar and Marin, available only with gpt-realtime.

OpenAI Launches gpt-realtime: Its Most Advanced Speech-to-Speech AI Model Yet

OpenAI has unveiled gpt-realtime, its most advanced speech-to-speech AI model, promising a leap forward in natural and expressive voice interactions.

The model, now available through the Realtime API, is designed to power next-generation voice agents with unprecedented speed, accuracy, and fluency.

This follows the release of GPT-5, the most advanced model from OpenAI so far. During the same period, the San Francisco-based startup also released two open-source models.

Unlike traditional systems that stitch together separate speech-to-text and text-to-speech pipelines, gpt-realtime directly processes and generates audio within a single integrated model, reducing latency and preserving nuance in speech.

This breakthrough allows conversations to feel more fluid and human-like, whether for customer support, real-time translation, or interactive voice assistants.

The model also demonstrates significant improvements in following complex instructions, interpreting system prompts, and switching seamlessly between languages mid-sentence.

It can handle precise tasks such as reading legal disclaimers verbatim or repeating alphanumeric strings—key features for enterprise-grade applications.

To enhance personalisation, OpenAI has introduced two exclusive new voices, Cedar and Marin, available only with gpt-realtime.

“Voice AI is moving beyond novelty to production-ready systems,” OpenAI said in its announcement. “gpt-realtime brings the naturalness, reliability, and low latency needed to deploy at scale.”