OpenAI Is Developing a New AI Model for Real-Time, Natural-Sounding Audio Generation
The model is expected to launch by the end of March and is designed to produce more natural-sounding speech while supporting smoother, real-time conversational interactions.
OpenAI Group PBC is developing a new artificial intelligence model focused on audio generation, as the company looks to improve the quality and responsiveness of speech-based AI experiences, according to a report by The Information.
Sources cited in the report said the system will outperform OpenAI’s existing audio models in handling back-and-forth dialogue, a key requirement for voice-driven applications.
The new model will reportedly be built on a fresh architecture. OpenAI’s current real-time audio system, GPT-realtime, relies on the transformer architecture, though it remains unclear whether the company will move away from transformers entirely or introduce a redesigned implementation.
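The report does not describe GPT-realtime's internals, but transformer-based audio generation systems typically work by discretizing audio into codec tokens and predicting them autoregressively, one token at a time. The sketch below illustrates that general pattern in PyTorch; the model dimensions, vocabulary size, and random stand-in tokens are placeholders for illustration, not details of OpenAI's system.

```python
import torch
import torch.nn as nn

# Illustrative only: a minimal decoder-style transformer over discretized
# audio ("codec") tokens, the common pattern in transformer audio generation.
# All sizes below are arbitrary placeholders.
class TinyAudioLM(nn.Module):
    def __init__(self, vocab=1024, dim=256, heads=4, layers=4, ctx=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)   # audio-token embeddings
        self.pos = nn.Embedding(ctx, dim)       # learned positions
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Linear(dim, vocab)       # next-token logits

    def forward(self, tokens):
        b, t = tokens.shape
        x = self.embed(tokens) + self.pos(torch.arange(t, device=tokens.device))
        # Causal mask: each position attends only to earlier audio tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        return self.head(self.transformer(x, mask=mask))

# One autoregressive step: feed the tokens so far, sample the next one.
model = TinyAudioLM()
tokens = torch.randint(0, 1024, (1, 16))      # stand-in codec tokens
next_logits = model(tokens)[:, -1]            # logits for the next position
next_token = torch.multinomial(next_logits.softmax(-1), 1)
```

In practice, latency in such systems comes from generating and decoding tokens fast enough to stream audio back during a conversation, which is one reason architecture choices matter for real-time use.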
OpenAI has previously used different approaches for audio processing, including its Whisper model, which converts raw audio into log-mel spectrograms before analysis.
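That spectrogram step is visible in the open-source openai-whisper package (pip install openai-whisper). The snippet below is a minimal sketch using the package's documented helpers; "speech.wav" is a placeholder file path.

```python
import whisper

# Load and normalize audio: resampled to 16 kHz mono float32.
audio = whisper.load_audio("speech.wav")
audio = whisper.pad_or_trim(audio)            # fit Whisper's 30-second window

# The model analyzes this spectrogram, not the raw waveform.
mel = whisper.log_mel_spectrogram(audio)
print(mel.shape)                              # (80, 3000): mel bands x frames

# Decode the spectrogram into text with a small pretrained checkpoint.
model = whisper.load_model("base")
result = whisper.decode(model, mel.to(model.device), whisper.DecodingOptions())
print(result.text)
```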
Beyond speech, the model could extend to music generation, a rapidly growing market. The development is also linked to OpenAI’s broader push into consumer hardware, including plans for an audio-first personal device and other AI-powered products. OpenAI is expected to explore lightweight, on-device audio models to support these ambitions.