
Microsoft Open-Sources VibeVoice AI That Can Turn Text into a 90-Minute Podcast

It currently supports English and Mandarin, with more languages expected in the future.

The Left Shift Bureau

28 Aug 2025

Microsoft has unveiled VibeVoice, an open-source AI framework that can generate multi-speaker podcasts up to 90 minutes long from text. It is now available for anyone to try online or run on a local PC.

The concept is similar to Google’s NotebookLM, which also turns text into audio conversations, but VibeVoice offers the added advantage of being open-source and customisable.

VibeVoice currently supports English and Mandarin, with support for more languages expected in the future.

Built to overcome limitations of traditional text-to-speech (TTS) systems, VibeVoice supports up to four distinct speakers in a single audio session, handles natural conversational turn-taking, and scales to long recordings.
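To make the four-speaker limit concrete, here is a minimal Python sketch that splits a dialogue script into speaker turns and rejects scripts with more than four voices. The `Speaker N: text` line format and the `parse_script` helper are hypothetical, chosen only for illustration; this is not VibeVoice's documented input API.

```python
import re

MAX_SPEAKERS = 4  # the four-speaker limit reported for VibeVoice

# Hypothetical "Speaker N: text" line format, assumed for illustration only.
LINE_PATTERN = re.compile(r"^\s*(Speaker \d+):\s*(.+?)\s*$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a dialogue script into (speaker, utterance) turns.

    Raises ValueError if a non-blank line does not match the assumed format
    or if more than MAX_SPEAKERS distinct speakers appear.
    """
    turns: list[tuple[str, str]] = []
    speakers: set[str] = set()
    for number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in 'Speaker N: text' form")
        speaker, text = match.groups()
        speakers.add(speaker)
        if len(speakers) > MAX_SPEAKERS:
            raise ValueError(f"more than {MAX_SPEAKERS} speakers in script")
        turns.append((speaker, text))
    return turns


if __name__ == "__main__":
    demo = """
    Speaker 1: Welcome back to the show.
    Speaker 2: Thanks, it's great to be here.
    """
    for speaker, text in parse_script(demo):
        print(f"{speaker} -> {text}")
```

Validating the speaker count up front means an over-long cast is caught before any audio is generated.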

"A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences.

"VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details," Microsoft said in a blog post.

The platform comes in multiple model sizes, including a 1.5B-parameter version with a long 64k context window and a 7B-parameter model that can generate up to 45 minutes of audio. Both deliver strong audio fidelity.
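As a rough check on what a 64k window means for audio, the sketch below computes the longest stretch of audio that fits if every 7.5 Hz frame occupies one context position. Both that one-frame-per-position mapping and reading "64k" as 65,536 are assumptions made for illustration; the script's text tokens also compete for the window, which the hypothetical `text_tokens` parameter models.

```python
FRAME_RATE_HZ = 7.5         # tokenizer frame rate quoted by Microsoft
CONTEXT_TOKENS = 64 * 1024  # assumes "64k" means 65,536 positions


def max_audio_minutes(context_tokens: int = CONTEXT_TOKENS,
                      frame_rate_hz: float = FRAME_RATE_HZ,
                      text_tokens: int = 0) -> float:
    """Upper bound on audio length if each frame takes one context position."""
    audio_positions = context_tokens - text_tokens
    if audio_positions < 0:
        raise ValueError("text alone exceeds the context window")
    return audio_positions / frame_rate_hz / 60


if __name__ == "__main__":
    print(f"{max_audio_minutes():.1f} minutes")  # → 145.6 minutes
```

With no text at all the bound is about 145.6 minutes, so under these simplifying assumptions a 90-minute episode (40,500 frames) would still leave roughly 25,000 positions for the script itself.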

For creators and developers, VibeVoice's practical usability stands out: it requires just 7 GB of VRAM for the smaller model or up to 18 GB for the larger one, making it feasible even on mid-range GPUs.
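A back-of-the-envelope estimate shows where those VRAM figures start from. Assuming 16-bit (2-byte) weights and taking the nominal 1.5B and 7B parameter counts at face value, the sketch below computes the memory needed for the weights alone. Activations, caches, and any additional model components are deliberately ignored, so the results are a lower bound under that assumption, not a prediction of real usage.

```python
BYTES_PER_PARAM_16BIT = 2  # bf16/fp16 weights; assumed precision


def weight_gib(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Memory taken by the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Nominal sizes from the model names, paired with the VRAM figures quoted above.
    for name, params, quoted_gb in [("1.5B", 1.5e9, 7), ("7B", 7e9, 18)]:
        print(f"{name}: weights ~{weight_gib(params):.1f} GiB vs ~{quoted_gb} GB quoted")
```

This gives roughly 2.8 GiB and 13.0 GiB, below the quoted 7 GB and 18 GB, which is consistent with the remaining headroom going to the parts of inference this sketch leaves out.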
