Jivi AI Announces Jivi-AudioX, Speech-to-Text Model Family for Indic Languages

The models can understand languages such as Hindi, Gujarati, Tamil, Telugu, and others spoken by over 95% of India's population.

Healthcare-focused AI startup Jivi AI has recently announced a new state-of-the-art (SOTA) speech-to-text (STT) model family for Indian languages. The models can understand languages such as Hindi, Gujarati, Tamil, Telugu, and others spoken by over 95% of India's population.

Navigating India’s linguistic diversity and complex audio landscape, the Jivi team has launched a cutting-edge speech model tailored for medical conversations. In just three months, they built Jivi-AudioX, a production-grade model addressing one of the hardest AI challenges: India's multilingual, code-switched, and noisy audio data.

Built on OpenAI’s Whisper and fine-tuned on over 10,000 hours of proprietary and public medical audio, Jivi-AudioX delivers a 20.10% Word Error Rate (WER) across datasets like FLEURS, Common Voice, IndicTTS, Kathbath, and Gramvaani.

According to the company, this performance supports more accurate and accessible medical transcription and outperforms speech-to-text systems from major players such as Google, Microsoft, ElevenLabs, Sarvam, and OpenAI itself on these benchmarks.
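For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. Lower is better. The Python sketch below illustrates that standard definition only; it is not Jivi's evaluation code, and the function name and sample sentences are made up for illustration. Real evaluations typically also normalize text (punctuation, casing, numerals) before scoring, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Illustrative sketch of the textbook definition; splits on whitespace
    and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: the transcript drops one word ("a") out of five.
print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Under this definition, a 20.10% WER corresponds to roughly one word-level error for every five words in the reference transcript.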

"We're proudly open-sourcing the model on Hugging Face to support the wider AI and healthcare community. Jivi-AudioX comprises two specialized models," Ankur Jain, co-founder & CEO at Jivi AI, said in a LinkedIn post.

The Jivi-AudioX family consists of two regional variants:

  • AudioX-North: Tuned for Hindi, Marathi, and Gujarati (model card: https://lnkd.in/gBD5qZgx)
  • AudioX-South: Tuned for Tamil, Telugu, Kannada, and Malayalam (model card: https://lnkd.in/grQbCQZV)

The startup, funded by Andrew Ng’s AI Fund, previously released Jivi MedX, which ranked number one on the Open Medical LLM Leaderboard.

Jivi’s LLM, Jivi MedX, has beaten models such as OpenAI’s GPT-4 and Google’s Med-PaLM 2 with an average score of 91.65 across the leaderboard’s nine benchmark categories.