Google Unveils VaultGemma: World’s Largest Open AI Model Built With Privacy at Its Core
VaultGemma’s performance closely matched the scaling-law predictions, delivering utility comparable to non-private models from about five years ago.

Google has announced the launch of VaultGemma, its most capable language model trained from scratch with differential privacy (DP), a technique that offers mathematical guarantees against the model memorizing or exposing sensitive user data.
"Our new research, “Scaling Laws for Differentially Private Language Models”, conducted in partnership with Google DeepMind, establishes laws that accurately model these intricacies, providing a complete picture of the compute-privacy-utility trade-offs," Google said in a blog post. The study establishes how factors like model size, iterations, and what researchers call the noise-batch ratio determine performance outcomes in DP-trained systems.
Introducing VaultGemma, the largest open model trained from scratch with differential privacy. Read about our new research on scaling laws for differentially private language models, download the weights, & check out the technical report on the blog → https://t.co/tvgseWTcyP
— Google Research (@GoogleResearch) September 12, 2025
“Understanding the exact trade-off is crucial to ensure that both the compute and privacy budgets are used judiciously in real training scenarios,” the researchers note. Their findings show that DP training requires much larger batch sizes than conventional training, but can achieve high utility when configured correctly.
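To make the batch-size point concrete, here is a minimal, hypothetical sketch of a single DP-SGD step, the standard technique behind differentially private training (VaultGemma’s exact recipe is described in Google’s technical report and may differ): per-example gradients are clipped, Gaussian noise is added, and the noise is averaged away as the batch grows. The function name and parameters are illustrative, not drawn from the research.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.0, lr=0.1):
    """Illustrative DP-SGD update; shapes: params (d,), per_example_grads (B, d)."""
    batch_size = per_example_grads.shape[0]

    # 1. Clip each example's gradient so no single record dominates the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))

    # 2. Sum the clipped gradients and add Gaussian noise calibrated to the clip norm.
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)

    # 3. Average over the batch: a larger batch shrinks the noise per example,
    #    which is why DP training favours much larger batches than usual.
    return params - lr * noisy_sum / batch_size

# Toy usage with random data standing in for real per-example gradients.
rng = np.random.default_rng(0)
params = np.zeros(4)
grads = rng.normal(size=(1024, 4))
params = dp_sgd_step(params, grads)
```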
Guided by these scaling laws, Google trained VaultGemma, a 1-billion-parameter model and the largest open DP-trained model to date. Despite the privacy constraints, its performance landed close to the scaling-law predictions, with utility comparable to non-private models from about five years ago, such as GPT-2.
Benchmark comparisons against its non-private Gemma counterpart on academic tasks such as HellaSwag, BoolQ, and TriviaQA show that a performance gap remains, but Google emphasises that today’s DP-trained models already reach the utility of earlier-generation non-private models, marking a step toward privacy-first AI.