SmolLM3 by Hugging Face Delivers Big AI Brains in a Tiny Package
SmolLM3 achieves competitive scores across reasoning, math, coding, and long-context tasks, outperforming many in its parameter class.

Hugging Face today introduced SmolLM3, a state‑of‑the‑art, fully open-source 3 billion‑parameter language model engineered for advanced reasoning, long-context processing, and multilingual fluency.
The model delivers top-tier performance at the 3B scale, outperforming peers such as Llama 3.2-3B and Qwen 2.5-3B, and rivaling larger 4B models such as Qwen 3 and Gemma 3 on benchmarks.
We just released the best 3B model, 100% open-source, open dataset, architecture details, exact data mixtures and full training recipe including pre-training, mid-training, post-training, and synthetic data generation for everyone to train their own.
Let's go open-source AI! https://t.co/5fxK6XHhR9
— clem 🤗 (@ClementDelangue) July 8, 2025
Trained on 11.2 trillion tokens spanning web, code, math, and reasoning datasets, SmolLM3 follows a multi-stage training curriculum. Key architectural upgrades include Grouped Query Attention (GQA), which shrinks the key/value cache and inference overhead, and NoPE, a hybrid positional-embedding strategy that selectively omits positional embeddings to improve long-context handling up to 128K tokens.
"SmolLM3 follows a transformer decoder architecture with tied embedding similar to SmolLM2, building on Llama architecture with some key modifications optimised for efficiency and long context performance," Hugging Face said in a blog post.
The model supports dual reasoning modes, /think and /no_think, letting users toggle between faster execution and deeper reasoning.
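As an illustration, here is a minimal sketch of switching between the two modes through the chat template. The repository id and the convention of passing the flag in the system prompt are assumptions; the article does not spell out the exact interface.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository name for the released checkpoint.
model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumption: the reasoning flag goes in the system prompt;
# "/think" would enable extended reasoning, "/no_think" the faster mode.
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Summarize grouped query attention in two sentences."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))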
SmolLM3 also offers multilingual support across six major languages (English, French, Spanish, German, Italian, and Portuguese) and full training transparency, with the architecture, data mixtures, and training configs openly available.