SmolLM3 by Hugging Face Delivers Big AI Brains in a Tiny Package

SmolLM3 achieves competitive scores across reasoning, math, coding, and long-context tasks, outperforming many in its parameter class.

Hugging Face today introduced SmolLM3, a state‑of‑the‑art, fully open-source 3 billion‑parameter language model engineered for advanced reasoning, long-context processing, and multilingual fluency.

The model delivers top-tier performance at the 3B scale, outperforming peers such as Llama 3.2-3B and Qwen 2.5-3B while rivaling larger 4B models like Qwen3 and Gemma3 on benchmarks.

SmolLM3 was trained on 11.2 trillion tokens spanning web, code, math, and reasoning datasets using a multi-stage curriculum. Key architectural upgrades include Grouped Query Attention (GQA), which reduces inference overhead, and NoPE, a hybrid positional-encoding scheme that selectively removes positional embeddings from some layers to improve long-context handling up to 128K tokens.
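
GQA's lower inference overhead comes largely from a smaller key/value cache, since many query heads share a few key/value heads. The following back-of-the-envelope sketch illustrates the effect; the layer counts, head counts, and dimensions are illustrative assumptions, not SmolLM3's published configuration:

```python
# Rough sketch of why grouped query attention (GQA) shrinks the KV cache.
# All layer/head counts and dimensions below are illustrative assumptions,
# not SmolLM3's published configuration.
def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate key/value cache size in gigabytes for one sequence."""
    # The factor of 2 accounts for storing both keys and values at every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value / 1e9

# Standard multi-head attention: every query head keeps its own K/V pair.
mha = kv_cache_gb(n_layers=36, n_kv_heads=16, head_dim=128, seq_len=128_000)
# GQA: query heads share a small number of K/V heads, cutting the cache 4x here.
gqa = kv_cache_gb(n_layers=36, n_kv_heads=4, head_dim=128, seq_len=128_000)
print(f"MHA cache: {mha:.1f} GB  |  GQA cache: {gqa:.1f} GB")
```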

"SmolLM3 follows a transformer decoder architecture with tied embedding similar to SmolLM2, building on Llama architecture with some key modifications optimised for efficiency and long context performance," Hugging Face said in a blog post.

The model supports dual reasoning modes, /think and /no_think, letting users toggle between deep step-by-step reasoning and faster direct responses.
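
A minimal sketch of how this toggle could be used through the standard transformers chat-template API is shown below; the checkpoint id HuggingFaceTB/SmolLM3-3B and placing the flag in the system prompt are assumptions, so consult the model card for the exact recommended usage:

```python
# Minimal sketch: toggling SmolLM3's reasoning modes via the system prompt.
# The checkpoint id and flag placement are assumptions, not a verbatim recipe
# from the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "/no_think"},  # swap in "/think" for extended reasoning
    {"role": "user", "content": "Explain grouped query attention in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```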

SmolLM3 also offers multilingual support across six major languages and full training transparency, with architecture, data mixtures, and training configs openly available.