NVIDIA Research Challenges LLM Dominance, Backs Small Models for Agentic AI

The paper questions the industry’s $57 billion investment in large model infrastructure.


NVIDIA Research has published a landmark study asserting that small language models (SLMs)—those under 10 billion parameters—can handle 60–80% of enterprise agentic AI tasks currently managed by far larger models. The paper, Small Language Models are the Future of Agentic AI, questions the industry’s $57 billion investment in large model infrastructure, especially in light of only $5.6 billion in LLM API revenue last year.

Notably, a significant portion of that investment has gone toward NVIDIA's own graphics processing units (GPUs), fueling the company's meteoric rise. This demand has made NVIDIA one of the largest technology companies of the century and one of the most valuable companies in the world.

Led by Peter Belcak, the research team, which includes researchers from NVIDIA and Georgia Tech, argues that SLMs are "sufficiently powerful, inherently more suitable, and necessarily more economical" for many real-world applications.

Performance benchmarks support this: Microsoft's 7B Phi-3 and NVIDIA's 2–9B Nemotron models rival 30–70B parameter models in code generation and reasoning, while costing 10–30 times less in latency, energy use, and compute.
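Much of this gap follows from parameter count alone. The sketch below is a back-of-the-envelope estimate, assuming the common rule of thumb that a dense decoder spends roughly 2 × N floating-point operations per generated token for N parameters; it ignores attention over long contexts, memory bandwidth, and batching, and the model sizes are illustrative rather than taken from the paper.

```python
def forward_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token for a dense decoder.

    Assumes the ~2 * N rule of thumb (one multiply and one add per weight),
    ignoring attention over the context, KV-cache traffic and batching.
    """
    return 2.0 * n_params


def relative_cost(large_params: float, small_params: float) -> float:
    """How many times more compute per token the larger model needs."""
    return forward_flops_per_token(large_params) / forward_flops_per_token(small_params)


if __name__ == "__main__":
    # Illustrative (small, large) parameter counts, not benchmark results.
    for small, large in [(7e9, 70e9), (2e9, 30e9)]:
        ratio = relative_cost(large, small)
        print(f"{large / 1e9:.0f}B vs {small / 1e9:.0f}B: ~{ratio:.1f}x more FLOPs per token")
```

Under this approximation the per-token compute ratio is simply the parameter ratio; real-world latency and energy savings also depend on hardware and serving setup.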

SLMs also offer faster fine-tuning using methods like LoRA and DoRA, support edge deployment with tools like ChatRTX, and enable real-time, low-latency execution on consumer GPUs. Salesforce’s xLAM-2-8B model even outperforms GPT-4o and Claude 3.5 in tool-calling.
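LoRA speeds up fine-tuning by freezing a weight matrix W and training only a low-rank update B·A, where B is d_out × r and A is r × d_in. The sketch below counts trainable parameters for one projection layer; the 4096 width (typical of some 7B-class models) and the ranks are illustrative assumptions, and the helper names are hypothetical. DoRA builds on LoRA by additionally learning a per-output magnitude vector, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class LinearLayer:
    d_out: int
    d_in: int


def full_finetune_params(layer: LinearLayer) -> int:
    # Full fine-tuning updates every entry of W (d_out x d_in).
    return layer.d_out * layer.d_in


def lora_params(layer: LinearLayer, rank: int) -> int:
    # LoRA keeps W frozen and trains B (d_out x rank) and A (rank x d_in).
    return rank * (layer.d_out + layer.d_in)


if __name__ == "__main__":
    # Illustrative square projection of width 4096 (hypothetical example layer).
    proj = LinearLayer(d_out=4096, d_in=4096)
    full = full_finetune_params(proj)
    for r in (4, 8, 16):
        lora = lora_params(proj, r)
        print(f"rank {r:>2}: {lora:,} trainable vs {full:,} ({full // lora}x fewer)")
```

Because the trainable count grows with r(d_out + d_in) rather than d_out × d_in, small ranks cut the number of updated weights by two to three orders of magnitude for this layer size.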

Despite their promise, adoption remains slow because of existing infrastructure investments and LLM-centric marketing. The study's authors argue that the future lies in heterogeneous systems that use SLMs for routine tasks and reserve LLMs for complex ones. Their case studies estimate that SLMs could take over up to 70% of tasks in agents such as MetaGPT and Cradle.
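A heterogeneous system of this kind can be sketched as a simple router. The example below is hypothetical and not the paper's design: it assumes each agent sub-task carries a type label, treats an assumed set of narrow, repetitive types as "routine" for a small model, sends everything else to a large model, and tracks what share of calls the SLM absorbs. The backends are stand-in functions rather than real model APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical backend signature: prompt in, completion out.
ModelFn = Callable[[str], str]


@dataclass
class Task:
    kind: str    # hypothetical task label, e.g. "call_tool" or "plan_project"
    prompt: str


# Assumed set of narrow, repetitive task types suited to a fine-tuned SLM.
ROUTINE_KINDS = {"extract_fields", "format_json", "call_tool", "summarize_short"}


class HeterogeneousRouter:
    """Send routine agent sub-tasks to a small model and the rest to a large one."""

    def __init__(self, slm: ModelFn, llm: ModelFn) -> None:
        self.slm = slm
        self.llm = llm
        self.counts: Dict[str, int] = {"slm": 0, "llm": 0}

    def run(self, task: Task) -> str:
        if task.kind in ROUTINE_KINDS:
            self.counts["slm"] += 1
            return self.slm(task.prompt)
        self.counts["llm"] += 1
        return self.llm(task.prompt)

    def slm_share(self) -> float:
        # Fraction of all routed calls that went to the small model.
        total = sum(self.counts.values())
        return self.counts["slm"] / total if total else 0.0


if __name__ == "__main__":
    # Stub backends that just tag the prompt with the model that handled it.
    router = HeterogeneousRouter(slm=lambda p: f"[slm] {p}", llm=lambda p: f"[llm] {p}")
    tasks = [
        Task("call_tool", "get_weather(city='Paris')"),
        Task("format_json", "name=Ada, role=engineer"),
        Task("extract_fields", "Invoice #42, total $310"),
        Task("plan_project", "Design a migration plan for the billing service"),
    ]
    for t in tasks:
        print(router.run(t))
    print(f"SLM share: {router.slm_share():.0%}")
```

A production router would more likely classify tasks with a learned model or confidence signal and fall back to the LLM when the SLM's output fails validation; the fixed label set here only illustrates the division of labor.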