Alibaba's New AI Model Delivers 10x Efficiency and Outperforms Gemini on Reasoning
The company says the new design brings dramatic gains in both efficiency and performance.

Chinese e-commerce giant Alibaba has announced the Qwen3-Next-80B-A3B-Base model, built on a new AI architecture designed to tackle two major challenges shaping the future of large models: context length scaling and total parameter scaling.
The company says the new design brings dramatic gains in both efficiency and performance, particularly for ultra-long context tasks and large-scale reasoning.
Compared with the Mixture-of-Experts (MoE) structure used in Qwen3, Qwen3-Next introduces a hybrid attention mechanism, a far sparser MoE design, multi-token prediction for faster inference, and optimisations for training stability.
🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. @ 32K+ context!)
🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &… pic.twitter.com/yO7ug721U6
— Qwen (@Alibaba_Qwen) September 11, 2025
These innovations allow the new Qwen3-Next-80B-A3B-Base model to activate only 3 billion parameters at inference, while still matching or exceeding the performance of the dense Qwen3-32B model at less than 10% of its training cost. For context lengths above 32K tokens, the model achieves 10x higher throughput, making it significantly more efficient for both training and deployment.
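The efficiency claim hinges on sparse expert routing: the full model stores 80 billion parameters, but each token is sent to only a small subset of experts, so only about 3 billion parameters are actually computed per step. Below is a minimal sketch of top-k expert routing in PyTorch; the layer sizes, expert count and top_k value are illustrative assumptions, not Qwen's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy top-k MoE layer: many experts are stored, but only a few run per token."""
    def __init__(self, d_model=512, d_ff=2048, num_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                # x: (num_tokens, d_model)
        scores = self.router(x)                          # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():     # each selected expert runs once per batch
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)   # torch.Size([16, 512]); only ~2 of 64 experts computed per token
```

However many experts the model stores, the per-token compute depends only on the few experts the router selects, which is why total parameters and inference cost can scale almost independently.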
Building on this foundation, Qwen has released two post-trained variants:
- Qwen3-Next-80B-A3B-Instruct, which delivers performance on par with the flagship Qwen3-235B-A22B-Instruct-2507, with clear advantages for tasks requiring up to 256K tokens of context.
- Qwen3-Next-80B-A3B-Thinking, which excels at complex reasoning tasks, outperforming lower-cost models like Qwen3-30B and even surpassing Google’s Gemini-2.5-Flash-Thinking on multiple benchmarks, while approaching the capabilities of Qwen’s top-tier 235B-Thinking model.
Qwen highlights that the new architecture also solves long-standing efficiency and stability issues in reinforcement learning training, speeding up RL and improving final performance.
"Qwen3-Next represents our latest exploration in hybrid model. By combining Gated DeltaNet and standard attention in a 3:1 ratio, we achieve stronger in-context learning and better overall performance," Binyuan Hui, AI researcher at Alibaba, said.
Earlier this month, Alibaba released its most powerful large language model yet, Qwen3-Max-Preview (Instruct), boasting more than 1 trillion parameters. In July, the company also officially launched Qwen3-Coder, its most advanced open-source coding model.