Microsoft Unveils Mu, a Compact On-Device AI Model for Copilot+ PCs

Microsoft worked closely with hardware partners like AMD, Intel, and Qualcomm to optimise Mu for local performance.


Microsoft has introduced Mu, a lightweight language model built to run entirely on-device using Neural Processing Units (NPUs) in Copilot+ PCs. Already live in the Windows Insider Program’s Dev Channel, Mu currently powers the AI agent in the Windows Settings app.

Built on a Transformer encoder-decoder architecture, Mu delivers fast, efficient inference, processing over 100 tokens per second and reducing first-token latency by 47% compared with decoder-only models of similar size.

With 330 million parameters, Mu was trained on Azure A100 GPUs and optimised through post-training quantisation for deployment on edge devices.
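Microsoft has not published the exact quantisation scheme, but post-training quantisation generally means converting a trained model's floating-point weights to low-bit integers without retraining. The sketch below is a minimal, hypothetical illustration of symmetric int8 weight quantisation; real NPU deployments use vendor toolchains and finer-grained (e.g. per-channel) schemes.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor post-training quantisation to int8.

    Illustrative only; not Microsoft's actual pipeline.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Reconstruction error is bounded by about half a quantisation step.
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6
```

Storing weights as int8 rather than float32 cuts memory traffic roughly 4x, which is what makes a 330-million-parameter model practical on an NPU.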

"Mu is fully offloaded onto the Neural Processing Unit (NPU) and responds at over 100 tokens per second, meeting the demanding UX requirements of the agent in Settings scenario. This blog will provide further details on Mu’s design and training and how it was fine-tuned to build the agent in Settings," Microsoft said in a blog post.

Microsoft worked closely with hardware partners AMD, Intel, and Qualcomm to optimise Mu for local performance; on the Surface Laptop 7, the model can generate over 200 tokens per second.

Mu enables users to control system settings using natural language, interpreting multi-word queries and mapping them to system actions.

It replaces traditional search for complex inputs and supports hundreds of settings. With lower latency and improved accuracy over larger models, Mu marks a significant step in embedding AI directly into Windows experiences.

Last month, Google unveiled Gemma 3n, a model optimised to run smoothly on phones, laptops, and tablets, now available in preview.

Gemma 3n supports audio, text, images, and videos and can operate on devices with less than 2GB of RAM, offering efficient performance without relying on cloud computing.

Recently, Google also introduced Gemini Robotics On-Device, a new language model designed to run locally on robots, enabling them to perform complex tasks without needing an internet connection.

Gemini Robotics On-Device is a lightweight robotics foundation model for bi-arm robots, enabling low-latency, on-device task execution.