Microsoft Unveils Fara-7B — A Compact AI Agent That Uses Your PC Like a Human
Fara-7B visually perceives webpages through screenshots and uses a simulated mouse and keyboard to scroll, click, and type.
Microsoft Research has introduced Fara-7B, a 7-billion-parameter agentic small language model specifically designed to interact with computers like a human.
Rather than just chatting, Fara-7B visually perceives webpages through screenshots and uses a simulated mouse and keyboard to scroll, click, and type, enabling it to complete multi-step web workflows.
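To make that screenshot-and-act loop concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: `FakeBrowser`, `Action`, `run_agent`, and `scripted_policy` are hypothetical names invented for illustration, and the "policy" is a fixed script standing in for the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str      # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Hypothetical stand-in for a real browser session (assumption)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; this placeholder only
        # encodes how many actions have been applied so far.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        # Simulated mouse and keyboard: record what would have happened.
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "scroll":
            self.log.append(f"scroll({action.y})")
        else:
            raise ValueError(f"unknown action kind: {action.kind}")


Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 20) -> int:
    """Observe, decide, act; return the number of actions executed."""
    history: List[Action] = []
    for _ in range(max_steps):
        frame = browser.screenshot()            # perceive the page
        action = policy(task, frame, history)   # the model would go here
        if action.kind == "done":
            break
        browser.apply(action)                   # act on the page
        history.append(action)
    return len(history)


def scripted_policy(task: str, frame: bytes, history: List[Action]) -> Action:
    """Fixed script standing in for the model (assumption)."""
    script = [
        Action("click", x=200, y=80),
        Action("type", text=task),
        Action("scroll", y=400),
    ]
    return script[len(history)] if len(history) < len(script) else Action("done")
```

Calling `run_agent("wireless mouse", FakeBrowser(), scripted_policy)` walks through three simulated actions and then stops. In a real agent the policy would be the model itself, choosing the next action from the latest screenshot and the action history.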
Because of its compact size, Fara-7B can run directly on-device, reducing latency and preserving privacy by keeping users’ data local.
To train the model, Microsoft developed a novel synthetic data pipeline, called FaraGen, that generates multi-step web task trajectories based on real webpages and human behavior.
In benchmark tests, Fara-7B showed state-of-the-art performance for its size. On the WebVoyager benchmark, it achieved a success rate of 73.5%, outperforming both UI-TARS-1.5-7B and GPT-4o when the latter is prompted to act as a computer-use agent. It also completes tasks in fewer steps, averaging about 16 per task compared with about 41 for UI-TARS-1.5-7B.
Safety is a core consideration: Fara-7B is trained to recognize “Critical Points,” situations where user consent is needed before it takes actions such as submitting a form or making a purchase.
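The idea of pausing at a Critical Point can be illustrated with a short Python sketch. This is an assumption-laden illustration, not how Fara-7B works internally: in the model the behavior is learned during training, whereas here it is a hard-coded check, and `CRITICAL_KINDS`, `execute_with_consent`, and `ask_user` are hypothetical names.

```python
from typing import Callable, Iterable, List

# Assumption: an illustrative set of action kinds that need user consent.
CRITICAL_KINDS = {"submit_form", "purchase"}


def execute_with_consent(actions: Iterable[str],
                         ask_user: Callable[[str], bool]) -> List[str]:
    """Run actions in order, stopping at the first critical action
    the user does not approve. Returns the actions that were executed."""
    executed: List[str] = []
    for kind in actions:
        if kind in CRITICAL_KINDS and not ask_user(kind):
            break  # hand control back to the user
        executed.append(kind)
    return executed
```

With a plan of `["click", "type", "purchase", "click"]`, an `ask_user` callback that declines leaves only `["click", "type"]` executed, while one that approves lets the whole plan run.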
Microsoft recommends running the model in sandboxed environments and avoiding sensitive or high-risk domains during experimentation.
Fara-7B is now available under an MIT license on Microsoft Foundry and Hugging Face, and there’s a version optimized for Windows 11 Copilot+ PCs to run on-device with NPU support.