Simular’s Agent S Outperforms Humans on OSWorld Benchmark
AI agent reaches 72.6% on OSWorld, exceeding the benchmark’s human baseline of 72.36%
December 16, 2025
San Francisco, CA – Simular, the autonomous computer company, today announced that its open agentic framework Agent S has achieved a 72.6% success rate on OSWorld, the leading benchmark for evaluating multimodal agents performing real computer tasks.
The milestone places Simular’s agent above the benchmark’s human-level performance of 72.36%, marking a major breakthrough in AI’s ability to operate real computers with human-like reliability.
Just one year ago, the highest score on OSWorld hovered around 20%. Since then, steady progress across the field of computer-use agents has pushed scores up rapidly. Simular's Agent S is the first to surpass the human threshold, a result enabled largely by the scaling effects of Behavior Best-of-N (bBoN), a method that improves performance by running multiple agent attempts on the same task and selecting the best one.
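For readers curious about the general idea behind best-of-N selection, here is a minimal, hypothetical sketch: run several independent agent attempts and keep the one a judge rates highest. The names `best_of_n`, `toy_agent`, and `toy_judge` are illustrative assumptions, not Simular's API. The research paper cited at the end of this release describes selecting among rollouts using behavior narratives of each attempt; the simple per-attempt scoring function here is only a simplified stand-in for that judge.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rollout:
    agent_id: int
    actions: List[str]
    score: float  # score assigned by the (hypothetical) judge


def best_of_n(run_agent: Callable[[int], List[str]],
              judge: Callable[[List[str]], float],
              n: int) -> Rollout:
    """Run n independent attempts and return the one the judge rates highest.

    Simplified sketch: each attempt is scored on its own, whereas the
    paper's bBoN compares behavior narratives across attempts.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rollouts = []
    for i in range(n):
        actions = run_agent(i)  # one independent attempt at the task
        rollouts.append(Rollout(i, actions, judge(actions)))
    # max() keeps the first attempt among any ties
    return max(rollouts, key=lambda r: r.score)


# Hypothetical toy task: an attempt succeeds by completing all required steps.
REQUIRED = {"open_app", "edit_cell", "save_file"}


def toy_agent(seed: int) -> List[str]:
    """Hypothetical agent that randomly completes each required step."""
    rng = random.Random(seed)
    return [step for step in sorted(REQUIRED) if rng.random() < 0.6]


def toy_judge(actions: List[str]) -> float:
    """Hypothetical judge: fraction of required steps completed."""
    return len(set(actions) & REQUIRED) / len(REQUIRED)


best = best_of_n(toy_agent, toy_judge, n=5)
print(best.agent_id, best.score)
```

Because each attempt is independent, adding more attempts can only keep or raise the best score the judge sees, which is the intuition behind scaling the number of agents; in practice the result depends on how reliably the judge picks the truly successful attempt.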
“The space of computer-use agents is advancing so rapidly that even we didn’t foresee this breakthrough arriving so soon,” said Ang Li, CEO and co-founder of Simular. “Until recently, it wasn’t clear whether AI could reliably use a computer the way humans do. Crossing this threshold is a historic moment. Our focus now is making this technology widely accessible, unlocking real use cases for real people on real computers.”
This milestone follows Simular's recent $21.5 million funding round led by Felicis, with participation from Nvidia's NVentures, Basis Set Ventures, and others. Simular is also one of five agentic companies selected to pilot Microsoft's new Windows 365 for Agents, a secure, scalable environment designed for enterprise-grade AI automation.
In December, the company launched Simular 1.0, the first truly desktop-native AI agent for consumers – a step toward its mission to free people from computer labor entirely.
To learn more, read the full research paper, "The Unreasonable Effectiveness of Scaling Agents for Computer Use": https://arxiv.org/abs/2510.02250
