The Case for GUI Agents

by Ang Li
Palo Alto, California
April 23, 2026

A question I keep getting:

If we can already solve 80% of important tasks through coding agents, why keep investing in GUI?

In this essay, I argue that this is like saying humans no longer need hands because they can speak. There will always be tasks that require a dexterity voice alone can't provide.
I also write about why GUI agents are critical to AGI, why they're good for a diverse AI ecosystem, and whether AI means humans work more – or less.

Recently, I noticed that my time on the computer had dropped to about two hours a day. A year ago, I was easily at eight. The difference is that computer-use agents (CUAs) are getting much better as the industry finally builds agents that can see and act like humans.

My typical day as the CEO of a Series A tech startup now looks like this. First thing in the morning, I text Sai, Simular's AI agent, and ask it to surface interesting posts on X and LinkedIn. I then ask it to check my email and to flag and respond to urgent messages; sometimes even I can't tell whether an email was written by me or by the agent, which has picked up my temperament and voice. If I need to write code, I pull out my phone and instruct Sai to talk to Cursor instead of coding myself. In the afternoon, I often have back-to-back Zoom calls, so I ask the agent to join first and let people know if I'm running late.

Sai can't do all of my desktop work autonomously yet, and we are still some distance from reaching AGI. Coding agents like Claude Code and Cursor address roughly 80% of the most common, predictable, API-accessible tasks. But they can't solve the rest because, fundamentally, they aren't equipped with human-like perceptual capabilities. They deliver outcomes through chains of API calls. Humans, by contrast, can navigate freely through any interface, bypassing systems that offer no API access.

Sai is designed to operate at the level of graphical user interface (GUI) intelligence, handling the long tail of digital tasks that can't be accomplished through a command line: clicking, typing, and navigating across apps at the desktop level. A typical example is interacting with websites that don't expose APIs, either because companies have built data walls or because the software is legacy and predates the SaaS era entirely. A GUI-based agent sees and operates the screen the way a human does. In practice, the most effective approach combines both: use the terminal for efficiency when possible, and fall back to the GUI when a task demands it.
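
To make the "terminal first, GUI as fallback" idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Sai or any other product is actually built: the Task type, the NoProgrammaticPath exception, and the two example handlers are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


class NoProgrammaticPath(Exception):
    """Hypothetical signal that a task has no API or command-line route."""


@dataclass
class Task:
    name: str
    # Hypothetical handlers; either one may be missing for a given task.
    run_programmatic: Optional[Callable[[], str]] = None
    run_gui: Optional[Callable[[], str]] = None


def run_task(task: Task) -> Tuple[str, str]:
    """Return (route, result), preferring the programmatic route."""
    if task.run_programmatic is not None:
        try:
            return "terminal", task.run_programmatic()
        except NoProgrammaticPath:
            pass  # no API or CLI for this task; fall through to the GUI
    if task.run_gui is not None:
        return "gui", task.run_gui()
    raise RuntimeError(f"no route available for task {task.name!r}")


# Stand-in handlers for a legacy app that exposes no API (assumed scenario).
def export_via_api() -> str:
    raise NoProgrammaticPath("legacy app exposes no API")


def export_via_gui() -> str:
    return "report exported by clicking through the app"


if __name__ == "__main__":
    task = Task("export report", export_via_api, export_via_gui)
    print(run_task(task))
```

The ordering is the design choice: the programmatic route is tried first because it is usually faster and more predictable, and the GUI route is reserved for tasks where no such route exists.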

You might ask: if we can already solve 80% of important tasks with coding agents, why keep investing in GUI? Won't those use cases diminish as software interfaces get thinner -- reduced to a text field, a command sent to a data center, and an outcome delivered?

That's like saying humans no longer need hands because they can speak. There will always be tasks requiring the dexterity that voice alone can't accomplish. There are many ways for humans to interact with the outside world, and speech is just one of them. So long as software needs to interact with humans, GUIs will exist. Purely text-based commands aren't sufficient, because language is inherently ambiguous -- the same word can convey different meanings depending on the context. And as it becomes ever easier to build apps, GUIs will proliferate. The long tail of digital tasks won't shrink; if anything, it tends to concentrate the highest-value work.

There's also a strategic dimension. Relying exclusively on API access means playing by the rules of incumbents who have spent years building walled gardens. A GUI agent that sees and acts like a human can circumvent those walls, if not tear them down entirely.

The recent excitement around computer-use tools like OpenClaw isn't because they work well -- OpenClaw is still janky, riddled with edge cases and security concerns. But it gives a glimpse into the future of autonomous computers, where the role of hardware recedes and all you need is a way to communicate with the agent like you would with a colleague. When GUI agents hit their next step change in capability, and if they become accessible to everyday consumers, we might see another ChatGPT-level wave of explosive adoption, one that dwarfs the buzz around coding agents today.

To quote a16z general partner Anish Acharya:

"if you thought saas-pocalypse was bad just wait for computer use to get really good later this year. The implications for incumbents are 100x more than coding agents because computer use asymmetrically benefits hostile integrators."

We believe 2026 is the year when CUAs grow up and improve dramatically in performance. Does that mean humans will work less? Not necessarily. People with ambition will likely work more, because they see what they're capable of now that the throughput ceiling is gone. What's considered productive today might look modest in six months. Expectations will rise -- from asking an agent to fill in a form, to asking it to represent you in a Zoom meeting, to tasks we can't fully articulate yet. AI-powered workers won't slow down; they'll just raise the bar. Human aspirations don't plateau.

Building autonomous computers doesn't mean replacing humans. It means cooperation.

Free your hands from the computer. Download Simular today for free.
