The Unity of Opposites in Building Computer-Use Agents

Author: Ang Li
Palo Alto, California
April 23, 2026

Building a computer-using agent means resolving a series of opposites: human and machine, natural language and programming language, entropy and order, flexibility and stability. To understand how these tensions interact to produce machine intelligence, we first need to understand what actually makes an agent "good."

If coding agents can already solve 80% of important tasks, why keep investing in GUI agents?

In this essay, I argue that it's like saying humans no longer need hands because they can speak. There will always be tasks requiring the dexterity that voice alone can't accomplish.
I also write about why GUI agents are critical to AGI, why they're good for a diverse AI ecosystem, and whether AI means humans work more – or less.

Making AI reliable means confronting an awkward reality. For all its communicative, persuasive, and emotional power, human language is deeply ambiguous and terrible to execute. Consider the familiar exchanges we trade every day.

My typical day as the CEO of a Series A tech startup now looks like this. First thing in the morning, I text Sai, Simular's AI agent, and ask it to surface interesting posts on X and LinkedIn. I then ask it to check my emails and to flag and respond to urgent ones; sometimes even I can't tell whether an email was written by me or by the agent, which has picked up my temperament and voice. If I need to write code, I pull out my phone and instruct Sai to talk to Cursor instead of coding myself. In the afternoon, I often have Zoom calls back to back, so I ask the agent to join first and let people know if I'm running late.

Sai can't do all of my desktop work autonomously yet, and we are still some distance from reaching AGI. Coding agents like Claude Code and Cursor address roughly 80% of the most common, predictable, API-accessible tasks. But they can't solve the rest because, fundamentally, they aren't equipped with human-like perceptual capabilities. They deliver outcomes through chains of API calls. Humans, by contrast, can navigate freely through any interface, bypassing systems that offer no API access.

Sai is designed to operate at the level of graphical user interface (GUI) intelligence, handling the long tail of digital tasks that can't be accomplished through a command line: clicking, typing, and navigating across apps at the desktop level. A typical example is interacting with websites that don't expose APIs, either because companies have built data walls, or because the legacy software predates the SaaS era entirely. A GUI-based agent sees and operates the screen the way a human does. In practice, the most effective approach combines both: use the terminal for efficiency when possible, fall back to the GUI when a task demands it.
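The terminal-first, GUI-fallback approach can be sketched as a simple dispatcher. Everything below (`Task`, `GuiAgent`, `run_task`) is hypothetical naming for illustration, not Simular's actual API: route a task to a deterministic scriptable handler when one exists, and hand it to a screen-operating agent when it doesn't.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Task:
    kind: str      # e.g. "git_commit", "legacy_form"
    payload: str   # natural-language description of the work

class GuiAgent:
    """Stand-in for a GUI-driving agent: sees the screen, clicks, types."""
    def perform(self, task: Task) -> str:
        # A real agent would take screenshots and emit mouse/keyboard actions.
        return f"GUI: completed '{task.payload}' by operating the screen"

def run_task(task: Task,
             cli_handlers: Dict[str, Callable[[Task], str]],
             gui_agent: GuiAgent) -> str:
    """Prefer the cheap, deterministic path; fall back to the long tail."""
    handler = cli_handlers.get(task.kind)
    if handler is not None:
        return handler(task)        # API/CLI-accessible: fast and predictable
    return gui_agent.perform(task)  # no API exists: click/type/navigate

handlers = {"git_commit": lambda t: f"CLI: committed '{t.payload}'"}
print(run_task(Task("git_commit", "fix typo"), handlers, GuiAgent()))
print(run_task(Task("legacy_form", "fill invoice"), handlers, GuiAgent()))
```

The design choice is the same one the paragraph describes: the dictionary of handlers is the "80%" of predictable, scriptable work, and the GUI agent absorbs everything the dictionary doesn't cover.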

Translating human ambiguity into code solves only half of the agent problem. Reliability itself is the product of resisting entropy, the universe's relentless drift toward disorder. Rooms get messier. Employee morale declines. Organizations exist to impose order on human chaos, to turn uncertainty into predictability. Code is the tool that turns disordered human thought into an ordered, deterministic system.

That's like saying humans no longer need hands because they can speak. There will always be tasks requiring the dexterity that voice alone can't accomplish. There are many ways for humans to interact with the outside world, and speech is just one of them. So long as software needs to interact with humans, GUIs will exist. Purely text-based commands aren't sufficient, because language is inherently ambiguous -- the same word can convey different meanings depending on the context. And as it becomes ever easier to build apps, GUIs will proliferate. The long tail of digital tasks won't shrink; if anything, it will concentrate the highest-value work.

There's also a strategic dimension. Relying exclusively on API access means playing by the rules of incumbents who have spent years building walled gardens. A GUI agent that sees and acts like a human can circumvent those walls, if not tear them down entirely.

The recent excitement around computer-using tools like OpenClaw isn't that they work well -- they're still janky, riddled with edge cases and security concerns. But they offer a glimpse into the future of autonomous computers, where the role of hardware recedes and all you need is a way to communicate with the agent the way you would with a colleague. When GUI agents hit their next capability step function and become accessible to everyday consumers, we might see another ChatGPT-level wave of explosive adoption, one that dwarfs the buzz around coding agents today.

To quote a16z general partner Anish Acharya:

"if you thought saas-pocalypse was bad just wait for computer use to get really good later this year. The implications for incumbents are 100x more than coding agents because computer use asymmetrically benefits hostile integrators."

We believe 2026 is the year when CUAs grow up and experience dramatic improvement in performance. Does that mean humans will work less? Not necessarily. People with ambition will likely work more, because they see what they're capable of now that the throughput ceiling is gone. What's considered productive today might look modest in six months. Expectations will become higher -- from asking an agent to fill in a form, to asking it to represent you in a Zoom meeting, to tasks we can't fully articulate yet. AI-powered workers won't slow down; they'll just raise the bar. Human aspirations don't plateau.

Building autonomous computers doesn't mean replacing humans. It means collaborating with them.

Take your hands off the computer. Download Simular for free today.

Try Simular