
The unity of opposites in building computer-use agents

Palo Alto, California
February 23, 2026

Building computer-using agents requires reconciling a series of opposites: humans vs machines, natural language vs programming language, entropy vs order, and flexibility vs reliability. To understand how these tensions shape machine intelligence, we must first establish what actually makes an agent ‘good’.



AI has proven as capable as humans in operating computer environments, but capability alone is insufficient for artificial general intelligence. Reliability matters just as much.

Say you’ve hired a candidate who aced every interview question but made three critical mistakes in their first week. Facing an urgent deadline, you would likely turn instead to a less impressive colleague with a 99% completion rate.

This principle explains why the AI field is shifting away from "pass@k" benchmarks – succeeding at least once in k attempts – towards "pass^k" metrics, which require succeeding on all k attempts under similar conditions. AGI is not merely a system that can perform certain tasks, but one that can reliably deliver outcomes over and over again because it learns from past mistakes.
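The gap between the two metrics is easy to see in a toy simulation (illustrative only, not a real benchmark): an agent that succeeds 90% of the time looks nearly perfect under pass@k but much weaker under pass^k.

```python
import random

def pass_at_k(trials):
    # pass@k: success if at least one of the k attempts succeeds
    return any(trials)

def pass_hat_k(trials):
    # pass^k: success only if all k attempts succeed
    return all(trials)

# A hypothetical agent that succeeds 90% of the time, evaluated over k = 5 attempts
random.seed(0)
k, runs = 5, 10_000
agent = lambda: random.random() < 0.9

at_k = sum(pass_at_k([agent() for _ in range(k)]) for _ in range(runs)) / runs
hat_k = sum(pass_hat_k([agent() for _ in range(k)]) for _ in range(runs)) / runs
print(f"pass@{k}:  {at_k:.3f}")   # ~1.00: almost always succeeds at least once
print(f"pass^{k}: {hat_k:.3f}")   # ~0.59: succeeding five times in a row is much rarer
```

A 90%-reliable agent clears pass@5 essentially every time, yet fails pass^5 about two runs in five – exactly the gap between an impressive interview and a dependable colleague.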

But what prevents computer-using agents from achieving this kind of reliability? The challenge, in part, lies in a fundamental difference between human communication and computer rules. 

Human language vs programming language

Achieving reliability in AI requires us to confront an awkward reality: human language, for all its communicative, persuasive, and emotive power, is deeply ambiguous and a poor medium for execution. Consider this familiar daily exchange:

“What do you want for dinner?”
“I’m fine with anything.”
“How about sushi?”
“Hmm, I had it yesterday.”
“Pizza?”
“That’s a bit heavy before bed.”

If humans struggle to understand each other, how can we expect machines to execute our wishes reliably? This matters greatly for agents designed to operate computers on humans’ behalf. A computer system needs to understand humans thoroughly in order to carry out their commands as desired. In other words, a reliable agent is one that can decipher human ambiguity.

Computer code is the opposite: explicit, rigid, and thus dependable. This creates an interesting tension: If you want a natural, human-friendly interface, you compromise on determinism. If you want reliable execution, you need deterministic code. AI systems today sit uncomfortably between these two poles.

One approach to addressing this tension is to give agents two "brains": one that converses in natural language with humans, another that executes tasks in deterministic code. Simular’s agents use the Simulang system to translate ambiguous, natural-language instructions into structured, repeatable commands. Once rendered in code, actions become both repeatable and governable.
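Simulang itself is not public, so the following is only a minimal sketch of the general idea, with a hypothetical `Command` schema and a rule-based stand-in for the language-model "brain" that would do the translation in practice.

```python
from dataclasses import dataclass

@dataclass
class Command:
    """A structured, repeatable action (hypothetical schema, not Simulang itself)."""
    action: str
    target: str
    params: dict

def plan(instruction: str) -> list[Command]:
    # In a real system, a language model would translate the instruction.
    # A hardcoded rule stands in for it here.
    if "sushi" in instruction.lower():
        return [
            Command("open_app", "browser", {}),
            Command("navigate", "https://example.com/restaurants", {"query": "sushi"}),
            Command("click", "first_result", {}),
        ]
    raise ValueError("Cannot translate instruction unambiguously; ask the user.")

steps = plan("Order sushi for dinner")
for step in steps:
    print(step.action, step.target)
```

The point of the intermediate representation is governance: once an instruction is rendered as a list of explicit commands, it can be logged, replayed, and audited – and genuinely ambiguous requests surface as errors to clarify with the human rather than as silent guesses.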

Entropy vs order

Translating human ambiguity into code only addresses half of the agentic challenge. Reliability itself is a result of resisting entropy – the universe's inexorable drift toward disorder. Rooms get messy. Employee morale drifts. Organizations exist precisely to impose order on human chaos, transforming uncertainty into predictability. Code is a tool for turning disorderly human ideas into orderly, deterministic systems.

This is why the most pressing challenge in AI today is not stateless problems – the generation of standalone text or images, which have largely been solved – but stateful ones. Stateful systems constantly observe their environment, react to changes, and adapt accordingly. Computer environments epitomise this complexity: folders move, files vanish, applications interact in intertwined ways. Real workflows run in a constantly shifting, nondeterministic environment while humans swiftly learn and adapt. Agents that work like humans need to adapt in real-world scenarios as well.

This brings us to a paradox: The most reliable agents might become unreliable in a chaotic environment.

Reliability vs flexibility

The real world is full of change and stateful systems: computers, startups, societies. In such chaos, valuable work requires adaptability. Those who survive are those who adapt quickly – an observation that exposes the limitations of pre-training, where a model is trained once on a large dataset and works well only in low-change environments.

In the chaos of the real world, we need agents that can correct their errors quickly. This is the essence of continual learning. If an agent can detect mistakes and correct course fast enough, its instability becomes nearly invisible.

There is a catch to this approach: the first explorer of a problem typically pays the price of failure. The practical solution, at least for now, is a compromise. When an agent detects something unusual, it should pause and flag the issue rather than blunder forward. A specialised system – or a human – can then diagnose and fix the problem before the agent resumes. This concentrates risk among experts rather than distributing it broadly.
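The pause-and-flag loop can be sketched as follows. This is a generic illustration, not Simular's implementation: the `execute`, `looks_unusual`, and `escalate` callables are hypothetical hooks supplied by the caller.

```python
def run_with_guardrails(steps, execute, looks_unusual, escalate):
    """Execute steps, but pause and flag anomalies instead of blundering forward.

    execute, looks_unusual, and escalate are caller-supplied callables
    (a hypothetical interface, for illustration only).
    """
    for i, step in enumerate(steps):
        observation = execute(step)
        if looks_unusual(observation):
            escalate(step, observation)  # hand off to a specialist system or a human
            return {"status": "paused", "at_step": i}
    return {"status": "done", "at_step": len(steps) - 1}

# Usage with stub callables: the second step hits an unexpected state.
steps = ["open_file", "edit_cell", "save"]
result = run_with_guardrails(
    steps,
    execute=lambda s: "file_missing" if s == "edit_cell" else "ok",
    looks_unusual=lambda obs: obs != "ok",
    escalate=lambda step, obs: print(f"flagged {step!r}: {obs}"),
)
print(result)  # {'status': 'paused', 'at_step': 1}
```

The agent never executes `save` against a broken state; the cost of the anomaly is paid once, by whoever diagnoses the flag, rather than by every downstream step.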

This solution – humans handling high-value, specialised work while agents focus on more predictable, repetitive tasks – assumes a division of labour. But what if that assumption is wrong? The long-term vision has humans focusing on one-off judgements while AI handles repetitive, predictable tasks. If instead AI proves better at one-off tasks than humans, and worse at reliable execution, we face an undesirable scenario: AI replacing elites in making high-value, one-off judgement calls while humans are relegated to repetitive labour.

Building autonomous computers doesn’t mean replacing humans. It means cooperation.

Free your hands from the computer. Download Simular today for free.
