Vol. I · No. 2 · OddThesis · Apr · 2026

On autonomous systems and inflated vocabulary

Who Owns the Loop?

Every chatbot is now an agent. Every pipeline claims autonomy. Almost none of them mean it.

§ Prologue

The year every button became a brain.

Sometime around 2024, a collective decision was made — not in any boardroom, more by the slow drift of press releases — that "assistant" was no longer sufficient. Assistants are passive. Assistants wait for you. What the industry was building, it announced, were agents: autonomous, purposeful, capable of taking action in the world on your behalf.

The word arrived in product launches, venture decks, and engineering blog posts with a frequency that should, by itself, trigger skepticism. A chatbot with three tools bolted on became an agent. A RAG pipeline with a retry loop became an agent. A form-filling bot became, remarkably, an autonomous agent. The market took note: investors poured $3.8 billion into AI agent startups in 2024 alone — nearly three times the year prior. The word was doing serious financial work.

Most of it was noise. But buried underneath was something real.

§ Part One

What the word actually means, before marketing got to it.

The concept of an "agent" in AI predates the chatbot era by decades. Roboticists and AI researchers used it in a specific, technical sense: a system that perceives its environment, maintains some internal state, chooses an action, executes it, observes the result, and repeats. The loop. Not a one-shot transform — a continuous cycle of observation and response, running until a goal is met.

Three properties distinguish a genuine agent from a sophisticated prompt:

A goal, not a task.

A task is predefined. "Summarise this document." A goal is open. "Figure out why our churn spiked in Q3." The path to the answer is not specified — the agent must determine it. This is not a semantic distinction. It is the difference between a function call and a planning problem.

State-dependent action.

What the agent does at step twelve depends on what it discovered at step seven. Each action is chosen in response to observed state, not a pre-written script. The sequence cannot be predicted before execution begins.

The agent decides when it is done.

This is the property most often missing from systems that call themselves agents. If a human must review the output and issue the next prompt, the human is the outer loop — the agent is a step inside someone else's workflow, not an autonomous system.
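The three properties can be made concrete in a toy loop. What follows is a minimal sketch that assumes a hypothetical number-guessing environment in place of a real one. The goal is fixed (find a hidden value), but the path is not. Each guess depends on what earlier observations revealed, and the loop itself decides when it is done or stuck. `AgentState`, `environment` and `run_agent` are illustrative names, not any framework's API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    low: int                                      # smallest value still consistent with observations
    high: int                                     # largest value still consistent with observations
    history: list = field(default_factory=list)   # (action, observation) pairs so far

def environment(guess: int, secret: int) -> str:
    """Hypothetical environment: reports how a guess relates to a hidden target."""
    if guess < secret:
        return "too low"
    if guess > secret:
        return "too high"
    return "correct"

def run_agent(secret: int, low: int = 0, high: int = 100, max_steps: int = 20):
    state = AgentState(low, high)
    for _ in range(max_steps):
        if state.low > state.high:
            break                                 # no consistent value left: the agent is stuck
        # State-dependent action: the guess is derived from everything observed so far.
        guess = (state.low + state.high) // 2
        observation = environment(guess, secret)
        state.history.append((guess, observation))
        # The agent evaluates its own stop condition; nobody re-prompts it.
        if observation == "correct":
            return guess, state.history
        if observation == "too low":
            state.low = guess + 1
        else:
            state.high = guess - 1
    return None, state.history                    # report failure instead of a confident guess
```

Calling `run_agent(secret=37)` takes three guesses (50, 24, 37). That sequence could not have been written down before the first observation came back.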

By this definition, the vast majority of things currently marketed as "AI agents" are pipelines. They are prompt chains with conditional routing. They are useful — sometimes genuinely impressive — but the autonomy is a facade. A human designed every branch. A human decides when to re-run them.
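By way of contrast with the three properties above, here is a sketch of a prompt chain with conditional routing. `call_model` is a hypothetical stub standing in for any LLM call. The point is structural: every branch is written in advance, the number of steps is fixed, and the function returns after one pass whether or not the result is any good.

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: returns a canned classification."""
    return "refund" if "refund" in prompt.lower() else "general"

def support_pipeline(ticket: str) -> str:
    # Step 1: classify. The set of labels was fixed by a human at design time.
    label = call_model(f"Classify this ticket: {ticket}")
    # Step 2: route. Every branch below was written in advance; no step is chosen at runtime.
    # There is no loop and no self-check: whether to run it again is the caller's decision.
    if label == "refund":
        return f"[refund template] {ticket}"
    return f"[general template] {ticket}"
```

Swapping the stub for a real model makes the classification smarter. It does not change who owns the loop.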

§ Part Two

A spectrum nobody wants to admit exists.

The industry prefers a binary: "AI agent" versus "just a chatbot." The binary is commercially convenient. The reality is a spectrum, and understanding where something sits on it matters enormously for knowing what it can and cannot do.

Static prompt: Input → Output. No memory, no loop, no tools. One shot.

Tool-augmented prompt: Can search, fetch, calculate. Still a single response cycle. You are the loop.

Proto-agent (ReAct loop): Observe → reason → act, repeated. Adapts to intermediate results. Goal still human-defined, termination still fuzzy.

True agent: Owns the goal, the loop, and the stop condition. Acts in the world. Reports when done.
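The four stages can be read as a short series of yes/no questions. In this sketch, each stage maps onto properties of a system: tool use, a self-driven loop, ownership of the goal, and ownership of the stop condition. The property names are this example's own shorthand, not an established taxonomy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemProfile:
    uses_tools: bool          # can it search, fetch, calculate?
    loops_on_results: bool    # does it observe intermediate results and act again?
    owns_goal: bool           # does it hold or refine the goal itself?
    owns_stop: bool           # does it decide, unprompted, when it is done?

def classify(profile: SystemProfile) -> str:
    # Test the most demanding stage first, then fall back down the spectrum.
    if profile.loops_on_results and profile.owns_goal and profile.owns_stop:
        return "true agent"
    if profile.loops_on_results:
        return "proto-agent"
    if profile.uses_tools:
        return "tool-augmented prompt"
    return "static prompt"
```

A chat session with web search typically lands on `proto-agent`. It loops over search results, but a human still sets the goal and judges when it is done.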

Consider what actually happens when you ask Claude to "write me an article, search the web for sources." The model searches, reads results, decides what's relevant, fetches a page for depth, synthesises a draft. There is a loop. State from step three informs step five. This is more than a prompt — it sits somewhere between tool-augmented and proto-agent on the spectrum.

But you set the goal. You read the output. You decide if it was good enough, and you issue the next prompt if not. The outer loop belongs to you. A true agent version of the same task would run autonomously: monitor a topic for a week, assess when there's enough new material, draft, self-critique, revise, publish — and surface to you only when it determines the piece is done, or when it's genuinely stuck.
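The difference between the two versions is who runs the outer loop. Below is a minimal sketch of the autonomous version with every step stubbed. `fetch_new_items`, `draft` and `critique` are hypothetical placeholders for the monitoring, drafting and self-critique steps described above, and the thresholds are arbitrary. What matters is the control flow: the system decides when there is enough material, when the draft is good enough, and when it is stuck.

```python
def fetch_new_items(day: int) -> list[str]:
    """Hypothetical monitor: pretends two new sources appear each day."""
    return [f"source-{day}-{i}" for i in range(2)]

def draft(sources: list[str], revision: int) -> str:
    """Hypothetical drafting step; a real system would call a model here."""
    return f"draft v{revision} citing {len(sources)} sources"

def critique(text: str, revision: int) -> bool:
    """Hypothetical self-critique: accepts the third revision, for illustration only."""
    return revision >= 3

def autonomous_writer(days: int = 7, min_sources: int = 6, max_revisions: int = 5):
    sources: list[str] = []
    for day in range(days):
        sources += fetch_new_items(day)
        if len(sources) < min_sources:
            continue                      # the system, not a person, judges "not enough yet"
        text = ""
        for revision in range(1, max_revisions + 1):
            text = draft(sources, revision)
            if critique(text, revision):
                return "done", text       # surface to the human only when finished...
        return "stuck", text              # ...or when genuinely stuck
    return "stuck", "not enough material"
```

With the defaults, the loop waits until day three for six sources and then accepts its third revision, without a human prompt at any step.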

Most people have not experienced the true agent version. Most things claiming to be agents are the proto-agent version on a good day.

§ Interlude

The test is not whether a system uses tools.
It is not whether it takes multiple steps.
It is one question.

"Who owns the loop?"

If a human decides what happens next after every step — the human owns the loop. The AI is a capable tool inside someone else's workflow. If the system decides what to do next, based on what it just observed, without asking permission — the system owns the loop. Everything else is detail.

A spreadsheet formula is not an agent. A chess engine is not an agent — its goal is specified externally on every move. A robotic warehouse picker that detects a misrouted package, reroutes it, logs the anomaly, and flags a pattern for review — without anyone requesting any of those actions — is an agent. The loop is its own.

§ Part Three

The places where agents actually work.

Strip away the hype and a handful of categories remain where genuine agent behavior is happening and delivering results. What they share is not sophistication — it is the presence of clear, machine-readable feedback.

Coding agents

The most mature category. Claude Code, Devin, GitHub Copilot Workspace. You hand them a ticket or a bug description; they read the codebase, write code, run tests, observe failures, fix them, and iterate. The loop is genuine. What makes coding uniquely tractable is that the environment gives unambiguous feedback — a test either passes or it does not. There is no interpretation, no nuance, no "well, it kind of works." This is why coding agents outperform agents in almost every other domain: the signal is clean.
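The coding loop can be sketched without any model at all, because the feedback is what carries it. This minimal sketch assumes a hypothetical `propose_patch` function in place of model-generated patches. It picks its next candidate from the most recent failure message, and the loop stops at the first candidate whose tests pass. The `slugify` behaviour under test is likewise an invented example.

```python
from typing import Callable

def run_tests(slugify: Callable[[str], str]) -> None:
    """The environment's verdict: returns silently on success, raises AssertionError on failure."""
    assert slugify("Hello World") == "hello-world", "spaces should become hyphens"
    assert slugify("  Trim me ") == "trim-me", "surrounding whitespace should be stripped"

def propose_patch(feedback: list[str]) -> Callable[[str], str]:
    """Hypothetical patch generator: reacts to the most recent failure message."""
    if not feedback:
        return lambda s: s.lower()
    if "hyphens" in feedback[-1]:
        return lambda s: s.lower().replace(" ", "-")
    return lambda s: "-".join(s.lower().split())

def repair_loop(max_attempts: int = 5):
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose_patch(feedback)       # next action depends on what failed last time
        try:
            run_tests(candidate)
        except AssertionError as err:
            feedback.append(str(err))            # the failure message is the observation
            continue
        return attempt, feedback                  # unambiguous signal: every test passed
    return None, feedback
```

Here the loop succeeds on the third attempt after two distinct failure messages. Because the tests rely on `assert`, the sketch should not be run under `python -O`, which strips assertions.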

One nuance worth understanding: feedback does not require a formal test suite. The compiler is feedback. A runtime crash is feedback. TypeScript's type errors are feedback. When you build greenfield projects with an AI coding agent and no tests, the agent is not flying blind — it is reading error output, import failures, and runtime traces as its feedback signal. The loop runs on those. The moment the codebase grows past the agent's context window, or you change requirements mid-build, that feedback signal degrades.
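Even without a test suite, the interpreter's own complaints are usable signal. In this minimal Python sketch, the built-in `compile()` surfaces syntax errors and executing the module surfaces runtime failures such as a missing import. The hypothetical `feedback_for` helper turns either one into a one-line observation an agent could act on. It only runs top-level code, so anything never called is never checked, and it executes the snippet directly, so it should only ever see code you trust.

```python
def feedback_for(source: str) -> str:
    """Return a one-line observation about a code snippet, or 'ok'."""
    try:
        code = compile(source, "<agent-draft>", "exec")   # syntax errors surface here
    except SyntaxError as err:
        return f"SyntaxError line {err.lineno}: {err.msg}"
    try:
        exec(code, {})                                     # import and runtime errors surface here
    except Exception as err:
        return f"{type(err).__name__}: {err}"
    return "ok"
```

An agent loop would call this after every edit and feed any non-`ok` string into its next step, the same way it would treat a failing test.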

Deep research agents

Perplexity Deep Research, OpenAI Deep Research, and their competitors. A question is posed; the agent determines what to search, reads sources, identifies gaps in what it found, searches again, and synthesises. The internal monologue — "I know X but still don't know Y, so I need to search for Z" — is genuine planning. These are closer to true agents than most things wearing the label.
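The "I know X but still don't know Y" step amounts to a loop over open questions. This minimal sketch assumes a tiny in-memory `CORPUS` in place of a search engine, and every question and finding in it is made up for illustration. Each finding can raise new questions. The agent keeps searching while gaps remain and stops when nothing is open or its search budget runs out.

```python
from collections import deque

# Hypothetical corpus: each query returns a finding plus the follow-up questions it raises.
CORPUS = {
    "why did churn spike in Q3?": ("churn concentrated in one plan tier", ["what changed in that tier?"]),
    "what changed in that tier?": ("price rose in July", ["did competitors change prices?"]),
    "did competitors change prices?": ("no competitor price changes found", []),
}

def research(question: str, budget: int = 10):
    known: dict[str, str] = {}
    gaps = deque([question])
    searches = 0
    while gaps and searches < budget:
        query = gaps.popleft()
        if query in known:
            continue
        searches += 1
        finding, follow_ups = CORPUS.get(query, ("no result", []))
        known[query] = finding
        # Identify gaps: questions raised by this finding that are still unanswered.
        gaps.extend(q for q in follow_ups if q not in known)
    return known, list(gaps)
```

With a budget of one search, the function returns a single finding plus the open question it raised, rather than pretending the investigation is complete.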

Customer support and ops pipelines

Less exciting to describe, but the most widely deployed category in production today. An agent with access to a CRM, ticketing system, and knowledge base handles a customer issue end-to-end — looks up the account, decides whether it can resolve autonomously, takes action, escalates only when genuinely stuck. Verizon's deployment of Gemini-powered customer service agents covered 28,000 interactions and reduced call handling time measurably. The agent owns the loop for routine cases; a human owns it for everything else. That boundary is, in practice, where most enterprise agentic deployment actually lives.
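The boundary between "agent owns the loop" and "human owns the loop" is often just an explicit rule in code. This sketch assumes a hypothetical account store and a small set of routine intents the agent may resolve alone. Unknown accounts and anything outside the routine set are escalated rather than guessed at. None of the names reflect a real CRM or ticketing API.

```python
ACCOUNTS = {"A-100": {"plan": "basic", "locked": True}}   # hypothetical CRM records
ROUTINE = {"unlock_account", "plan_info"}                  # intents the agent may resolve alone

def handle_ticket(account_id: str, intent: str) -> str:
    account = ACCOUNTS.get(account_id)                     # look up the account
    if account is None:
        return "escalate: unknown account"
    if intent not in ROUTINE:                              # decide whether it can act autonomously
        return f"escalate: '{intent}' is outside the routine set"
    if intent == "unlock_account":
        if not account["locked"]:
            return "resolved: account was already unlocked"
        account["locked"] = False                          # take action in the world
        return "resolved: account unlocked"
    return f"resolved: current plan is {account['plan']}"
```

Widening `ROUTINE` is how the agent's share of the loop grows. Each new intent added to it is a decision a human no longer reviews.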

§ Part Four

The honest ledger: where agents start to break.

Roughly 40% of enterprise agent projects fail in production, according to 2025 industry surveys, and the underlying causes are remarkably consistent. The domain offered no clean feedback signal, the task had too many sequential steps, or errors at step two quietly propagated through twenty downstream decisions before anyone noticed. Here is what that looks like in practice.

The Failure Ledger


What agents need to succeed

Clean feedback: tests pass/fail, compilers throw errors
Short, bounded task horizons where mistakes stay local
Human checkpoints mid-loop to catch wrong turns early
Codebase and context small enough to hold in working memory
A well-defined stop condition the agent can evaluate itself

What actually happens without it

Error compounding across 20+ sequential steps
Silent failure: confident wrong output, no crash, no trace
Context window degradation as session length grows
Specification drift mid-task as goals subtly shift
Cascading failures in multi-agent chains

Agents don't fail like software. They don't crash with a stack trace. They return a confident, well-formatted answer that is completely wrong — and the wrongness has been propagating since step three.

The compounding problem deserves emphasis. A traditional software bug is local. Fix the function, fix the bug. An agent error is not local — it is a decision that every subsequent decision was built on top of. A wrong assumption at step two in a thirty-step pipeline does not produce a wrong answer at step two. It produces a confidently wrong answer at step thirty, with no obvious trail back to the source.
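A back-of-the-envelope model shows why length matters so much. Assume, as a deliberate simplification, that each step is independently correct with probability p. The chance that an n-step run contains no error at all is then p^n. Real agent errors are neither independent nor always fatal, so the numbers are illustrative only.

```python
def clean_run_probability(p_step: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed, each with probability p_step."""
    return p_step ** n_steps

for p in (0.99, 0.95):
    for n in (2, 20, 30):
        print(f"p={p}, n={n}: {clean_run_probability(p, n):.2f}")
```

At 95% per-step reliability, a 20-step run comes out clean only about a third of the time (0.95^20 ≈ 0.36), and a 30-step run about a fifth of the time (≈ 0.21).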

Context window degradation is the second failure mode that practitioners rarely discuss openly. An agent running a long session carries its history in its context window. As that window fills, the model exhibits recency bias: it prioritises what it read recently over what it established at the beginning. A goal stated at the top of the session can quietly drift. The agent is still producing output — it simply no longer knows what problem it was originally solving.
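A hard context budget makes the drift easy to see. Real models degrade more gradually than this, so the sketch below is a crude stand-in for the recency bias described above, not a model of it. A hypothetical context builder keeps only the most recent messages that fit a fixed budget, counted in words here instead of real tokens. The goal stated at the start is the first thing to fall out.

```python
def build_context(messages: list[str], budget_words: int) -> list[str]:
    """Keep the most recent messages whose combined word count fits the budget."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):           # walk backwards from the newest message
        words = len(message.split())
        if used + words > budget_words:
            break
        kept.append(message)
        used += words
    return list(reversed(kept))                  # restore chronological order

session = ["GOAL: find why churn spiked in Q3"] + [f"step {i}: read another report" for i in range(1, 30)]
context = build_context(session, budget_words=60)
print(session[0] in context)   # False: the original goal is no longer in view
```

The agent still has twelve perfectly reasonable recent steps to work from. It no longer has the sentence that said what they were for.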

Multi-agent architectures compound this further. An orchestrator spawns specialist agents — a researcher, a writer, a critic. Each runs its own loop. Each trusts the previous agent's output as ground truth. There are no checkpoints between them. A misunderstanding in the researcher flows cleanly into the writer's draft, gets critiqued by the critic in the wrong frame, and surfaces to the user as a coherent piece of confident nonsense.
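The missing checkpoints are easy to picture in code. This sketch uses three hypothetical stub agents, `researcher`, `writer` and `critic`, each a plain function standing in for a model-driven loop. Without a check, a wrong figure from the researcher flows through untouched. With a simple checkpoint at the first handoff, the same error surfaces immediately. The checkpoint rule, cross-checking a number against a trusted record, is an assumption chosen for illustration.

```python
TRUSTED_RECORD = {"q3_churn_pct": 4.0}   # hypothetical ground truth available to the checkpoint

def researcher() -> dict:
    return {"q3_churn_pct": 40.0}        # the early mistake: a misplaced decimal

def writer(notes: dict) -> str:
    return f"Churn hit {notes['q3_churn_pct']}% in Q3, a crisis."

def critic(draft: str) -> str:
    return draft + " [critic: tone is appropriately urgent]"   # judges style, trusts the facts

class CheckpointError(Exception):
    pass

def checkpoint(notes: dict) -> dict:
    for key, expected in TRUSTED_RECORD.items():
        if notes.get(key) != expected:
            raise CheckpointError(f"{key}={notes.get(key)} disagrees with record {expected}")
    return notes

def pipeline(with_checkpoint: bool) -> str:
    notes = researcher()
    if with_checkpoint:
        notes = checkpoint(notes)        # fail loudly at the handoff where the error entered
    return critic(writer(notes))
```

Without the checkpoint, the run returns a fluent, critiqued sentence built on a wrong number. With it, the failure is reported at the stage that caused it.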

40%: agent projects that fail in production (cause: inadequate feedback loops)

33×: enterprise software with agentic AI, from under 1% in 2024 to 33% by 2028

$5.4B: agent market in 2024, projected to reach $50B by 2030, most of it still pipelines
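The two market figures also imply a growth rate, which serves as a quick sanity check on the projection. Compound annual growth from a start value S to an end value E over n years is (E/S)^(1/n) − 1. With $5.4B in 2024 and $50B in 2030 (n = 6), that works out to roughly 45% a year.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

rate = cagr(5.4, 50.0, 2030 - 2024)   # market size in billions of dollars
print(f"{rate:.1%}")                  # 44.9%
```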

§ Coda

The real agents are coming.
The word just arrived first.

The cynical read of all this is that "agent" is just "AI assistant" with better funding. There is truth in that. But the cynical read misses something: the underlying architecture — the sense-decide-act loop, state-dependent action, machine-owned termination — is real, and it does produce qualitatively different capabilities when the conditions are right. Coding agents running unattended in CI pipelines are not a gimmick. Research agents that synthesise across hundreds of sources in an hour are not a chatbot with better formatting.

The honest position is narrower than the marketing. Agents work when the feedback signal is clean and the task horizon is short. They compound errors when it is not. They are being deployed most successfully in domains — code, customer service routing, document processing — where prior software automation already made the environment legible. The genuinely hard part, the part that would make an agent autonomously run your marketing function or manage your engineering team, requires clean feedback in domains that are inherently ambiguous. Nobody has solved that.

What gets called an "agent" today is often a capable, useful, well-engineered pipeline. The market for those pipelines is real: $5.4 billion in 2024 and growing at roughly 45% per year. None of those numbers are wrong. The vocabulary just arrived about five years before the capabilities it implies.

"The useful question, for anyone building with these systems today, is not whether the thing qualifies as an agent. It is simpler: in this system, when something goes wrong at step three, who finds out, and when?"