Architecture · 9 min read · 2026-03-01

Multi-Agent Systems: The Hidden 4.8x Cost Multiplier

I ran the same task through a single agent and a 4-agent supervisor system. Same model, same tools, same query. The multi-agent system cost 4.8x more and produced half the output.

Here's what happened and why.

The setup

Task: write a market analysis of AI agencies in 2026, covering market size, key trends, major players, and cost challenges.

Single agent: one Claude Haiku with web search and calculator, running a ReAct loop.
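That single-agent loop can be sketched like this. It's a minimal sketch: call_model, web_search, and calculator are stand-in stubs of my own, not the real Claude API or tool implementations.

```python
# Minimal ReAct loop sketch. call_model, web_search, and calculator are
# hypothetical stubs standing in for the real Claude API and tool calls.

def web_search(query: str) -> str:
    return f"results for: {query}"          # stub tool

def calculator(expr: str) -> str:
    return str(eval(expr))                  # stub tool; sketch only

TOOLS = {"web_search": web_search, "calculator": calculator}

def call_model(messages: list[dict]) -> dict:
    # Stub policy: search twice, then answer. The real agent lets the
    # model decide when it has enough information.
    if sum(m["role"] == "tool" for m in messages) < 2:
        return {"tool": "web_search", "input": "AI agency market 2026"}
    return {"answer": "final analysis text"}

def react_loop(task: str, max_iters: int = 6) -> str:
    messages = [{"role": "user", "content": task}]  # one growing conversation
    for _ in range(max_iters):
        action = call_model(messages)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["input"])
        # Tool output lands in the SAME conversation; it is never handed
        # off to a second agent as fresh input.
        messages.append({"role": "tool", "content": result})
    return "hit iteration limit"
```

The detail that matters later: every tool result is appended to one history instead of being copied to another agent.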

Multi-agent system:

  • Supervisor: breaks down the task, coordinates, synthesizes at the end
  • Researcher: runs web searches in a ReAct loop
  • Writer: takes research output, writes the analysis
  • Critic: reviews the draft, verdict PASS or REVISE

Same model (Claude Haiku 4.5), same tools, same query.
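The pipeline above can be sketched as plain function composition. The agent bodies here are hypothetical stubs; what the sketch shows is the handoff pattern, where each stage receives the full text output of earlier stages as new input.

```python
# Sequential supervisor pipeline sketch. Each agent is a stub; note how
# the task, plan, research, and draft get re-passed downstream, so the
# same text is re-tokenized and re-billed at every stage.

def supervisor_plan(task: str) -> str:
    return f"PLAN for: {task}"

def researcher(task: str, plan: str) -> str:
    return f"RESEARCH on ({task}) following ({plan})"

def writer(plan: str, research: str) -> str:
    return f"DRAFT based on ({research})"

def critic(task: str, draft: str) -> str:
    return "REVISE: market size section is incomplete"

def supervisor_synth(draft: str, review: str) -> str:
    return f"FINAL REPORT: {draft} [review: {review}]"

def pipeline(task: str) -> str:
    plan = supervisor_plan(task)          # sees: task
    research = researcher(task, plan)     # sees: task + plan (re-billed)
    draft = writer(plan, research)        # sees: plan + research (re-billed)
    review = critic(task, draft)          # sees: task + draft (re-billed)
    if review.startswith("REVISE"):
        draft = writer(plan, research + "\n" + review)   # revision pass
    return supervisor_synth(draft, review)               # sees: draft + review
```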

The results

Metric          Multi-Agent    Single Agent   Ratio
Total cost      $0.085         $0.018         4.8x
API calls       11             4              2.8x
Input tokens    45,191         9,046          5.0x
Output tokens   8,010          1,737          4.6x
Output length   2,754 chars    6,028 chars    0.5x

Multi-agent cost nearly 5x more and produced half the output.

Why: context duplication

The single agent kept everything in one growing conversation. Each turn appended its new tokens to the same history, so information accumulated in one place and was never copied out to a second agent.

The multi-agent system passed context between agents. Every handoff meant the next agent received the full output of the previous one as new input. Those tokens got counted and billed again. And again.

Here's how the input token count grew across the pipeline:

Supervisor (plan):    239 input tokens   just the original task
Researcher:        36,734 input tokens   task + plan + 6 iterations of search results
Writer:               897 input tokens   plan + truncated research
Critic:             2,290 input tokens   task + research excerpt + full draft
Writer (revision):  2,681 input tokens   draft + feedback + research excerpt
Supervisor (synth): 2,350 input tokens   draft + review

The same information (the original task and the research plan) flowed through every agent. Every time it reached a new agent, it was re-tokenized and re-billed as input.
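Those per-stage figures sum exactly to the pipeline total in the results table, and dividing by the single agent's input tokens recovers the 5.0x ratio:

```python
# Per-stage input tokens from the breakdown above; the total matches the
# 45,191 reported in the results table.
stage_input_tokens = {
    "supervisor_plan": 239,
    "researcher": 36_734,
    "writer_draft": 897,
    "critic": 2_290,
    "writer_revision": 2_681,
    "supervisor_synth": 2_350,
}

total = sum(stage_input_tokens.values())
print(total)                        # 45191
print(round(total / 9_046, 1))      # 5.0 — vs the single agent's input tokens
```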

The researcher ate 57% of the budget

Here's the per-agent cost breakdown:

Agent              Cost     Share
Supervisor plan   $0.004    4.8%
Researcher        $0.049   57.2%
Writer draft      $0.011   13.1%
Critic            $0.005    5.6%
Writer revision   $0.011   12.7%
Supervisor synth  $0.006    6.6%

The researcher ran in a loop. 6 iterations, each carrying the accumulated results of all previous iterations. By iteration 6, it was processing the output of 5 previous searches on every new call. That's where the money went.
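That growth can be modeled directly. With illustrative numbers (a 500-token base prompt and roughly 1,000 tokens per search result, both assumptions rather than measured values), six iterations alone account for 18,000 input tokens:

```python
# Iteration i re-reads all i previous search results, so total loop
# input grows quadratically with iteration count. base_tokens and
# result_tokens are illustrative assumptions, not measured values.

def loop_input_tokens(iterations: int, base_tokens: int, result_tokens: int) -> int:
    total = 0
    for i in range(iterations):
        # i prior results are carried forward into this call's input
        total += base_tokens + i * result_tokens
    return total

print(loop_input_tokens(6, 500, 1_000))   # 18000
```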

Any multi-agent system with a tool-using research agent is going to look like this. The research agent will always eat a disproportionate share.

Quality didn't compensate

You'd expect a 4.8x price premium to buy better output. It didn't.

The researcher hit its max iteration limit without synthesizing a clean brief. It kept searching but never stopped to organize what it found. The writer received a pile of raw search results and produced a rough draft.

The critic flagged it incomplete and requested a revision. After the revision, the supervisor still marked it "INCOMPLETE."

The single agent reached iteration 4, decided it had enough information, and wrote a complete 6,028-character analysis. It adapted. The pipeline couldn't.

When multi-agent actually makes sense

This isn't an argument against multi-agent systems. It's an argument for using them in the right situations.

Multi-agent makes sense when tasks are genuinely parallel. Three agents researching different market segments simultaneously, each with their own tools, running concurrently. The context duplication cost is real but you're getting actual parallelization. Time saved matters in some workflows more than token cost.

Multi-agent doesn't make sense for sequential tasks. If Agent B can't start until Agent A finishes, and Agent B needs Agent A's full output, you're paying the context duplication tax with no parallelization benefit. A well-prompted single agent will do the same job for a fraction of the cost.

Before building a multi-agent system, the question to ask is: can these tasks actually run at the same time? If yes, multi-agent might be worth it. If they're sequential by nature, a single agent with a structured prompt will almost certainly be cheaper and produce better output.
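A back-of-envelope version of that question, with all numbers purely illustrative: three research sub-tasks, run either inside one sequential agent or fanned out to three parallel agents.

```python
# Parallel fan-out pays the shared-context duplication tax (every agent
# re-reads the task/plan) but divides wall-clock time; a single sequential
# agent reads the shared context once but serializes the work.
# All constants are illustrative assumptions, not measured values.

SHARED_CONTEXT = 1_000     # tokens of task + plan every agent must read
PER_TASK_INPUT = 5_000     # tokens of task-specific search material
SECONDS_PER_TASK = 40

def sequential(n_tasks: int) -> tuple[int, int]:
    tokens = SHARED_CONTEXT + n_tasks * PER_TASK_INPUT   # context read once
    return tokens, n_tasks * SECONDS_PER_TASK

def parallel(n_tasks: int) -> tuple[int, int]:
    tokens = n_tasks * (SHARED_CONTEXT + PER_TASK_INPUT) # context read n times
    return tokens, SECONDS_PER_TASK

print(sequential(3))   # (16000, 120): fewer tokens, 3x the wall time
print(parallel(3))     # (18000, 40): ~12% more tokens, a third of the time
```

If the duplication tax buys you a latency win, it can be a fair trade; if it buys nothing, it's just the tax.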

What this means for pricing agency projects

When you're quoting a client on an AI system that researches, writes, reviews, and delivers reports, the cost estimate matters.

The naive calculation: agents x cost per agent.

The real cost: that number, multiplied by a compounding factor for every context handoff in the pipeline.

In this experiment, input tokens were 5x higher in the multi-agent system despite both versions having the same tools and the same task. That multiplier scales with the number of agents and how much context each one passes downstream.

AgentQuote models this. When you describe a multi-agent system, the calculator applies a context duplication multiplier per agent, based on the measured 1.2x compounding factor from this experiment. That's why the high-end scenario for a 5-agent pipeline looks much more expensive than just "5 times single agent cost."
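One plausible shape for that model (an assumption on my part about how the factor is applied, not AgentQuote's actual implementation): each stage's cost compounds by 1.2x per upstream handoff.

```python
# Sketch of a compounding handoff multiplier. How AgentQuote applies the
# 1.2x factor is assumed here; this is one plausible shape, where stage i
# costs base * factor**i.

def pipeline_cost_estimate(single_agent_cost: float, n_agents: int,
                           handoff_factor: float = 1.2) -> float:
    return single_agent_cost * sum(handoff_factor ** i for i in range(n_agents))

naive = 0.018 * 5                              # "5 times single agent cost"
modeled = pipeline_cost_estimate(0.018, 5)     # ~$0.134, roughly 1.5x naive
```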

The practical takeaway

Sequential task where one piece of reasoning feeds the next: use a single agent.

Genuinely parallel workload where agents can run concurrently and specialization matters: multi-agent might be worth the cost.

Either way, budget the research or tool-using agent separately. It will always consume more than its proportional share.

AgentQuote estimates multi-agent system costs with context duplication built into the model. Try it here.

Ready to estimate your agent costs?

Describe your system, get a cost breakdown in 60 seconds. Free, no signup required.

Estimate Your System →