Cost Analysis · 10 min read · 2026-03-01

How Much Does an AI Agent Cost in 2026?

I spent the last month building agents from scratch, tracking every token, every API call, every retry. Before I started I had no real idea what these things cost at scale. Here's what I found.

How the billing actually works

Every LLM provider charges the same way: input tokens plus output tokens, priced separately per million.

The thing that catches most people off guard is that output tokens cost 2-5x more than input tokens. On Claude Haiku, input is $1 per million tokens, output is $5. On GPT-4o, input is $2.50, output is $10. So every time your agent writes a response, you're paying at the expensive rate for all of it.

The formula is just:

cost = (input_tokens / 1,000,000 × input_price) + (output_tokens / 1,000,000 × output_price)

Simple enough. The hard part is knowing how many tokens you're actually using.
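As a quick sketch in Python (the prices plugged in below are the Haiku rates quoted above; substitute your own):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars for one request; prices are $/million tokens."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# 2,000 input tokens and 500 output tokens on Haiku ($1 in / $5 out per MTok)
print(round(request_cost(2_000, 500, 1.00, 5.00), 6))  # ≈ $0.0045
```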

Current model pricing

Here's what agencies are actually working with in 2026:

Model          Input/MTok   Output/MTok
Haiku 4.5      $1.00        $5.00
Sonnet 4.5     $3.00        $15.00
Opus 4.5       $5.00        $25.00
GPT-4o         $2.50        $10.00
GPT-4o Mini    $0.15        $0.60
DeepSeek V3    $0.28        $0.42

Going by those numbers, DeepSeek V3 is roughly 11x cheaper on input and 36x cheaper on output than Claude Sonnet, and for simple tasks it often does the job. Most agencies I've talked to pick one model and never compare alternatives. That decision alone can mean a 10x difference in running costs.
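To make the comparison concrete, here's a small sketch using the prices from the table above; the daily token volumes are made-up illustration numbers:

```python
# Per-MTok prices (input, output) from the table above
PRICES = {
    "haiku-4.5":   (1.00, 5.00),
    "sonnet-4.5":  (3.00, 15.00),
    "opus-4.5":    (5.00, 25.00),
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-v3": (0.28, 0.42),
}

def model_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# Same hypothetical workload on two models: 1M input + 200K output tokens/day
for model in ("sonnet-4.5", "deepseek-v3"):
    print(model, round(model_cost(model, 1_000_000, 200_000), 3))
# Sonnet lands at $6.00/day, DeepSeek at about $0.36/day for this mix
```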

Why single-turn estimates don't tell you much

A one-turn exchange costs almost nothing. A few fractions of a cent. That's why demos look cheap, and why estimates based on demos are usually wrong.

The real cost is in multi-turn conversations. Every time your agent responds, it resends the entire conversation history as input tokens. Turn 1 reads 1 turn of context. Turn 2 reads 2 turns. Turn 3 reads 3. The growth is triangular, not linear.

A 10-turn conversation doesn't cost 10 times a single turn. It costs roughly 55 times a single turn.

For a chatbot handling 500 conversations a day, 8-10 turns each, this is where the money goes. Not the individual calls, but the accumulation across the conversation.
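The triangular growth is easy to verify yourself; the 400 tokens-per-turn figure below is an arbitrary illustration, not a measured value:

```python
def conversation_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens across a conversation where every response
    resends the full history: 1 + 2 + ... + n turns of context."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

one_turn = conversation_input_tokens(1, 400)   # 400 tokens
ten_turns = conversation_input_tokens(10, 400) # 22,000 tokens
print(ten_turns / one_turn)  # 55.0 — not 10x, 55x
```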

The 200K token pricing tier

Anthropic, like most providers, has a long-context pricing tier. On Claude Sonnet, if your input exceeds 200,000 tokens in a request, the price for every token in that request, input and output, switches to a higher rate. Sonnet goes from $3/$15 to $6/$22.50 per million.

One large document injected into context, one conversation thread that grew too long, one agent that accumulated tool results without truncating: any of these is enough to roughly double the cost of that response. Most agencies don't know this threshold exists until they see an unexpected bill.
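A minimal sketch of how the tier flips, using the Sonnet numbers above (the function and threshold check are my illustration, not Anthropic's billing logic):

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens, per the Sonnet tier described above

def sonnet_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Once input crosses the threshold, every token in the request
    is billed at the long-context rate."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        inp_price, out_price = 6.00, 22.50
    else:
        inp_price, out_price = 3.00, 15.00
    return (input_tokens / 1e6) * inp_price + (output_tokens / 1e6) * out_price

print(round(sonnet_request_cost(199_000, 1_000), 4))  # standard rate
print(round(sonnet_request_cost(201_000, 1_000), 4))  # long-context rate, ~2x
```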

Thinking models and reasoning token costs

Extended thinking models (Claude's extended thinking, o1, o3) generate invisible reasoning tokens before writing the visible response. Those reasoning tokens are billed as output tokens, which is the expensive tier.

If a model thinks for 10,000 tokens then writes a 500-token response, you're billed for 10,500 output tokens. An agency running a thinking model on 200 conversations a day without a thinking budget cap will spend much more than they planned.
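The arithmetic from that example, as a small helper (the function is a sketch for estimating, not an SDK call; the $15/MTok rate below is Sonnet's output price from the table above):

```python
def thinking_output_cost(reasoning_tokens: int, visible_tokens: int,
                         output_price: float) -> float:
    """Reasoning tokens are billed as output tokens alongside
    the visible response."""
    return ((reasoning_tokens + visible_tokens) / 1e6) * output_price

# 10,000 thinking tokens + a 500-token visible response at $15/MTok output
print(round(thinking_output_cost(10_000, 500, 15.00), 4))  # ≈ $0.16 per response
```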

Every SDK that supports thinking models exposes a thinking budget parameter. Most people never set it.

When prompt caching actually helps

Prompt caching stores part of your prompt so the model doesn't re-read it from scratch every request. Cache reads cost about 10% of the base price, so the savings are real.

But there's a minimum threshold. Anthropic requires at least 1,024 tokens in the cached section before a cache is created. Smaller prompts won't cache at all.

In my testing, caching made a meaningful difference when:

  • The system prompt was large (5,000+ tokens)
  • There were many tool definitions (10+ tools add up fast at roughly 500 tokens per definition)
  • The model was expensive (Sonnet or Opus, where input is $3-5/MTok)
  • There were many calls per session (10+)

For a simple chatbot with a short system prompt on Haiku, caching saves maybe $0.002 total per session. Not the thing to optimize first.
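If you want to sanity-check whether caching is worth it for your setup, here's a rough estimator. The 10% read price and 1,024-token minimum come from the text above; everything else is assumption, and it ignores cache-write surcharges, so treat it as an upper bound:

```python
MIN_CACHEABLE_TOKENS = 1_024  # Anthropic's minimum cached-prefix size

def cache_savings_per_session(prefix_tokens: int, calls: int,
                              input_price: float,
                              cache_read_ratio: float = 0.10) -> float:
    """Rough savings from caching a stable prompt prefix across a session.
    Prices are $/MTok; ignores cache-write surcharges."""
    if prefix_tokens < MIN_CACHEABLE_TOKENS:
        return 0.0  # too small to cache at all
    full = (prefix_tokens / 1e6) * input_price
    cached = full * cache_read_ratio
    # first call pays the full price; later calls pay the cached rate
    return (calls - 1) * (full - cached)

# 6,000-token system prompt on Sonnet ($3/MTok input), 12 calls per session
print(round(cache_savings_per_session(6_000, 12, 3.00), 4))  # meaningful
print(cache_savings_per_session(500, 12, 1.00))              # below minimum: 0.0
```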

Rough cost ranges at different scales

These are based on my experiments using Claude Haiku with a 5-turn average conversation, no tools:

100 conversations/day: Low: ~$1.50/month, High: ~$6/month

1,000 conversations/day: Low: ~$15/month, High: ~$60/month

10,000 conversations/day: Low: ~$150/month, High: ~$600/month

Add tool use, switch to Sonnet, or have longer conversations and these numbers jump significantly. A ReAct agent with web search running 10,000 conversations a day on Sonnet is a completely different calculation.
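The ranges above can be reproduced from a per-conversation cost of roughly $0.0005 (low) to $0.002 (high); those per-conversation figures are my assumption, chosen to match the experiments described above:

```python
def monthly_cost(conversations_per_day: int,
                 cost_per_conversation: float, days: int = 30) -> float:
    """Extrapolate a monthly bill from daily volume."""
    return conversations_per_day * days * cost_per_conversation

# Hypothetical Haiku figures: $0.0005/conversation (low) to $0.002 (high)
for volume in (100, 1_000, 10_000):
    low = monthly_cost(volume, 0.0005)
    high = monthly_cost(volume, 0.002)
    print(f"{volume}/day: ${low:.2f} to ${high:.2f} per month")
```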

Where agencies go wrong with estimates

The pattern I keep seeing: estimate costs from a demo. Run 10 test conversations, divide by 10, multiply by expected volume. This misses tool calls, failure retries, context growth, and the difference between clean demo traffic and real production usage.

Production agents behave differently. Users write longer messages. Conversations go more turns. Tool calls fail and get retried. Context windows fill up in ways they didn't during testing.

By the time the real bill arrives, the client has already been quoted a fixed price.

That's what AgentQuote is for: putting realistic cost ranges in front of you before you commit to a number, not after.

AgentQuote estimates AI agent running costs before you quote a client. Try it here.

Ready to estimate your agent costs?

Describe your system, get a cost breakdown in 60 seconds. Free, no signup required.

Estimate Your System →