April 11, 2026 · 6 min read · by Noomachy Team

The Real Cost of Running an AI Agent Platform

Everyone wants to build an AI agent. Few people calculate what it costs to actually run one in production. Here's a real breakdown from running Noomachy.

Cost Components

A production AI agent platform has roughly five cost buckets:

  1. LLM API calls (Claude, Gemini, etc.)
  2. Embeddings for semantic memory
  3. Vector storage / search (Vertex AI, Pinecone, or Firestore)
  4. Cloud functions / compute (Firebase, AWS Lambda, etc.)
  5. Database storage (Firestore, Postgres)

LLM costs dominate. Everything else is cheap by comparison.

LLM API Costs (2026)

For a typical agent conversation with tool use, expect:

  • Claude Sonnet 4: ~$0.05–0.30 per turn (depends on context length and tool calls)
  • Gemini 2.5 Flash: ~$0.001–0.01 per turn (roughly 30–50x cheaper than Sonnet)
  • GPT-4o: ~$0.05–0.20 per turn

A heavy user doing 50 conversations a day: ~$2–15/day on Claude, ~$0.10–0.50/day on Gemini.

Most users do far less. The median is 5–15 messages a day, costing ~$0.20–1.50.
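As a back-of-envelope check, here is the per-turn arithmetic as a sketch. The per-million-token prices and token counts below are illustrative assumptions, not quoted rates; check your provider's current pricing.

```python
# Rough per-turn LLM cost model. Prices and token counts are
# illustrative assumptions -- verify against current provider rates.
PRICE_PER_MTOK = {              # (input, output) USD per million tokens
    "claude-sonnet": (3.00, 15.00),
    "gemini-flash": (0.10, 0.40),
}

def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one conversation turn."""
    p_in, p_out = PRICE_PER_MTOK[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# A turn with a large context and a tool-call round trip:
print(round(turn_cost("claude-sonnet", 20_000, 1_500), 4))  # ~$0.0825
print(round(turn_cost("gemini-flash", 20_000, 1_500), 4))   # ~$0.0026
```

Note that input tokens dominate for agents: every tool call replays the growing context, so a long conversation re-bills its own history each turn.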

Embedding Costs

Embeddings are nearly free at scale:

  • Vertex AI textembedding-gecko@003: $0.000025 per 1K characters
  • A typical fact is ~100 chars, so ~$0.000003 per memory
  • 1000 memories per user costs about $0.003

You can ignore this cost.
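The arithmetic above, written out (the per-1K-character rate is the figure used in this post; verify against current Vertex AI pricing):

```python
PRICE_PER_1K_CHARS = 0.000025   # Vertex AI embedding rate used above

def embedding_cost(num_memories: int, avg_chars: int = 100) -> float:
    """USD to embed num_memories facts of avg_chars characters each."""
    return num_memories * avg_chars / 1000 * PRICE_PER_1K_CHARS

print(round(embedding_cost(1000), 6))  # 1000 memories of ~100 chars -> 0.0025
```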

Vector Storage and Search

For Noomachy, we store vectors in Firestore alongside the memory documents. No separate vector DB needed. Cost: included in the regular Firestore storage cost (~$0.18/GB/month).

If you scale beyond ~100K memories per user, you'd want a real vector index (Pinecone, Vertex AI Vector Search, pgvector). At those scales the cost is real but still small compared to LLM calls.
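Storing vectors inline works because per-user memory counts are small enough for brute-force search. A minimal sketch of that search in plain Python (the document shape with `text` and `embedding` fields is hypothetical, not Noomachy's actual schema):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], memories: list[dict], k: int = 5) -> list[dict]:
    """Brute-force top-K over documents shaped like {"text": ..., "embedding": [...]}."""
    scored = [(cosine(query_vec, m["embedding"]), m) for m in memories]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [m for _, m in scored[:k]]
```

At ~1K memories per user this runs in milliseconds; it is only past the ~100K mark that a dedicated index starts to pay for itself.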

Cloud Functions

Each request to the agent runs a Cloud Function. With Firebase Functions Gen 2:

  • ~$0.0001 per invocation
  • ~$0.00001667 per CPU-second
  • Free tier covers ~125K invocations/month

For most users this is essentially free. Only at high scale does compute become noticeable.

Database Storage

Firestore charges:

  • $0.18/GB/month for storage
  • $0.06 per 100K reads
  • $0.18 per 100K writes

A user with 100 conversations and 500 memories: ~5MB total. Negligible.

The Actual Per-User Cost

Putting it all together for a typical Noomachy user (active, daily use):

  • LLM (mostly Claude): $1–5/month
  • Embeddings: $0.01/month
  • Functions: $0.10/month
  • Firestore: $0.05/month
  • Total: ~$1.16–5.16/month

For heavy users (50+ conversations/day): up to $50/month, almost entirely LLM.
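The total is just the four buckets summed; a sketch, using the estimates above rather than metered figures:

```python
def monthly_total(llm: float, embeddings: float = 0.01,
                  functions: float = 0.10, firestore: float = 0.05) -> float:
    """Per-user monthly cost in USD; in practice the LLM term is the whole story."""
    return llm + embeddings + functions + firestore

print(round(monthly_total(1.00), 2))   # light Claude use  -> 1.16
print(round(monthly_total(5.00), 2))   # daily Claude use  -> 5.16
print(round(monthly_total(48.0), 2))   # heavy user        -> 48.16
```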

How to Lower Costs

If you're running your own agent platform:

  1. Use smaller models for sub-tasks. Use Claude Sonnet for reasoning, but Gemini Flash or Haiku for fact extraction, intent classification, summarization.
  2. Cache aggressively. Embeddings, tool results, intermediate computations.
  3. Limit context. Don't pass every memory — use top-K vector search.
  4. Set per-user budget caps. Noomachy has a $5/month default cap on Gemini for free-tier users.
  5. Use prompt caching. Anthropic's prompt caching can cut token costs by 50–90% on repeated system prompts.
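Item 4 above is the simplest to implement: check the budget before calling the LLM, not after. A sketch (the $5 default is the cap mentioned in this post; the helper itself is hypothetical):

```python
MONTHLY_CAP_USD = 5.00   # free-tier default cap

def within_budget(spent_this_month: float, next_call_estimate: float,
                  cap: float = MONTHLY_CAP_USD) -> bool:
    """Reject a request before the LLM call if it would exceed the cap."""
    return spent_this_month + next_call_estimate <= cap

assert within_budget(4.90, 0.05)        # $4.95 total: allowed
assert not within_budget(4.99, 0.05)    # would hit $5.04: blocked
```

Estimating the next call's cost from the prompt's token count before sending it is what makes the cap a hard limit rather than a lagging alert.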

Pricing Implications

This is why most AI assistant products either:

  • Charge $20+/month (covering even heavy users)
  • Use Gemini-style cheap models exclusively
  • Limit usage with hard caps and tiers
  • Run as loss leaders to gather data for training

Noomachy's free tier uses Gemini Flash by default with a $5 monthly token cap. Pro tier ($29/month) unlocks Claude and removes caps. The economics work because most users are well below the heavy-user threshold.

Bottom Line

Running an AI agent is cheaper than people think (if you're frugal) and more expensive than people expect (if you're sloppy). The two levers are which model you use and how much context you pass. Get those right and the rest is rounding error.
