How to Track LLM Costs by Team and Feature

Vivek Vaidya
llm cost-tracking observability metadata

Your AI bill is growing. Fast.

But when someone asks you to explain it, you open your OpenAI dashboard and see one breakdown: spend by model. Maybe you have a few API keys and can split it that way too. That’s it.

Anthropic is the same. Gemini too. The providers bill you by model, not by what you built with that model.

So when your CFO asks “which feature is driving this cost increase?” or your team lead asks “how much did that experiment we ran last week actually cost?”, you have no answer. You’re guessing.

This is the attribution problem. And a dashboard won’t fix it.

Why Aggregate Metrics Aren’t Enough

Most LLM observability tools give you aggregate spend. Total tokens this month. Daily request volume. Average cost per request. That’s useful for trend monitoring, but it doesn’t tell you anything about what to do differently.

To actually manage LLM costs, you need to answer questions at the feature level:

  • Which feature uses the most tokens?
  • Which team’s experiments are still running in production, forgotten?
  • Which model is overkill for its task, and which is doing heavy lifting it wasn’t designed for?
  • How much did that A/B test cost across both variants?

Answering these questions requires attribution at the request level, not the account level. You need to tag each request with context at the time you make it.

The Mechanism: Request-Level Metadata

Majordomo lets you attach metadata to any LLM request using custom HTTP headers. Any header starting with X-Majordomo- (except X-Majordomo-Key and X-Majordomo-Provider, which are reserved) gets captured and stored with the request log.

curl http://localhost:7680/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -H "X-Majordomo-Key: $MAJORDOMO_KEY" \
  -H "X-Majordomo-Feature: document-classification" \
  -H "X-Majordomo-Team: ml-infra" \
  -H "X-Majordomo-Environment: production" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [...]}'

That’s it. One header per dimension you want to track. No SDK changes, no code refactoring. If you’re already calling the API, you add headers.

The same thing works with the Python client:

from majordomo_llm import get_llm_instance

llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")

prompt = "Classify the attached document into one of our standard categories."

# Same dimensions as the header example above, passed as a metadata dict
response = await llm.get_response(
    prompt,
    metadata={
        "feature": "document-classification",
        "team": "ml-infra",
        "environment": "production",
        "experiment_id": "exp-2024-04-a",
    }
)

And with the OpenAI SDK pointed at the gateway:

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:7680/v1",
    api_key=openai_api_key,
    default_headers={
        "X-Majordomo-Key": majordomo_key,
        "X-Majordomo-Feature": "summarization",
        "X-Majordomo-Team": "product",
    }
)
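The default_headers above cover dimensions that stay fixed for the client. For dimensions that vary call to call, such as an experiment ID, the OpenAI SDK’s extra_headers argument layers extra headers onto a single request. A minimal sketch, with an illustrative header name and experiment ID (any X-Majordomo-* header is captured):

response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the attached thread."}],
    # Added on top of the client's default_headers for this request only
    extra_headers={"X-Majordomo-Experiment": "exp-2024-04-a"},
)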

What to Tag and How to Think About It

The dimensions you tag should match the questions you want to answer. A few patterns that work well in practice:

Feature name. The most useful dimension. Tag every LLM call with the product feature it belongs to. document-classification, chat-assistant, email-drafting, code-review. This one dimension alone transforms your cost dashboard from “total spend” to “per-feature spend.”

Team. If multiple teams are using the same LLM infrastructure, tag by team. This gives engineering leads visibility into their own usage and makes cost allocation across the org straightforward.

Environment. production, staging, development. You’d be surprised how much cost accumulates in staging environments that nobody is cleaning up.

Experiment name or ID. When you run A/B tests or model experiments, tag them. You’ll want to know exactly what a test cost when you’re deciding whether to ship it.

User tier or user ID. If your product has different tiers (free vs. paid), tagging by user tier lets you understand your unit economics per customer segment. Are free users consuming disproportionate LLM resources? This answers it.

You don’t have to tag everything at once. Start with feature and environment. That alone will give you more insight than most teams have.
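
If you tag from many call sites, it helps to build the headers in one place so dimension names stay consistent. The helper below is not part of Majordomo; it’s a small illustrative sketch that turns a dict of dimensions into the X-Majordomo-* header format described above:

def majordomo_headers(**dimensions: str) -> dict[str, str]:
    """Turn tagging dimensions into X-Majordomo-* headers.

    majordomo_headers(feature="email-drafting", environment="production")
    -> {"X-Majordomo-Feature": "email-drafting",
        "X-Majordomo-Environment": "production"}
    """
    return {
        f"X-Majordomo-{name.replace('_', '-').title()}": value
        for name, value in dimensions.items()
    }

Pass the result as default_headers when you construct a client, or merge it into per-request headers for the dimensions that change between calls.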

Activating Metadata Keys

Once you start sending metadata headers, Majordomo captures them with every request. To filter and slice by a dimension in the dashboard, you mark it as an active metadata key for that API key. Active keys get GIN-indexed in Postgres, which makes aggregation and filtering fast even across millions of requests.

You can manage metadata keys from the Settings section in the Majordomo dashboard, or via the API. Once a key is active, it shows up as a filterable dimension across your usage views.

This design is intentional. You might send a dozen metadata headers for debugging purposes. You only pay the indexing cost for the dimensions you actually want to query.

What the Dashboard Shows You

Once requests are tagged and metadata keys are active, the usage dashboard lets you slice costs along those dimensions. Filter by feature and you see total spend, token breakdown, model distribution, and daily cost trend for that feature alone. Filter by team and you see team-level attribution. Layer on a date range and you can scope it to a sprint, a month, or the duration of a specific experiment.

The cost numbers come from current provider pricing data, refreshed hourly. You’re not working with stale estimates.

This is the difference between knowing your AI bill and understanding it. One tells you the number. The other tells you what to do.

A Practical Starting Point

If you’re already calling OpenAI, Anthropic, or Gemini directly, the migration is three steps:

  1. Deploy the Majordomo gateway (Docker Compose gets you running in under five minutes)
  2. Point your API calls at the gateway instead of the provider
  3. Add X-Majordomo-Feature and X-Majordomo-Environment headers
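
As a concrete example of steps 2 and 3: if you call Anthropic through its Python SDK today, the change is a base URL override plus default headers. A sketch, assuming the gateway’s /v1/messages route shown earlier accepts the SDK’s standard request shape and API-key auth:

from anthropic import AsyncAnthropic

# ANTHROPIC_API_KEY is still read from the environment as usual
client = AsyncAnthropic(
    base_url="http://localhost:7680",  # gateway instead of api.anthropic.com
    default_headers={
        "X-Majordomo-Key": majordomo_key,
        "X-Majordomo-Feature": "code-review",
        "X-Majordomo-Environment": "production",
    },
)

Every call made through this client now carries both headers from step 3; a new feature only needs its own X-Majordomo-Feature value.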

You’ll have per-feature cost data in your dashboard within the first hour of traffic.

From there, add dimensions as you need them. Tag experiments when you run them. Add user tier attribution when unit economics become important. The schema grows with your questions.


The code examples on this site show the full integration patterns for curl, Python, and the OpenAI SDK. The deploy guide gets the gateway running in your environment. And the getting started docs walk through the full setup including metadata key activation.