Agent cost control: patterns that actually work
Concrete patterns for keeping LLM spend predictable in a multi-tenant agent platform — budgets, per-step caps, model routing, and the hard cut-off that saves the weekend.
Every team that ships an agent platform eventually has the same bad weekend. A bad prompt, a runaway loop, a misconfigured retry — and suddenly the finance Slack channel is on fire. This guide is the set of patterns we recommend to make sure your team never has that weekend.
Pattern 1: per-tenant hard budgets
The baseline. Every tenant gets a daily budget. When they hit it, new requests return 402 Payment Required and in-flight requests terminate at the next step boundary.
from fastagentic import App  # assumed import path
from fastagentic.cost import Budget, RedisCostTracker

app = App(
    cost_tracker=RedisCostTracker(
        redis_url="redis://localhost",
        budget_per_tenant=Budget(
            daily=100.0,   # $100/day
            hourly=20.0,   # $20/hour burst cap
            currency="USD",
        ),
    ),
)
Two knobs, both hard. The hourly cap catches bugs the daily cap would only catch after several hours.
Pattern 2: per-user budgets inside tenants
Tenant-level budgets are rarely enough. Inside a tenant, you usually want each user to have their own cap — otherwise one user’s runaway agent drains the whole tenant’s budget in an hour.
@agent_endpoint(
    "/research",
    policies=[
        RateLimitPolicy(per_user="50/hour"),
        BudgetPolicy(per_user="10/day"),
    ],
)
async def research(query: str) -> str: ...
Pattern 3: per-endpoint budgets
Some endpoints are inherently more expensive than others. Give the expensive ones tighter caps:
@agent_endpoint(
    "/deep-research",  # hits GPT-4o + 20 tool calls
    policies=[BudgetPolicy(per_user="5/day", per_call_max=2.0)],
)
async def deep_research(query: str) -> str: ...
per_call_max=2.0 is the safety net: if a single call is projected to exceed $2, it halts. This catches runaway loops before they finish.
Pattern 4: step-level budget checks
In long LangGraph pipelines, you want to check the budget between steps, not just at the end. FastAgentic’s StepTracker does this automatically when a cost tracker is configured.
@agent_endpoint(
    "/pipeline",
    adapter=LangGraphAdapter(graph, track_steps=True),
    policies=[BudgetPolicy(per_call_max=5.0, check_per_step=True)],
)
async def pipeline(q: str) -> str: ...
Every node transition checks the cumulative cost. If the limit is hit, the run terminates cleanly (checkpoint saved, response streamed with a partial-failure marker) instead of charging forward.
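Stripped of the checkpointing and streaming machinery, a step-boundary check is just a cumulative total inspected after each node. This sketch is a stand-in for what a step tracker does, not FastAgentic's code; the step and exception names are illustrative.

```python
class BudgetExceeded(Exception):
    pass


def run_pipeline(steps, per_call_max: float):
    """Run steps in order, checking cumulative cost at each boundary.

    Each step is a callable returning (result, cost). A real tracker
    would also save a checkpoint and emit a partial-failure marker.
    """
    total = 0.0
    results = []
    for step in steps:
        result, cost = step()
        total += cost
        results.append(result)
        if total > per_call_max:
            raise BudgetExceeded(
                f"${total:.2f} > ${per_call_max:.2f} after {len(results)} steps"
            )
    return results
```

The point is where the check sits: between steps, so a loop can burn at most one step's cost past the limit, never the whole run.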
Pattern 5: model routing by budget
Not every request needs the expensive model. Route requests to cheaper models when the user’s remaining budget is low:
from fastagentic.routing import budget_aware_router

router = budget_aware_router(
    default="anthropic:claude-sonnet-4-6",
    fallback_below={
        20.0: "anthropic:claude-haiku-4-5",
        5.0: "openai:gpt-4o-mini",
    },
)
agent = Agent(model=router, ...)
Users with lots of budget left get the good model. Users close to their cap get the cheap one. The degradation is graceful instead of cliff-edged.
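The routing logic reduces to a threshold walk. Here is a minimal sketch of how a `fallback_below` table could be interpreted — an assumption about the semantics, not the library's actual code: check thresholds from lowest to highest and take the first fallback the remaining budget falls under.

```python
def pick_model(remaining_budget: float,
               default: str,
               fallback_below: dict[float, str]) -> str:
    """Return the cheapest-tier model the remaining budget calls for.

    Thresholds are checked lowest-first so the tightest matching
    fallback wins; above all thresholds, the default model is used.
    """
    for threshold in sorted(fallback_below):
        if remaining_budget < threshold:
            return fallback_below[threshold]
    return default
```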
Pattern 6: pre-flight cost estimation
Before a run starts, you can estimate its cost from prompt size, expected tool calls, and historical data:
estimate = await cost_tracker.estimate(
    prompt=query,
    expected_steps=graph.max_depth,
    model=agent.model_name,
)
if estimate > user.remaining_budget:
    raise HTTPException(402, f"Estimated cost ${estimate} exceeds remaining budget")
This turns out-of-budget errors from late failures into early, cheap rejections.
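A serviceable estimator needs surprisingly little. The sketch below is one possible back-of-envelope formula, not the tracker's actual model: the 4-characters-per-token ratio and the per-step token average are assumptions you would replace with your own historical data.

```python
def estimate_cost(prompt: str,
                  expected_steps: int,
                  price_per_1k_tokens: float,
                  avg_tokens_per_step: int = 1500) -> float:
    """Rough pre-flight cost estimate in dollars.

    prompt tokens (approximated at ~4 chars/token) plus a historical
    average of tokens consumed per step, times the model's price.
    """
    prompt_tokens = len(prompt) / 4  # crude token approximation
    total_tokens = prompt_tokens + expected_steps * avg_tokens_per_step
    return total_tokens * price_per_1k_tokens / 1000
```

Even a 2x-off estimate is useful here: the goal is rejecting obviously unaffordable runs before they start, not billing-grade accuracy.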
Pattern 7: cost attribution dashboards
You can’t control what you can’t see. FastAgentic emits cost events to your observability stack (Langfuse, Datadog, OTel) with tags for user, tenant, endpoint, model, and run ID. The first dashboard we build for every client has four charts:
- Cost per tenant, last 24 hours.
- Cost per endpoint, last 7 days.
- Runaway detector: runs where cost > 3 standard deviations above median.
- Model mix: % of spend on each model, trended weekly.
If you can’t see those four at a glance, you’re flying blind.
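The runaway detector from the third chart is one query over per-run costs. A minimal sketch of the statistic, assuming you can pull a list of recent run costs from your observability stack:

```python
import statistics


def runaway_runs(costs: list[float], sigma: float = 3.0) -> list[int]:
    """Return indices of runs whose cost exceeds the median
    by more than `sigma` standard deviations."""
    median = statistics.median(costs)
    stdev = statistics.stdev(costs)
    return [i for i, c in enumerate(costs) if c > median + sigma * stdev]
```

Median rather than mean as the baseline matters: a single runaway drags the mean toward itself and can hide behind it.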
Pattern 8: kill switches
Sometimes the right answer is “turn the whole endpoint off.” FastAgentic has a feature-flag integration that lets you disable an endpoint, a tenant, or a user in Redis without a redeploy:
fastagentic policy disable-endpoint /deep-research --reason "investigating cost spike"
In-flight requests finish; new ones get 503. Turn it back on when the issue is understood.
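Under the hood, a kill switch is a flag lookup before the request is served. This sketch uses a plain dict where production would use Redis, and the key scheme (`endpoint:`, `tenant:`, `user:`) is illustrative, not FastAgentic's actual format.

```python
def check_kill_switch(flags: dict, endpoint: str, tenant: str, user: str):
    """Consult a shared flag store before serving a request.

    Any matching key means the request is refused with 503; the
    stored value is the human-readable reason for the ops log.
    """
    for key in (f"endpoint:{endpoint}", f"tenant:{tenant}", f"user:{user}"):
        if key in flags:
            return 503, flags[key]
    return 200, None
```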
The anti-patterns
Things we see teams do that don’t work:
- Soft warnings without hard caps. “We’ll just send a Slack alert.” The alert arrives an hour after the damage.
- Cost caps at the LLM provider dashboard. Providers rate-limit unpredictably and don’t tell your app until the 429 comes back.
- Per-request retries with exponential backoff and no budget check. Each retry compounds cost. Always check budget before retry.
- Cost tracking only at the end of a run. Runaway loops can spend the entire budget before the first result lands.
- Shared budgets across dev and prod. Separate them. Dev will blow up; prod should not.
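The retry anti-pattern has a mechanical fix: put the budget check inside the retry loop, before every attempt. A generic sketch, with illustrative names:

```python
import time


class BudgetExhausted(Exception):
    pass


def retry_with_budget(call, remaining_budget: float,
                      est_cost_per_attempt: float,
                      max_attempts: int = 3, base_delay: float = 0.01):
    """Exponential-backoff retry that re-checks the budget each attempt."""
    spent = 0.0
    for attempt in range(max_attempts):
        if spent + est_cost_per_attempt > remaining_budget:
            raise BudgetExhausted(f"stopping after ${spent:.2f} spent")
        try:
            return call()
        except Exception:
            spent += est_cost_per_attempt
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError("all attempts failed within budget")
```

Without the in-loop check, three retries against an expensive endpoint quietly quadruple the cost of every failure.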
Minimum viable cost control
If you only have two hours, implement this much:
- A per-tenant daily hard budget via RedisCostTracker.
- per_call_max on every endpoint.
- A Grafana (or Langfuse) panel showing cost-per-tenant for the last 24 hours.
That covers 90% of the incidents we see.
The longer game
Once the basics are in place, graduate to model routing, pre-flight estimation, and kill switches. They’re not critical on day one, but they’re the difference between a platform that survives its first viral customer and one that doesn’t.
Need FastAPI, LangGraph, or agent platform expertise?
Neul Labs — the team behind FastAgentic — takes on a limited number of consulting engagements each quarter. We help teams ship agents to production, fix broken LangGraph pipelines, and design governance for multi-tenant LLM platforms.