FastAgentic
Guide · LangGraph

Deploying LangGraph to production with FastAgentic

A practical, opinionated guide to taking a LangGraph pipeline from notebook to production: durable checkpoints, streaming, cost caps, and resumption after crashes.

Neul Labs
#langgraph · #production · #checkpoints · #deployment

LangGraph is the best state-machine abstraction in the Python agent ecosystem. It is also, by design, not a deployment platform. Teams taking LangGraph to production usually discover this around week three, after their first process restart loses an hour of partial progress.

This guide walks through the deployment shape we actually recommend — the one we use in paid consulting engagements rescuing broken LangGraph pipelines.

The goal

By the end, you will have:

  • A LangGraph pipeline running behind a FastAgentic endpoint.
  • Node-level durable checkpoints so a crash during step 37 resumes at step 37, not from scratch.
  • Streaming SSE that emits intermediate node state to clients as it happens.
  • Per-tenant cost tracking with hard budget cut-offs.
  • An MCP surface so your pipeline is callable from Claude, Cursor, and other MCP clients.
  • OpenTelemetry spans covering every node.

All with ~20 lines of deployment code.

Prerequisites

pip install 'fastagentic[langgraph]' redis psycopg2-binary

Assume you already have a compiled LangGraph graph object called graph in my_agent/graph.py.

Step 1: wire the adapter

# my_agent/app.py
from fastagentic import App, agent_endpoint
from fastagentic.adapters import LangGraphAdapter
from fastagentic.checkpoints import PostgresCheckpointStore
from fastagentic.cost import RedisCostTracker

from my_agent.graph import graph

app = App(
    title="Research Agent",
    auth="oidc",
    checkpoint_store=PostgresCheckpointStore(url="postgres://..."),
    cost_tracker=RedisCostTracker(
        redis_url="redis://localhost",
        budget_per_tenant=100.0,
        currency="USD",
    ),
)

@agent_endpoint(
    "/research",
    adapter=LangGraphAdapter(graph, track_steps=True),
)
async def research(query: str) -> str:
    """Deep-research pipeline."""

That’s it. The endpoint is now:

  • Reachable at POST /research with streaming SSE.
  • Exposed as an MCP tool named research.
  • Advertised as an A2A skill.
  • Durable at the node level.
  • Budgeted per tenant.
  • Authenticated via OIDC.

Step 2: understand StepTracker

track_steps=True tells the LangGraph adapter to use FastAgentic’s StepTracker under the hood. Every node transition emits a checkpoint containing:

  • The node name.
  • The full graph state.
  • Cost deltas (tokens used, model, price).
  • A deterministic step ID.

When a run is resumed, the runtime skips to the last successful step and re-enters the graph from there. State is fully reconstructed.
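The checkpoint record is easier to reason about if you sketch it yourself. The shape below is an illustration, not FastAgentic's actual schema; the deterministic step ID here is simply a hash of run ID, step index, and node name, so the same transition always produces the same ID:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Illustrative checkpoint record; field names are assumptions, not FastAgentic's schema."""
    run_id: str
    step: int
    node: str
    state: dict        # full graph state after the node ran
    cost_delta: float  # spend attributed to this node (tokens * price)

    @property
    def step_id(self) -> str:
        # Deterministic: the same run/step/node always hashes to the same ID,
        # so a resumed process can match checkpoints without any coordination.
        key = f"{self.run_id}:{self.step}:{self.node}"
        return hashlib.sha256(key.encode()).hexdigest()[:16]

cp = Checkpoint(run_id="abc-123", step=1, node="plan", state={"query": "..."}, cost_delta=0.021)
cp_again = Checkpoint(run_id="abc-123", step=1, node="plan", state={"query": "..."}, cost_delta=0.021)
```

Determinism is the point: two processes that reach the same node of the same run derive the same ID independently.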

Step 3: streaming intermediate state

Clients often want to see what the agent is thinking, not just the final answer. FastAgentic emits an SSE stream containing every node transition:

event: node_start
data: {"node": "plan", "step": 1}

event: node_end
data: {"node": "plan", "step": 1, "cost": 0.021}

event: node_start
data: {"node": "search", "step": 2}
...
event: final
data: {"result": "..."}

Frontend code can render a “thinking” timeline without any custom server work.
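Consuming that stream needs no special client library either. Here is a minimal stdlib-only parser for the event/data framing shown above; a sketch, not production code — a real client should also handle multi-line data fields, comments, and reconnection:

```python
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Parse a simplified SSE stream into (event, payload) pairs.

    Handles only the single-line `event:` / `data:` framing used above.
    """
    events = []
    event_type = None
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            events.append((event_type or "message", payload))
            event_type = None
    return events

stream = """event: node_start
data: {"node": "plan", "step": 1}

event: node_end
data: {"node": "plan", "step": 1, "cost": 0.021}
"""
events = parse_sse(stream)
```

Each `node_start`/`node_end` pair becomes one row in the client's "thinking" timeline.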

Step 4: resuming a run

If a process crashes mid-run, the next invocation with the same run_id picks up automatically:

# client-side
resp = await client.post("/research", json={"query": "...", "run_id": "abc-123"})

If abc-123 is already partially complete in the checkpoint store, execution resumes from the last completed node. If not, it starts fresh. Your code does not need to know the difference.
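The resume semantics are worth internalizing. Conceptually the runtime does something like the loop below — a sketch against a hypothetical dict-like checkpoint store, not FastAgentic internals:

```python
def run_graph(nodes, state, run_id, store):
    """Conceptual resume loop: skip completed nodes, re-enter from the next one."""
    completed = store.get(run_id, {})          # step index -> state after that step
    start = max(completed) + 1 if completed else 0
    if completed:
        state = completed[max(completed)]      # reconstruct state from the last checkpoint
    for i in range(start, len(nodes)):
        state = nodes[i](state)                # nodes already done are never re-run
        completed[i] = state
        store[run_id] = completed              # persist at every node boundary
    return state

# Simulate two invocations with the same run_id against the same store.
store, calls = {}, []

def plan(s):
    calls.append("plan")
    return {**s, "plan": True}

def search(s):
    calls.append("search")
    return {**s, "search": True}

first = run_graph([plan, search], {"q": "x"}, "abc-123", store)
second = run_graph([plan, search], {"q": "x"}, "abc-123", store)  # resumes: runs nothing
```

The second invocation finds both steps checkpointed and returns the reconstructed state without executing either node.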

Step 5: cost caps with teeth

RedisCostTracker with budget_per_tenant=100.0 enforces a hard cut-off: once a tenant hits their daily budget, new requests return HTTP 402 (Payment Required) and in-flight requests terminate on the next step boundary. No more “we burned $40k while the on-call engineer was asleep” stories.

Budgets can be scoped per user, per tenant, per endpoint, or all three. See the docs for the full policy grammar.
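The cut-off mechanism itself is simple. Here is the logic sketched with an in-memory dict standing in for Redis; class and method names are illustrative, not `RedisCostTracker`'s API:

```python
class BudgetExceeded(Exception):
    """The gateway maps this to HTTP 402 Payment Required."""

class CostTracker:
    """Sketch of hard budget enforcement, checked at every step boundary."""
    def __init__(self, budget_per_tenant: float):
        self.budget = budget_per_tenant
        self.spent: dict[str, float] = {}

    def charge(self, tenant: str, amount: float) -> None:
        # Called after each node: an in-flight run stops at the first
        # step boundary after the budget is exhausted.
        total = self.spent.get(tenant, 0.0) + amount
        if total > self.budget:
            raise BudgetExceeded(f"tenant {tenant} over ${self.budget:.2f} daily budget")
        self.spent[tenant] = total

tracker = CostTracker(budget_per_tenant=100.0)
tracker.charge("acme", 99.50)
try:
    tracker.charge("acme", 1.00)   # would push the tenant to $100.50
    blocked = False
except BudgetExceeded:
    blocked = True
```

Because the check runs per step rather than per request, a runaway loop inside a single run is also cut off.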

Step 6: observability

OpenTelemetry spans are emitted per node and per LLM call. Wire the OTLP exporter:

from fastagentic.observability import setup_otel
setup_otel(service_name="research-agent", exporter="otlp://collector:4317")

Now your Datadog or Honeycomb dashboard shows per-node latency, tool call breakdowns, and cost attribution.
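What a per-node span buys you is easiest to see with a toy version. The snippet below is not the OpenTelemetry API — just the shape of the data each node emits, recorded to a list instead of an OTLP exporter:

```python
import time
from contextlib import contextmanager

spans = []  # in production these go to the OTLP exporter, not a list

@contextmanager
def node_span(node: str):
    """Toy stand-in for a per-node span: record the node name and wall-clock duration."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({"node": node, "duration_s": time.perf_counter() - start})

with node_span("plan"):
    time.sleep(0.01)  # stand-in for the node's real work
```

One span per node plus one per LLM call is what makes per-node latency and cost attribution possible downstream.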

Step 7: deployment

FastAgentic is an ASGI app. Anywhere uvicorn runs, this runs:

  • Kubernetes (Helm chart in the docs)
  • Docker on CapRover, Fly.io, Render
  • AWS Lambda / Cloud Run with S3 checkpoint backend
  • Bare metal with systemd

For most teams we recommend starting with Docker on CapRover or Fly.io — it’s the shortest path from “works on my machine” to “works at 3am on Sunday.”

What you no longer have to maintain

  • A custom Celery queue for resumable state.
  • A bespoke FastAPI wrapper around .astream().
  • A per-project cost tracking middleware.
  • A hand-rolled OTel spanning layer for LangGraph internals.
  • An MCP server that duplicates your REST contracts.

That’s typically 1,000–2,000 lines of code per team. Delete it.

Common pitfalls

  • Mutating graph state outside of nodes. Checkpoints only see state changes inside nodes. Keep mutations local.
  • Non-idempotent side effects. A resumed run may re-execute the node that was in flight when the crash happened, so wrap side effects in idempotency checks or prepare for weird bugs on resume.
  • Forgetting to pass run_id from the client. Without it, every request is a fresh run. Great for quick tests; useless for resumption.
  • Skipping cost caps in dev. Set them anyway. You’ll be grateful the first time a bad prompt accidentally loops.
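The side-effect pitfall deserves a concrete pattern. A cheap way to make a side effect safe to replay is to key it on the deterministic step ID and skip duplicates — a sketch, with an in-memory set standing in for what should be a unique-key column in your database:

```python
sent: set[str] = set()  # in production: a unique constraint in your DB, not process memory

def send_email_once(step_id: str, to: str, body: str) -> bool:
    """Idempotent wrapper: replaying this node after a crash-resume is a no-op.

    Returns True if the email was actually sent on this call.
    """
    if step_id in sent:
        return False   # already performed on a previous attempt of this step
    # ... actually send the email here ...
    sent.add(step_id)
    return True

first = send_email_once("abc-123:5:notify", "user@example.com", "done")
replay = send_email_once("abc-123:5:notify", "user@example.com", "done")
```

The same pattern works for payments, webhooks, and file writes: dedupe on the step ID, not on wall-clock time.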

When to get help

If your LangGraph pipeline is already broken in production — stalling, losing state, burning money — that’s a pattern we see weekly at Neul Labs. Our rescue engagements start with a one-week audit and usually end with a service your team can own again. Get in touch.

Need FastAPI, LangGraph, or agent platform expertise?

Neul Labs — the team behind FastAgentic — takes on a limited number of consulting engagements each quarter. We help teams ship agents to production, fix broken LangGraph pipelines, and design governance for multi-tenant LLM platforms.