agentforge

Backend
2026-04-26T20:21:10Z
TypeScript, Node.js, Express, Backend, AI

agentforge is a small Express + TypeScript HTTP service that wraps the Anthropic SDK behind a clean REST/SSE interface. The endpoint of interest is POST /v1/chat: the server runs an agent loop over the model with tools, streams content_block_delta events out as Server-Sent Events, executes any tool_use blocks server-side, feeds the results back as the next user turn, and stops when the model returns end_turn as its stop reason or the loop hits its max-iteration cap.
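
The SSE side of that pipeline can be sketched as a small framing function. This is a minimal illustration, not agentforge's actual writer: formatSseFrame and the SseEvent shape are hypothetical names, while the "event:" / "data:" fields and the blank-line terminator come from the Server-Sent Events format itself.

```typescript
// Hypothetical event shape: a type tag plus arbitrary JSON-serializable fields.
type SseEvent = { type: string; [key: string]: unknown };

// Serialize one event as a single Server-Sent Events frame.
function formatSseFrame(event: SseEvent): string {
  // JSON.stringify escapes newlines inside strings, so the payload always
  // fits on one "data:" line; the trailing blank line ends the frame.
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

const frame = formatSseFrame({ type: "text_delta", text: "Hello\nworld" });
console.log(frame);
```

In an Express handler, a frame like this would typically be passed to res.write after the response has been given a Content-Type of text/event-stream.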

The loop itself is implemented as an async generator that yields a typed AgentEvent (iteration_start, text_delta, tool_call, tool_result, usage, done), so the route layer is a thin pump from the generator to the SSE writer. The same generator drives the eval harness, which reuses it without HTTP. Tools are described in a tiny registry (fetch_url with private-IP guards, search_docs over a local fixture set), each carrying its own Zod input schema; invalid model output gets a structured error fed back to the model rather than a thrown exception, so Claude self-corrects on the next turn.
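
A rough sketch of what such a generator might look like is below. The event names are taken from the description above; everything else is an assumption made for illustration: the payload fields, the simplified string-only Message type, the CallModel function standing in for the streaming SDK call, and a hand-written validate function standing in for a Zod schema so that the sketch has no dependencies.

```typescript
// Event names come from the text; payload fields are assumptions.
type AgentEvent =
  | { type: "iteration_start"; iteration: number }
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; id: string; name: string; input: unknown }
  | { type: "tool_result"; id: string; output: string; isError: boolean }
  | { type: "usage"; inputTokens: number; outputTokens: number }
  | { type: "done"; reason: "end_turn" | "max_iterations" };

type Message = { role: "user" | "assistant"; content: string };
type ToolCall = { id: string; name: string; input: unknown };
type ModelTurn = {
  text: string;
  toolCalls: ToolCall[];
  stopReason: "end_turn" | "tool_use";
  usage: { inputTokens: number; outputTokens: number };
};
// Hypothetical stand-in for the streaming SDK call.
type CallModel = (history: Message[]) => Promise<ModelTurn>;
// validate returns null for acceptable input, otherwise an error message
// (a dependency-free stand-in for a Zod schema check).
type Tool = {
  validate: (input: unknown) => string | null;
  run: (input: unknown) => Promise<string>;
};

async function* runAgent(
  prompt: string,
  callModel: CallModel,
  tools: Record<string, Tool>,
  maxIterations = 5,
): AsyncGenerator<AgentEvent> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    yield { type: "iteration_start", iteration };
    const turn = await callModel(history);
    if (turn.text !== "") yield { type: "text_delta", text: turn.text };
    yield { type: "usage", inputTokens: turn.usage.inputTokens, outputTokens: turn.usage.outputTokens };
    if (turn.stopReason === "end_turn") {
      yield { type: "done", reason: "end_turn" };
      return;
    }
    history.push({ role: "assistant", content: turn.text });
    const outputs: string[] = [];
    for (const call of turn.toolCalls) {
      yield { type: "tool_call", id: call.id, name: call.name, input: call.input };
      const tool = tools[call.name];
      const problem = tool === undefined ? `unknown tool: ${call.name}` : tool.validate(call.input);
      const isError = tool === undefined || problem !== null;
      // Invalid input becomes a structured error for the model, not a thrown exception.
      const output = tool !== undefined && problem === null
        ? await tool.run(call.input)
        : JSON.stringify({ error: problem });
      yield { type: "tool_result", id: call.id, output, isError };
      outputs.push(output);
    }
    // Tool results are fed back as the next user turn.
    history.push({ role: "user", content: outputs.join("\n") });
  }
  yield { type: "done", reason: "max_iterations" };
}

// Scripted fake model: one invalid tool call, then a final answer.
function makeFakeModel(): CallModel {
  const script: ModelTurn[] = [
    { text: "", toolCalls: [{ id: "t1", name: "search_docs", input: { query: 42 } }],
      stopReason: "tool_use", usage: { inputTokens: 10, outputTokens: 5 } },
    { text: "Done.", toolCalls: [], stopReason: "end_turn", usage: { inputTokens: 20, outputTokens: 3 } },
  ];
  let next = 0;
  return async () => {
    const turn = script[next++];
    if (turn === undefined) throw new Error("script exhausted");
    return turn;
  };
}

const demoTools: Record<string, Tool> = {
  search_docs: {
    validate: (input) =>
      typeof input === "object" && input !== null &&
      typeof (input as { query?: unknown }).query === "string"
        ? null
        : "query must be a string",
    run: async (input) => `results for ${(input as { query: string }).query}`,
  },
};

async function collectEvents(): Promise<AgentEvent[]> {
  const events: AgentEvent[] = [];
  for await (const event of runAgent("hi", makeFakeModel(), demoTools)) events.push(event);
  return events;
}

collectEvents().then((events) => console.log(events.map((e) => e.type).join(" -> ")));
```

Because the consumer only iterates the generator, the same for await loop could sit inside an HTTP route that writes SSE frames or inside an eval harness that collects events into an array, which is the reuse the paragraph above describes.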

Prompt caching is wired in via cache_control: { type: 'ephemeral' } on the system prompt and the trailing tool definition. The cache hit/miss numbers are exposed at /metrics so cache hit rate is graphable, not aspirational. Conversation history is persisted in Postgres with raw SQL migrations (no ORM, mirroring tinybus). The whole service runs in a distroless Docker image built from a multi-stage Dockerfile, deploys to Railway with a one-line railway.json, and ships with vitest + supertest integration tests for the env loader, the tool registry, and the HTTP surface.
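
The cache breakpoints described above might be attached roughly as follows. This is a sketch under assumptions: the local types only mirror the relevant Messages API fields (system text blocks, tool definitions, and the cache_creation_input_tokens / cache_read_input_tokens usage counters), withCacheBreakpoints and cacheReadRatio are hypothetical helper names, and the token-based ratio is just one possible definition of "hit rate", not necessarily what agentforge reports at /metrics.

```typescript
type CacheControl = { type: "ephemeral" };
type SystemBlock = { type: "text"; text: string; cache_control?: CacheControl };
type ToolDef = {
  name: string;
  description: string;
  input_schema: object;
  cache_control?: CacheControl;
};

// Mark the system prompt and only the last tool definition as cache breakpoints.
function withCacheBreakpoints(
  systemText: string,
  tools: ToolDef[],
): { system: SystemBlock[]; tools: ToolDef[] } {
  const ephemeral: CacheControl = { type: "ephemeral" };
  const system: SystemBlock[] = [{ type: "text", text: systemText, cache_control: ephemeral }];
  const marked: ToolDef[] = tools.map((tool, index) =>
    index === tools.length - 1 ? { ...tool, cache_control: ephemeral } : tool,
  );
  return { system, tools: marked };
}

type Usage = {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
};

// Assumed metric: share of prompt tokens that were served from the cache.
function cacheReadRatio(usage: Usage): number {
  const total =
    usage.input_tokens + usage.cache_creation_input_tokens + usage.cache_read_input_tokens;
  return total === 0 ? 0 : usage.cache_read_input_tokens / total;
}

const params = withCacheBreakpoints("You are a helpful agent.", [
  { name: "fetch_url", description: "Fetch a URL", input_schema: { type: "object" } },
  { name: "search_docs", description: "Search local docs", input_schema: { type: "object" } },
]);
console.log(JSON.stringify(params.tools, null, 2));
console.log(cacheReadRatio({ input_tokens: 100, cache_creation_input_tokens: 0, cache_read_input_tokens: 900 }));
```

The returned system and tools values would be spread into the request parameters alongside the model name and messages.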

Three design decisions are deliberately left as TODO blocks in the code with the trade-offs explained inline: sequential vs parallel tool execution, prompt-cache breakpoint strategy, and which agent-loop counters to expose. Those are the choices that depend on actual production traffic, not on what reads well in a tutorial.
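
To make the first of those choices concrete, here is a small sketch of the two tool-execution policies. It does not reproduce the TODO block in the code; runSequential, runParallel, and the recorder helper are hypothetical names, and the Promise.resolve() call merely stands in for real I/O.

```typescript
type ToolCall = { id: string; name: string; input: unknown };
type ToolResult = { id: string; output: string };
type Execute = (call: ToolCall) => Promise<ToolResult>;

// One call at a time: side effects happen in the order the model requested them.
async function runSequential(calls: ToolCall[], execute: Execute): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const call of calls) results.push(await execute(call));
  return results;
}

// All calls started up front: Promise.all keeps results in input order
// but rejects as soon as any single call rejects.
async function runParallel(calls: ToolCall[], execute: Execute): Promise<ToolResult[]> {
  return Promise.all(calls.map((call) => execute(call)));
}

// Records when each call starts and ends so the interleaving is visible.
function makeRecorder(): { log: string[]; execute: Execute } {
  const log: string[] = [];
  const execute: Execute = async (call) => {
    log.push(`start ${call.id}`);
    await Promise.resolve(); // yield point standing in for real I/O
    log.push(`end ${call.id}`);
    return { id: call.id, output: `ran ${call.name}` };
  };
  return { log, execute };
}

const calls: ToolCall[] = [
  { id: "a", name: "fetch_url", input: { url: "https://example.com" } },
  { id: "b", name: "search_docs", input: { query: "caching" } },
];

async function compare(): Promise<{ sequential: string[]; parallel: string[] }> {
  const seq = makeRecorder();
  await runSequential(calls, seq.execute);
  const par = makeRecorder();
  await runParallel(calls, par.execute);
  return { sequential: seq.log, parallel: par.log };
}

compare().then((logs) => console.log(logs));
```

Both variants return results in call order; they differ in whether calls overlap and in how a single failure affects the rest, which is the kind of behaviour that is easier to judge against real traffic.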

  • POST /v1/chat streams typed AgentEvents over SSE: iteration_start, text_delta, tool_call, tool_result, usage, done
  • Agent loop implemented as an async generator. The same generator drives both the HTTP route and the eval harness
  • Tool registry with per-tool Zod input schemas; invalid output is fed back to the model as a structured error rather than thrown
  • Built-in tools: fetch_url (with private-IP / loopback guards) and search_docs (local fixture-backed knowledge base)
  • Prompt caching via cache_control: { type: 'ephemeral' } on the system prompt and the trailing tool definition; cache hit/miss counts visible at /metrics
  • Postgres persistence for sessions and messages with raw SQL migrations and a small migration runner, no ORM
  • Strict TypeScript (noUncheckedIndexedAccess, ESM, NodeNext) with Zod boundary validation and Pino structured logging
  • Eval harness exposed at POST /v1/evals/run: substring + tool-call expectations against fixture prompts
  • Multi-stage Dockerfile ending in distroless/nodejs22 with a non-root user; railway.json ships migrations on boot
  • Three deliberately-left TODO blocks for the design decisions that matter most: tool-execution policy, cache breakpoint strategy, agent metric set
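
As an illustration of the private-IP / loopback guard mentioned for fetch_url, the check for IPv4 literals could look something like the sketch below. The function names and the exact set of blocked ranges are assumptions, not taken from the agentforge source. The sketch covers IPv4 literals only; a real guard would also need to resolve hostnames before checking and handle IPv6.

```typescript
// Parse a dotted-quad IPv4 literal; returns null for anything else.
function parseIPv4(address: string): [number, number, number, number] | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  if (octets.some((octet) => Number.isNaN(octet) || octet > 255)) return null;
  return octets as [number, number, number, number];
}

// True for loopback, RFC 1918 private, link-local, and "this network" addresses.
// Returns false for non-IPv4 input: hostnames must be resolved to an address
// first, so callers should not treat false as "safe" for arbitrary strings.
function isBlockedIPv4Literal(address: string): boolean {
  const octets = parseIPv4(address);
  if (octets === null) return false;
  const [a, b] = octets;
  return (
    a === 0 ||                           // 0.0.0.0/8
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||          // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

for (const host of ["127.0.0.1", "10.1.2.3", "172.32.0.1", "8.8.8.8"]) {
  console.log(host, isBlockedIPv4Literal(host));
}
```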