Production Harnesses — observability, recovery, the bill
Agents in production fail in ways single-prompt features do not. They burn money in their sleep, get stuck in tool-call loops, and wake the on-call at 3am because a context window blew up on input nobody anticipated. Twelve lessons on the operational layer that makes autonomous systems shippable.
Lessons
Observability
02 Tracing — the span-per-turn discipline (7 min)
03 Structured eval logging — turning production into a regression source (7 min)
Recovery and recourse
05 Kill switches and circuit breakers — the off-button you hope you never use (7 min)
06 Recovery and checkpoints — resuming work, not restarting it (7 min)
Observability
07 Replay — debugging in slow motion (8 min)
Reference implementations
08 The Sentry + Checkly + Playwright stack — PL's working production observability (8 min)
Operations
09 On-call for agents — the 3am playbook (8 min)
10 Postmortems as regression tests — every incident becomes an eval (8 min)
Reference implementations
11 The Ostronaut batch pipeline — a worked example of a long-running agentic system (8 min)