Harness Engineering: 4 Levers to Diagnose Any AI Agent
Most agent failures aren’t model failures. They’re harness failures.
I break down the four levers I use to diagnose agents in practice: context, tools, loop, and governance. If an agent grabs the wrong information, calls the wrong tool, loops too long, or stops too early, this framework tells you where the harness failed and what to fix first.
Stop Letting AI Agents Run the Whole Workflow
One inbox agent should not classify, research, score, route, and draft replies in one loose loop.

Building Approval Gates AI Agents Can't Route Around
How to wire human-in-the-loop on tool calls — and why system prompt instructions like "always ask before sending" don't actually hold.
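A minimal sketch of the idea, assuming a hand-rolled tool dispatcher rather than any particular framework: the approval check lives in the code path that executes tool calls, so nothing the model writes can skip it. All names here (`ToolCall`, `Tool`, `Approver`, `executeToolCall`, `sendEmail`, `searchInbox`) are hypothetical and not taken from the post.

```typescript
// Hypothetical types for illustration; not tied to any specific framework.
type ToolCall = { name: string; args: Record<string, unknown> };

type Tool = {
  requiresApproval: boolean;
  run: (args: Record<string, unknown>) => string;
};

// Synchronous to keep the sketch short; a real approver would usually be
// async (for example, waiting on a reviewer's click).
type Approver = (call: ToolCall) => boolean;

// The gate sits in the dispatcher, not in the prompt: a tool flagged with
// requiresApproval never runs unless approve() returns true.
function executeToolCall(
  call: ToolCall,
  tools: Map<string, Tool>,
  approve: Approver
): string {
  const tool = tools.get(call.name);
  if (tool === undefined) {
    return `error: unknown tool "${call.name}"`;
  }
  if (tool.requiresApproval && !approve(call)) {
    return `denied: "${call.name}" was not approved`;
  }
  return tool.run(call.args);
}

// Example registry: sending mail is gated, searching is not.
const sent: string[] = [];
const tools = new Map<string, Tool>([
  [
    "sendEmail",
    {
      requiresApproval: true,
      run: (args) => {
        sent.push(String(args.to));
        return "sent";
      },
    },
  ],
  ["searchInbox", { requiresApproval: false, run: () => "3 results" }],
]);

// An approver that rejects everything, standing in for a human who says no.
const denyAll: Approver = () => false;
```

With `denyAll` as the approver, a `sendEmail` call comes back as a denial string and `sent` stays empty, while `searchInbox` still runs. The denial is returned to the agent as an ordinary tool result, which is one possible design; raising an error or pausing the run would be others.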

Your AI Assistant Doesn't Need a Bigger Model. It Needs Colleagues
The multi-agent supervisor pattern in Mastra, written in TypeScript: eight specialist agents on one local LLM, one supervisor, and structural trust boundaries.

The Quality Loop Your AI Agent Is Missing (Evals + Tracing)
Add an LLM-as-judge scorer to a Mastra agent, catch a fabricated action item your tests would never flag, and fix the prompt — no custom infra.