<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Claude-Code on Damian Galarza | Software Engineering &amp; AI Consulting</title><link>https://www.damiangalarza.com/tags/claude-code/</link><description>Recent posts from Damian Galarza | Software Engineering &amp; AI Consulting</description><generator>Hugo</generator><language>en-us</language><managingEditor>Damian Galarza</managingEditor><atom:link href="https://www.damiangalarza.com/tags/claude-code/feed.xml" rel="self" type="application/rss+xml"/><item><title>Claude Code Routines: Auto-Maintaining PR Evolution Numbers</title><link>https://www.damiangalarza.com/posts/2026-05-06-claude-code-routine-auto-maintain-pr-evolution-numbers/</link><pubDate>Wed, 06 May 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-05-06-claude-code-routine-auto-maintain-pr-evolution-numbers/</guid><description>A backend PR kept colliding with other merges on database evolution numbers. Four manual rebases later, I described the problem to Claude and let it write a routine to handle the rest.</description><content:encoded><![CDATA[<p>I had a backend PR open all week, waiting for a scheduled maintenance window before it could merge. The PR adds a single database evolution, a SQL migration that creates an index needed before the deploy.</p>
<p>Play Framework evolutions are SQL files numbered sequentially: 1.sql, 2.sql, 3.sql. When you add a new evolution, you take the next number. The problem was that every other PR merging to master that week was also adding evolutions. Each time one landed, it took the number mine was using.</p>
<p>Rebase. Renumber. Push. Wait. Another PR merges. Rebase again.</p>
<p>There had to be a better way.</p>
<h2 id="the-ask">The ask</h2>
<p>The fix wasn&rsquo;t complicated. Check if the evolution number on my branch conflicts with what&rsquo;s on master. If it does, renumber to the next available slot, rerun code generation, and force-push. The steps were mechanical. Doing them four times in a week was the problem. This felt like a great fit for Claude Code&rsquo;s recently released feature: <a href="https://code.claude.com/docs/en/routines">routines</a>.</p>
<p>So I described it to Claude in one sentence.</p>
<blockquote>
<p>Can you monitor this daily for breaking changes to the PR and keep it up to date with master? Resolving any issues with the evolution if the number gets taken again and updating the PR accordingly.</p></blockquote>
<p>That&rsquo;s the entire prompt. No pseudo-code. No step-by-step instructions. Just the constraint and the desired outcome.</p>
<h2 id="what-claude-generated">What Claude generated</h2>
<p>Claude wrote a <a href="https://code.claude.com/docs/en/routines">routine</a> to handle the full workflow. A routine is a saved prompt that runs on a schedule against a repository. Routines can run on Anthropic&rsquo;s cloud infrastructure or locally on your machine. I used a local routine for this one.</p>
<p>Here&rsquo;s the generated routine, sanitized and condensed. This is what Claude produced from the one-sentence ask above.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>Monitor PR #1234 on org/repo and keep the branch in a
</span></span><span style="display:flex;"><span>mergeable state with master. The PR adds one database
</span></span><span style="display:flex;"><span>evolution. Evolution-number collisions with other PRs
</span></span><span style="display:flex;"><span>merging to master are the recurring issue.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>## Workflow
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>1. Check PR state via `gh pr view` — if not OPEN, stop.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>2. Find our evolution file by grepping for the index
</span></span><span style="display:flex;"><span>   identifier in the evolutions directory.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>3. Check for collision — does master already have a file
</span></span><span style="display:flex;"><span>   at our evolution number?
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>4. If no collision and branch is current with master,
</span></span><span style="display:flex;"><span>   stop silently. No &#34;all clear&#34; comments.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>5. If collision: capture our SQL content, find the highest
</span></span><span style="display:flex;"><span>   evolution number on master, renumber to master_max + 1,
</span></span><span style="display:flex;"><span>   reset to master, write the file at the new number.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>6. Rerun sbt evolutions and code generation. If sbt fails
</span></span><span style="display:flex;"><span>   (local Postgres not reachable), proceed but flag it.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>7. Commit, force-push with lease, update the PR body to
</span></span><span style="display:flex;"><span>   reflect the new evolution number.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>8. Post a PR comment summarizing what changed: which
</span></span><span style="display:flex;"><span>   master PR landed at the old number, the rename from
</span></span><span style="display:flex;"><span>   N.sql to M.sql, and anything needing manual follow-up.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>## Constraints
</span></span><span style="display:flex;"><span>- Don&#39;t touch other PRs. Don&#39;t merge this one.
</span></span><span style="display:flex;"><span>- Force-push with lease is fine on this feature branch.
</span></span><span style="display:flex;"><span>- Only comment when you take action or something needs
</span></span><span style="display:flex;"><span>  manual attention.
</span></span><span style="display:flex;"><span>- Idempotent: running twice with no upstream changes
</span></span><span style="display:flex;"><span>  should do nothing the second time.
</span></span></code></pre></div><p>In short: check the PR state, find the evolution file, detect whether master already has a file at the same number, and if so, renumber to the next available slot, rerun code generation, force-push, and comment on the PR with a summary. If nothing changed, do nothing and stay quiet.</p>
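<p>The heart of the routine, the collision check in step 3 and the renumber in step 5, reduces to a few lines of logic. Here&rsquo;s a minimal sketch in Python; the flat directory of <code>N.sql</code> files matches Play&rsquo;s convention, but the function names and layout are my illustration, not code from the routine (which does this with shell commands):</p>

```python
# Hypothetical sketch of the collision check and renumber step.
# Assumes Play-style evolutions named 1.sql, 2.sql, ... in one directory.
from pathlib import Path


def next_free_evolution(master_dir, our_number):
    """Return our number if master hasn't taken it, else master_max + 1."""
    taken = set(
        int(p.stem) for p in Path(master_dir).glob("*.sql") if p.stem.isdigit()
    )
    if our_number not in taken:
        return our_number      # no collision, nothing to do
    return max(taken) + 1      # renumber past everything on master


def renumber(branch_dir, old_number, new_number):
    """Move our evolution to the new slot, keeping its SQL content."""
    src = Path(branch_dir) / f"{old_number}.sql"
    dst = Path(branch_dir) / f"{new_number}.sql"
    dst.write_text(src.read_text())
    src.unlink()
```

<p>Everything else in the routine is orchestration around this core: fetching master, resetting the branch, rerunning codegen, and reporting back on the PR.</p>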
<p>Two things stand out.</p>
<p>First, the constraints section. I didn&rsquo;t specify most of those. Claude inferred the boundaries from the problem description: don&rsquo;t merge, don&rsquo;t touch other PRs, be idempotent, stay quiet when there&rsquo;s nothing to do. Those are the constraints an experienced developer would set. The routine arrived with them already in place.</p>
<p>Second, the idempotency requirement. A routine that runs daily needs to be safe to run twice. Claude treated that as a given rather than something I needed to spell out.</p>
<p>The routine handled four more renumbers that week without me touching it. Each morning it checked, and on the days a collision had occurred overnight, it renumbered, pushed, and left a comment on the PR explaining what changed.</p>
<h2 id="the-mental-model">The mental model</h2>
<p>The pattern here isn&rsquo;t specific to Play evolutions or database migrations. It applies to any mechanical maintenance task on a long-lived PR. Describe the constraint, not the steps. I didn&rsquo;t write the bash commands or the git operations. I described the problem (evolution numbers collide when other PRs merge) and the desired outcome (keep my PR mergeable). Claude filled in the implementation because the steps follow directly from the constraint.</p>
<p>This is the same dynamic that makes Claude Code effective for coding tasks: the more precisely you describe what you need and why, the better the output. The difference with routines is that the automation keeps running until the PR merges.</p>
<p>If you&rsquo;re babysitting a PR this week, describe the problem. Let the agent write the automation.</p>
<hr>
<p><em>If you&rsquo;re building AI into your engineering workflows and want to move faster without the trial-and-error, <a href="/services/ai-engineering/">let&rsquo;s talk about how I can help</a>.</em></p>
<h2 id="further-reading">Further reading</h2>
<ul>
<li><a href="https://code.claude.com/docs/en/routines">Claude Code Routines documentation</a> for setup and scheduling options</li>
<li><a href="/posts/how-i-use-claude-code/">How I Use Claude Code</a> for more on integrating Claude into daily development work</li>
</ul>
]]></content:encoded></item><item><title>Claude Opus 4.7 + Claude Code: 7 Practical Tips for Maximizing Extended Context</title><link>https://www.damiangalarza.com/posts/2026-04-30-claude-opus-4-7-claude-code-tips-extended-context/</link><pubDate>Thu, 30 Apr 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-04-30-claude-opus-4-7-claude-code-tips-extended-context/</guid><description>Practical tips for getting the most from Claude Opus 4.7's 1M context window in Claude Code. Effort levels, proactive compaction, subagent delegation, and session management from daily production use.</description><content:encoded><![CDATA[<p>Claude Opus 4.7 shipped with a 1M token context window. That&rsquo;s five times what Sonnet 4.5 offers. But a bigger window doesn&rsquo;t mean you can stop managing your context carefully.</p>
<p>The lost-in-the-middle problem doesn&rsquo;t disappear at 1M tokens. Content in the center of the window still gets less attention than content at the beginning and end. Opus 4.7 uses a new tokenizer that improves model performance, but it also means files you read consume context in subtly different ratios than before. Anthropic&rsquo;s docs note the new tokenizer can use up to 35% more tokens per equivalent input compared to previous models. And adaptive thinking, now the only supported thinking mode in 4.7 (fixed budgets are removed), consumes context dynamically. The model thinks longer on harder problems and shorter on easy ones. That thinking counts against your window.</p>
<p>If you&rsquo;re coming from my earlier post on <a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a>, everything there still applies. The fundamentals haven&rsquo;t changed. What has changed is the ceiling, and the set of controls available to you.</p>
<p>Here are seven workflow adjustments I&rsquo;ve made since Opus 4.7 dropped. Each one addresses a specific constraint I hit in daily production use.</p>
<h2 id="1-front-load-context-in-your-first-turn">1. Front-Load Context in Your First Turn</h2>
<p>One of the big changes from Opus 4.6 to Opus 4.7 is that it no longer reads between the lines. Where Opus 4.6 would take a vague prompt and &ldquo;figure it out&rdquo;, Opus 4.7 follows what you wrote, literally. You need to provide good context to get good results, and the first message in the session anchors everything that follows.</p>
<p>Structure your first turn to include three things: what you want and why, which files or areas of the codebase are relevant, and what &ldquo;done&rdquo; looks like.</p>
<p>Here&rsquo;s an example. Instead of this:</p>
<pre tabindex="0"><code>Add rate limiting to the API
</code></pre><p>Try this:</p>
<pre tabindex="0"><code>Add rate limiting to the webhook ingestion endpoint in
packages/gateway/src/routes/webhooks.ts. We&#39;re getting
hammered by a misbehaving integration that sends duplicate
events. Use the existing Redis connection in src/lib/redis.ts.
Rate limit by client IP, 100 requests per minute. Add tests
in __tests__/webhooks.test.ts. Don&#39;t change the event
processing logic in src/lib/event-handler.ts.
</code></pre><p>The second version tells Opus 4.7 exactly what to touch, why, and what to leave alone. You define the &ldquo;what&rdquo; and the constraints. Let the model propose the &ldquo;how.&rdquo;</p>
<p>One thing to watch for: don&rsquo;t turn your first message into a specification document. If you find yourself writing more than a paragraph or two, you&rsquo;re probably trying to control implementation details that the model should decide. Name the files, the constraints, and the definition of done. Stop there.</p>
<h2 id="2-switch-effort-levels-mid-session">2. Switch Effort Levels Mid-Session</h2>
<p>Thinking tokens count against your context window. A single <code>xhigh</code> response on a complex problem can use significantly more tokens than the same question at <code>high</code>. Over the course of a session, this adds up fast.</p>
<p>Opus 4.7 introduced <code>xhigh</code> effort and replaced the old fixed thinking budgets with adaptive thinking. At <code>xhigh</code>, the model almost always engages deep reasoning on complex work and skips thinking on simpler tasks. That&rsquo;s useful for architecture decisions, complex debugging, and multi-file refactors. It&rsquo;s overkill for renaming a variable across twenty files.</p>
<p>Here&rsquo;s how I handle it. I start sessions at <code>xhigh</code> for the initial planning and implementation work. When I shift to mechanical tasks, I drop the effort level:</p>
<pre tabindex="0"><code>/effort high
</code></pre><p>Rename the files, run the migration, update the imports. Then when I need deep analysis again:</p>
<pre tabindex="0"><code>/effort xhigh
</code></pre><p>In practice: you spend the first part of a session at <code>xhigh</code> implementing a feature, then need to update some test fixtures and rename a few constants. Drop to <code>high</code> or even <code>medium</code> for that work. When you&rsquo;re ready to debug a failing integration test, go back to <code>xhigh</code>.</p>
<p>The gotcha here is context switching cost. Don&rsquo;t toggle effort every other message. Batch your mechanical tasks together and run them at a lower effort level in one block. Then switch back for the next piece of deep work.</p>
<h2 id="3-compact-at-60-not-when-you-see-a-warning">3. Compact at 60%, Not When You See a Warning</h2>
<p>Autocompact triggers when your context window is nearly full. By the time that happens with a 1M window, you&rsquo;ve been running with degraded output quality for a while. The lost-in-the-middle effect doesn&rsquo;t wait for you to run out of room. It starts affecting responses well before you hit the ceiling.</p>
<p>My rule of thumb: check <code>/context</code> periodically and compact when you hit around 60%. That sounds premature; at that point you still have 400K tokens of headroom, which is twice the entire Sonnet 4.5 window. But as noted above, quality degrades well before the window fills, and compacting while the session is still coherent produces a much better summary.</p>
<p>Here&rsquo;s what <code>/context</code> output looks like in a session approaching that threshold:</p>
<pre tabindex="0"><code>Context Usage
⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-opus-4-7 · 610k/1000k tokens (61%)
</code></pre><p>After a proactive compact:</p>
<pre tabindex="0"><code>Context Usage
⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-opus-4-7 · 85k/1000k tokens (8.5%)
</code></pre><p>That&rsquo;s a fresh start with all the important decisions preserved. Much better than letting autocompact fire at capacity and losing coherence.</p>
<p>The trade-off with early compaction is that you lose conversational nuance. Specific phrasings, detailed file contents, and turn-by-turn reasoning all get compressed into a summary. This is why Tip 4 exists.</p>
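<p>If you want the rule as something you can script a reminder around, it&rsquo;s one line of arithmetic. A sketch; the 60% cutoff is my heuristic, not a threshold Claude Code exposes or enforces:</p>

```python
def should_compact(used_tokens, window_tokens=1_000_000, threshold=0.6):
    """Compact proactively once usage crosses the threshold."""
    return used_tokens / window_tokens >= threshold

# should_compact(610_000)  -> True   (the 61% session shown above)
# should_compact(85_000)   -> False  (right after a compact)
```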
<h2 id="4-steer-your-compaction">4. Steer Your Compaction</h2>
<p>Running <code>/compact</code> without guidance lets the model decide what to keep and what to drop. This works reasonably well for short sessions, but in a long session with multiple decisions, the model often drops specifics that matter for the next phase of work.</p>
<p>Always pass steering instructions when you compact. Name the topics and the decisions you need preserved.</p>
<p>Here are three examples from real sessions:</p>
<pre tabindex="0"><code>/compact Preserve the auth refactor decisions: we chose
JWT with rotating refresh tokens over session cookies,
the token service is in src/lib/auth/tokens.ts, and
the migration adds a refresh_tokens table.
</code></pre><pre tabindex="0"><code>/compact Keep the schema changes we made to the proposals
table (added status enum, soft delete columns, and the
client_id foreign key). Preserve the repo pattern decision
from packages/shared/src/db/repos/proposals.ts.
</code></pre><pre tabindex="0"><code>/compact We&#39;re moving to phase 2 of the API implementation.
Preserve the route structure decisions (REST for CRUD,
webhooks for async events) and the middleware chain order.
Drop the debugging of the TypeScript config issues.
</code></pre><p>Keep your steering to two or three sentences. Name the topics, not every detail. The model will fill in the specifics from the conversation history. You are giving it a priority list, not writing the summary yourself.</p>
<h2 id="5-use-subagents-for-context-isolation">5. Use Subagents for Context Isolation</h2>
<p>I covered subagents in detail in my <a href="/posts/2025-12-08-understanding-claude-code-context-window/">context window post</a>, but Opus 4.7 shifts the default behavior. In my experience, Opus 4.7 spawns fewer subagents on its own compared to earlier models (Anthropic&rsquo;s release notes confirm this as a deliberate behavior change). It&rsquo;s more inclined to do work inline, which means exploration output that used to be isolated now accumulates in your main context.</p>
<p>That&rsquo;s fine for focused tasks. It becomes a problem when you need to explore a large area of the codebase or review a significant diff. The fix: explicitly request subagent delegation.</p>
<p>The key is scoping what comes back. Instead of:</p>
<pre tabindex="0"><code>Review the changes on this branch
</code></pre><p>Try:</p>
<pre tabindex="0"><code>Have a subagent review the changes on this branch against
main. Report back: any bugs, any missing test coverage,
and any patterns that don&#39;t match our existing conventions.
Don&#39;t include the full diff in the report.
</code></pre><p>Good candidates for subagent delegation:</p>
<ul>
<li><strong>Code reviews:</strong> The subagent reads every changed file, but your main context only gets the summary.</li>
<li><strong>Codebase exploration:</strong> &ldquo;Have a subagent map out how the notification system works across packages/gateway and packages/agents.&rdquo;</li>
<li><strong>Test analysis:</strong> &ldquo;Spawn a subagent to check which tests cover the payment flow and identify gaps.&rdquo;</li>
<li><strong>Pattern audits:</strong> &ldquo;Use a subagent to find all places we handle errors in route handlers and check for consistency.&rdquo;</li>
</ul>
<p>The gotcha with subagents is that they don&rsquo;t share your conversation history. If you made a decision earlier in the session that affects how the subagent should evaluate something, include that decision in the delegation prompt. The subagent starts fresh.</p>
<h2 id="6-use-rewind-to-recover-from-failed-approaches">6. Use Rewind to Recover from Failed Approaches</h2>
<p>Every failed approach leaves artifacts in your context: the wrong implementation, the correction, the explanation of why it was wrong. With Opus 4.7&rsquo;s literal instruction following, this creates a real problem. The model may anchor on parts of a failed attempt even after you have corrected course, because that failed code is still in the conversation history.</p>
<p>The <code>/rewind</code> command (or double-tap Escape) rolls back to a previous point in the conversation. This removes the failed approach from context entirely, as if it never happened.</p>
<p>Here&rsquo;s when to use rewind versus inline correction:</p>
<p><strong>Rewind</strong> when the approach is fundamentally wrong. You asked for a webhook handler and got a giant switch statement, but your codebase uses an event routing pattern. Correcting inline means the model has both patterns in context and may blend them.</p>
<p><strong>Correct inline</strong> when the details need adjustment. The approach is right but a method name is wrong, or it missed an edge case. The cost of the correction in context is low, and the model benefits from seeing the refinement.</p>
<p>A practical example: I asked Claude to implement a notification dispatch system. The first attempt built a synchronous pipeline. My codebase uses BullMQ for async job processing. Rather than explaining why synchronous was wrong and asking it to redo the work, which would leave both approaches in context, I rewound and rephrased:</p>
<pre tabindex="0"><code>Implement notification dispatch using our existing BullMQ
job infrastructure in packages/agents/src/lib/queue.ts.
Each notification type gets its own job processor.
Follow the pattern in the heartbeat-runner for job setup.
</code></pre><p>Clean context. Clear direction. No conflicting signals.</p>
<p>One warning: rewind is destructive. If the failed approach contained useful insights (it identified the right files to modify, or surfaced a constraint you hadn&rsquo;t considered), note those before rewinding. You can include them in your rephrased prompt.</p>
<h2 id="7-know-when-to-clear-compact-or-continue">7. Know When to Clear, Compact, or Continue</h2>
<p>Quality degrades gradually in long sessions. You won&rsquo;t see a cliff. Responses get slightly less precise, slightly more generic, slightly more likely to miss constraints you established earlier. A 1M window means sessions can run much longer, which makes the decision of when to stop harder, not easier.</p>
<p>Here&rsquo;s the decision framework I use:</p>
<p><strong>Continue</strong> when you&rsquo;re mid-task, below 60% context usage, and working on a single coherent thread. The model has strong recall of recent decisions and the work is flowing.</p>
<p><strong>Compact</strong> when you&rsquo;ve finished a phase and are starting the next one. You need the architectural decisions but not the turn-by-turn implementation details. This is where Tip 4&rsquo;s steering instructions matter most.</p>
<p><strong>Clear</strong> when the next task is unrelated to what you&rsquo;ve been doing. Also clear when the model starts repeating itself, when you&rsquo;ve already compacted multiple times in the session, or when you&rsquo;ve persisted your plan externally (in a TODO file, a Linear issue, or a CLAUDE.md update).</p>
<p><strong>Start a new session</strong> when you need different MCP servers, when you&rsquo;re switching to a different branch, or when you&rsquo;re doing parallel worktree work. Each worktree should get its own session. I covered why in <a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">Extending Claude Code with Worktrees for True Database Isolation</a>.</p>
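<p>The four branches above can be collapsed into a small decision function. The inputs are judgment calls in practice, and this leaves out softer signals like the model repeating itself, so treat it as a sketch that makes the priority order explicit rather than something to automate:</p>

```python
def next_move(context_pct, phase_done, unrelated_next, needs_new_env):
    """Tip 7 as a priority-ordered series of checks."""
    if needs_new_env:       # different MCP servers, branch, or worktree
        return "new session"
    if unrelated_next:      # next task has nothing to do with this one
        return "clear"
    if phase_done or context_pct >= 60:
        return "compact"    # keep decisions, drop turn-by-turn detail
    return "continue"       # mid-task, coherent thread, room to spare
```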
<p>The full session lifecycle follows a natural arc. Start with a strong first prompt (Tip 1) at xhigh effort (Tip 2). During the working phase, delegate exploration to subagents (Tip 5) and rewind failed approaches (Tip 6). When you hit around 60% context, compact proactively (Tip 3) with steering instructions (Tip 4). Then decide whether to continue, clear, or start fresh (Tip 7).</p>
<pre tabindex="0"><code>Session Start
  ├── Tip 1: Front-load context in first turn
  ├── Tip 2: xhigh for deep work, high/medium for mechanical
  │
  │   [Working...]
  │
  ├── Tip 5: Delegate exploration to subagents
  ├── Tip 6: Rewind failed approaches
  │
  │   [~60% context used]
  │
  ├── Tip 3: Proactive /compact
  ├── Tip 4: Steer the compaction
  │
  │   [Continue or...]
  │
  └── Tip 7: Clear / New session when needed
</code></pre><h2 id="the-mental-model">The Mental Model</h2>
<p>The 1M context window isn&rsquo;t five times more room. It&rsquo;s five times more rope.</p>
<p>With a 200K window, context pressure forced discipline. You had to be deliberate about what went into the window because you would run out. With 1M tokens, poor habits go unnoticed much longer before the consequences show up. That makes discipline harder, not easier.</p>
<p>The one principle behind all seven tips: active context management beats passive accumulation. Front-load your intent. Control your effort levels. Compact before you need to. Steer the compaction. Isolate expensive exploration. Remove dead ends. Know when to stop.</p>
<p>These aren&rsquo;t theoretical suggestions. They&rsquo;re the adjustments I&rsquo;ve made in my own workflow over a week of daily Opus 4.7 usage. The model rewards precision and punishes ambiguity. Give it clear context, and it delivers.</p>
<blockquote>
<p>If this post was the explanation, the cheat sheet is the reference.
Two sides: token costs for common MCPs on one, the <code>/clear</code> /
<code>/compact</code> / subagent decision tree on the other.</p>
<p><a href="/context-window-cheat-sheet/">Get the Context Window Cheat Sheet →</a></p></blockquote>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a></li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a></li>
<li><a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">Extending Claude Code with Worktrees for True Database Isolation</a></li>
<li><a href="https://claude.com/blog/best-practices-for-using-claude-opus-4-7-with-claude-code">Anthropic: Best practices for using Claude Opus 4.7 with Claude Code</a></li>
</ul>
]]></content:encoded></item><item><title>Shrinking a Production Prompt by 28% With Autonomous Optimization</title><link>https://www.damiangalarza.com/posts/2026-04-06-autonomous-optimization-loops-with-autoresearch/</link><pubDate>Mon, 06 Apr 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-04-06-autonomous-optimization-loops-with-autoresearch/</guid><description>How I used autoresearch to run 65 autonomous prompt optimization iterations on a production LLM agent, cutting it 28% while retaining 98% output quality.</description><content:encoded><![CDATA[<p>Every token in a production LLM prompt costs you latency, money, and <a href="/posts/understanding-claude-code-context-window/">context window</a> space. An agent I&rsquo;ve been building takes around 170 input categories and produces a detailed structured matrix as output. The system prompt includes a 421-line reference matrix as a few-shot example gallery so the model knows the expected output patterns.</p>
<p>The question was concrete. How much of this reference data does the model actually need? I used <a href="https://github.com/uditgoenka/autoresearch">uditgoenka/autoresearch</a>, a Claude Code skill based on <a href="https://github.com/karpathy/autoresearch">Andrej Karpathy&rsquo;s autoresearch</a>, to find out. After 65 autonomous iterations, it cut the matrix to 303 lines (28% smaller) while maintaining 98.1% output quality.</p>
<p>Here&rsquo;s the prompt optimization pattern, the results, and what surprised me about how robust LLMs are to reference data reduction.</p>
<h2 id="the-autoresearch-pattern">The Autoresearch Pattern</h2>
<p>Andrej Karpathy&rsquo;s <a href="https://github.com/karpathy/autoresearch">autoresearch</a> introduced the core idea: give an AI agent a metric to optimize and let it loop. Modify, measure, keep or revert, repeat.</p>
<figure class="tweet-screenshot"><a href="https://x.com/karpathy/status/2030371219518931079"><img src="/images/posts/autoresearch/karpathy-tweet.png"
    alt="Andrej Karpathy announcing autoresearch on X"></a>
</figure>

<p>Udit Goenka built a <a href="https://github.com/uditgoenka/autoresearch">Claude Code skill</a> that brings this pattern to arbitrary optimization tasks, adding a dedicated guard command to prevent regressions.</p>
<p>You define six parameters:</p>
<ul>
<li><strong>Goal</strong>: what you want to improve</li>
<li><strong>Scope</strong>: which files the agent can modify</li>
<li><strong>Metric</strong>: a number extracted from a shell command (line count, test coverage, score)</li>
<li><strong>Direction</strong>: whether higher or lower is better</li>
<li><strong>Verify</strong>: the command that produces the metric</li>
<li><strong>Guard</strong>: a safety net command that must always pass</li>
</ul>
<p>Each iteration follows the same cycle: modify, commit to git, verify the metric, run the guard, keep or revert. Every experiment gets committed before verification, so rollbacks are clean. It tracks results in a TSV log and reads its own git history to avoid repeating failed approaches.</p>
<p>The separation between metric and guard is what makes this work. The metric tells autoresearch &ldquo;did we make progress?&rdquo; while the guard tells it &ldquo;did we break anything?&rdquo; Keeping those independent lets the loop optimize aggressively while the guard catches regressions.</p>
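<p>Stripped of the git plumbing and the TSV log, the loop is about ten lines. This is my sketch of the pattern, not autoresearch&rsquo;s actual implementation:</p>

```python
def optimize(modify, metric, guard, revert, iterations=25):
    """Keep an experiment only if the metric improves and the guard passes."""
    best = metric()
    for _ in range(iterations):
        modify()                      # propose and commit an experiment
        score = metric()              # e.g. line count, lower is better
        if guard() and best > score:  # progress, and nothing broke
            best = score              # keep the commit
        else:
            revert()                  # roll back to the last good state
    return best
```

<p>The callbacks line up with the parameters above: <code>metric</code> plays the verify role, <code>guard</code> is the safety net, and <code>revert</code> is a git reset to the last kept commit.</p>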
<h2 id="setting-up-the-prompt-optimization-experiment">Setting Up the Prompt Optimization Experiment</h2>
<h3 id="scope">Scope</h3>
<p>I scoped autoresearch to a single file — the reference matrix itself. I didn&rsquo;t want it touching the agent&rsquo;s prompt instructions, the recommendation library, or the eval infrastructure. Just the example data.</p>
<p>The alternative was to also let it modify the agent prompt, changing how the matrix is described. But I wanted to isolate the variable: same prompt instructions, same recommendation library, just less example data.</p>
<h3 id="metric">Metric</h3>
<p>For the metric, I used line count. It&rsquo;s simple, deterministic, and directly measures what we care about — how much data gets injected into the prompt. The metric doesn&rsquo;t measure quality at all. That&rsquo;s the guard&rsquo;s job.</p>
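<p>In code terms, the metric is a single expression. Here&rsquo;s a Python sketch (autoresearch actually extracts the number from a shell command such as <code>wc -l</code>; the path is illustrative):</p>

```python
from pathlib import Path


def metric(path="src/data/reference-matrix.ts"):
    """The optimization target: line count of the reference matrix file."""
    return len(Path(path).read_text().splitlines())
```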
<h3 id="guard">Guard</h3>
<p>The quality gate was our existing golden-benchmark eval:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#f5e0dc">EVAL_QUIET</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#89dceb">true</span> npx vitest run --config vitest.evals.config.ts <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>  src/evals/matrix-generation/golden-benchmark.test.ts
</span></span></code></pre></div><p>This eval feeds all 166 input categories into the agent, runs the full matrix generation end-to-end via LLM, and compares the output against golden reference data across four dimensions:</p>
<ol>
<li><strong>Input category coverage</strong> (did it produce rows for every input category?)</li>
<li><strong>Output category accuracy</strong> (correct category assignment?)</li>
<li><strong>Recommendation overlap</strong> (right recommendations from the library?)</li>
<li><strong>Assignment accuracy</strong> (correct responsible party?)</li>
</ol>
<p>The guard must exit 0 (all Vitest assertions pass) for a change to be kept. Each guard run took 5 to 7 minutes because it makes a real LLM API call to generate the full matrix.</p>
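<p>To make the four dimensions concrete, here&rsquo;s what the first one reduces to once the Vitest plumbing is stripped away. A sketch with hypothetical field names, not the eval&rsquo;s real code:</p>

```python
def coverage_gaps(input_categories, matrix_rows):
    """Dimension 1 of the guard: inputs that produced no matrix row."""
    covered = set(row["category"] for row in matrix_rows)
    return sorted(set(input_categories) - covered)
```

<p>If this returns a non-empty list, the guard fails and the experiment is reverted, no matter how many lines it saved.</p>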
<h2 id="the-65-iteration-run">The 65-Iteration Run</h2>
<p>I ran three rounds:</p>
<table>
  <thead>
      <tr>
          <th>Round</th>
          <th>Iterations</th>
          <th>Lines</th>
          <th>Focus</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1</td>
          <td>25</td>
          <td>421 → 337</td>
          <td>Easy wins: exact duplicates, severity level consolidation, frequency variants</td>
      </tr>
      <tr>
          <td>2</td>
          <td>25</td>
          <td>337 → 255</td>
          <td>Deeper cuts: multi-row category groups, shared high-severity rows</td>
      </tr>
      <tr>
          <td>3</td>
          <td>15</td>
          <td>255 → 197</td>
          <td>Aggressive: most multi-row groups reduced to 1-2 representatives</td>
      </tr>
  </tbody>
</table>
<p>Each iteration took 6 to 8 minutes (mostly the guard eval). Total wall time was roughly 7 to 8 hours across three rounds.</p>
<p>The agent&rsquo;s approach was systematic. In round 1, it found the free wins: 5 exact duplicate rows, frequency variants (7 rows for different weekly frequencies that could collapse to 2), and severity levels where lower severity was always a subset of moderate. In later rounds, it got more aggressive, reducing most multi-row category groups to single representative entries.</p>
<h2 id="results-after-fixing-the-eval-baseline">Results After Fixing the Eval Baseline</h2>
<p>During the run, I discovered that the original eval had a self-referencing bug. Both the agent prompt and the eval&rsquo;s golden comparison data imported from the same <code>REFERENCE_MATRIX_CSV</code> constant. Every time autoresearch shrank the reference matrix, it also shrank what the eval compared against. The eval was proving &ldquo;the model can reproduce a smaller matrix&rdquo; rather than &ldquo;the model handles all real-world input categories correctly.&rdquo;</p>
<p>The fix was straightforward. I split the data into two files:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/data/reference-matrix.ts — injected into the agent prompt (optimized)
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">const</span> REFERENCE_MATRIX_CSV <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#a6e3a1">`...`</span>; <span style="color:#6c7086;font-style:italic">// 303 lines after optimization
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/data/golden-reference-matrix.ts — used by eval (immutable)
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">const</span> GOLDEN_REFERENCE_MATRIX_CSV <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#a6e3a1">`...`</span>; <span style="color:#6c7086;font-style:italic">// original 421 lines, never changes
</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/evals/matrix-generation/golden-benchmark.test.ts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// Before: import { REFERENCE_MATRIX_CSV } from &#39;../../data/reference-matrix&#39;;
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">import</span> { GOLDEN_REFERENCE_MATRIX_CSV } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#39;../../data/golden-reference-matrix&#39;</span>;
</span></span></code></pre></div><p>With the fixed eval, I binary-searched through the git history to find the optimal size. Because autoresearch commits every experiment, the full optimization history was available to test against the corrected eval.</p>
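<p>The search itself can be sketched as an ordinary binary search over the committed sizes. The function and callback below are illustrative, not the actual tooling; the approach is valid only because quality turned out to degrade monotonically with size:</p>

```typescript
// Hypothetical sketch: find the smallest committed matrix size that
// still passes the corrected eval. `passes` stands in for "check out
// that commit's reference matrix, run the guard eval".
function smallestPassing(
  sizes: number[],                     // line counts, sorted ascending
  passes: (lines: number) => boolean,
): number | undefined {
  let lo = 0;
  let hi = sizes.length - 1;
  let best: number | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (passes(sizes[mid])) {
      best = sizes[mid]; // passing: try something smaller
      hi = mid - 1;
    } else {
      lo = mid + 1;      // failing: need more lines
    }
  }
  return best;
}
```

<p>With roughly nine candidate sizes in the history, that's about four guard runs instead of nine, which matters when each run costs 5 to 7 minutes.</p>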
<!-- audio-skip -->
<table>
  <thead>
      <tr>
          <th>Lines</th>
          <th>Reduction</th>
          <th>Overall Score</th>
          <th>Status</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>421</td>
          <td>0%</td>
          <td>~99.9%</td>
          <td>Baseline</td>
      </tr>
      <tr>
          <td>337</td>
          <td>-20%</td>
          <td>99.1%</td>
          <td>Above 98%</td>
      </tr>
      <tr>
          <td>308</td>
          <td>-27%</td>
          <td>98.5%</td>
          <td>Above 98%</td>
      </tr>
      <tr>
          <td>305</td>
          <td>-28%</td>
          <td>98.4%</td>
          <td>Above 98%</td>
      </tr>
      <tr>
          <td><strong>303</strong></td>
          <td><strong>-28%</strong></td>
          <td><strong>98.1%</strong></td>
          <td><strong>Sweet spot</strong></td>
      </tr>
      <tr>
          <td>297</td>
          <td>-29%</td>
          <td>97.7%</td>
          <td>Below 98%</td>
      </tr>
      <tr>
          <td>283</td>
          <td>-33%</td>
          <td>96.9%</td>
          <td>Below 98%</td>
      </tr>
      <tr>
          <td>255</td>
          <td>-39%</td>
          <td>~96%</td>
          <td>Below 98%</td>
      </tr>
      <tr>
          <td>197</td>
          <td>-53%</td>
          <td>95.5%</td>
          <td>Too aggressive</td>
      </tr>
  </tbody>
</table>
<p>The sweet spot is 303 lines: a 28% reduction maintaining 98%+ overall quality. The quality cliff appears around iteration 35, where the agent removed shared high-severity rows that contained unique recommendation mappings.</p>
<p>At 303 lines, the score breakdown:</p>
<ul>
<li><strong>Input category coverage:</strong> 100% (all 166 golden categories present)</li>
<li><strong>Output category accuracy:</strong> 100%</li>
<li><strong>Recommendation overlap:</strong> 90.5% (about 32 specific recommendations lost)</li>
<li><strong>Assignment accuracy:</strong> 99.7%</li>
<li><strong>Overall weighted score:</strong> 98.1%</li>
</ul>
<p>The main quality cost is recommendation overlap. The model still covers all input categories and assigns output categories correctly, but produces slightly fewer recommendation rows per category. For this use case, that&rsquo;s an acceptable tradeoff: 118 fewer lines in every prompt for a 1.9% quality reduction.</p>
<h2 id="what-this-reveals-about-llms-and-reference-data">What This Reveals About LLMs and Reference Data</h2>
<p>The most useful finding isn&rsquo;t the 28% number. It&rsquo;s the degradation curve.</p>
<p>Even at 197 lines (53% cut), the model still hit 95.5%. It correctly covered all input categories and most output categories. The recommendation library (a separate 337-entry file in the prompt) carries much of the mapping knowledge. The reference matrix turned out to be more &ldquo;example gallery&rdquo; than &ldquo;source of truth.&rdquo; The model uses it to learn output patterns, not to look up specific mappings.</p>
<p>This has implications for any system that injects large reference data into prompts. The model likely doesn&rsquo;t need all of it. But you need a correct eval to find the actual boundary, and the degradation is gradual, not a cliff. Without a quality gate, you won&rsquo;t know where that boundary is until users report problems.</p>
<h2 id="lessons-for-running-autonomous-optimization-loops">Lessons for Running Autonomous Optimization Loops</h2>
<h3 id="the-guard-is-what-makes-it-work">The guard is what makes it work</h3>
<p>Without a quality gate, autoresearch is a deletion loop. The guard is the only thing preventing it from removing everything. This sounds obvious until you see how easy it is to write a guard that doesn&rsquo;t actually guard.</p>
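<p>The shape of the loop, reduced to its essentials (all names are hypothetical; the real skill shells out to git and the eval):</p>

```typescript
// Hypothetical sketch of the inner loop: a change survives only if the
// guard passes AND the metric improved. Otherwise it is reverted.
interface Change {
  apply(): void;
  revert(): void;
}

function optimize(
  changes: Change[],
  guard: () => boolean,  // e.g. the golden-benchmark eval, exit 0 = pass
  metric: () => number,  // e.g. line count of the reference matrix
): number {
  let best = metric();
  for (const change of changes) {
    change.apply();
    if (guard() && metric() < best) {
      best = metric();   // keep the change
    } else {
      change.revert();   // without a real guard, this branch never fires
    }
  }
  return best;
}
```

<p>Everything interesting lives in <code>guard</code>. If it always returns true, the loop happily deletes its way to an empty file.</p>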
<h3 id="separate-your-optimization-target-from-your-eval-baseline">Separate your optimization target from your eval baseline</h3>
<p>If your golden data is the same data you&rsquo;re optimizing, you&rsquo;ll always pass. This is easy to do when the reference data serves a dual purpose (prompt injection and eval comparison). Split them from the start. The optimization target is mutable. The eval baseline is immutable.</p>
<h3 id="git-as-memory-enables-post-hoc-analysis">Git-as-memory enables post-hoc analysis</h3>
<p>Autoresearch commits every experiment before verification. This is a form of <a href="/posts/how-ai-agents-remember-things/">agent memory</a> that pays off after the run ends: I was able to binary-search through the history after fixing the eval, finding the exact commit where quality degraded. Without that history, I would have had to re-run the entire optimization from scratch.</p>
<h3 id="guard-speed-determines-iteration-budget">Guard speed determines iteration budget</h3>
<p>Fast guards (line count, type checks, unit tests) enable hundreds of iterations overnight. Slow guards (LLM-based evals, end-to-end tests) limit you to roughly 8 to 10 iterations per hour, as this run&rsquo;s 6-to-8-minute iterations showed. Plan your guard complexity based on how many iterations you can afford.</p>
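<p>For this run, the budget math is simple (numbers from the post; the variables are just back-of-envelope arithmetic):</p>

```typescript
// Back-of-envelope iteration budget for a slow, LLM-backed guard.
const minutesPerIteration = 7; // middle of the observed 6-8 min range
const iterationsPerHour = Math.floor(60 / minutesPerIteration); // ~8

const hoursFor = (iterations: number): number =>
  (iterations * minutesPerIteration) / 60;
// hoursFor(65) is roughly 7.6 — consistent with the 7-8 hour wall time
```
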
<h2 id="applying-this-pattern-to-other-prompt-components">Applying This Pattern to Other Prompt Components</h2>
<p>The recommendation library (a separate 337-entry reference file also injected into every prompt) is the next candidate for the same treatment. Same loop, same approach, but with the eval separation built in from the start.</p>
<p>The pattern generalizes to any prompt optimization problem: define the metric, build a correct guard, let the agent loop. The constraint is always the guard. A guard that looks correct but measures the wrong thing is worse than no guard at all.</p>
<p>I built a one-page scorecard based on the four layers of agent evaluation — component testing, trajectory visibility, outcome measurement, and production monitoring. It takes two minutes and shows you where your gaps are. <a href="/agent-eval-scorecard/">Get the Agent Eval Scorecard →</a></p>
<p>If you&rsquo;re past the scorecard stage and want hands-on help with eval design or prompt optimization, <a href="/ai-agents/">let&rsquo;s talk</a>.</p>
<h2 id="additional-reading">Additional Reading</h2>
<ul>
<li><a href="https://github.com/uditgoenka/autoresearch">autoresearch Claude Code skill</a> by Udit Goenka</li>
<li><a href="https://github.com/karpathy/autoresearch">autoresearch</a> by Andrej Karpathy</li>
</ul>
]]></content:encoded></item><item><title>Four Dimensions of Agent-Ready Codebase Design</title><link>https://www.damiangalarza.com/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/</link><pubDate>Wed, 25 Mar 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/</guid><description>AI agents produce better output when the codebase is ready for them. Here are the four dimensions of codebase readiness that account for most of the gap.</description><content:encoded><![CDATA[<p>When an AI agent rewrites a file and the result doesn&rsquo;t match your conventions, the first move is usually to adjust the prompt. Try different instructions. Add more context to the message. Maybe switch models.</p>
<p>The model is rarely the bottleneck. The codebase is.</p>
<p>The same model, pointed at a codebase with strong tests, clear architecture, and good documentation, produces remarkably consistent output. Point it at a codebase with weak coverage, no architecture docs, and no linting, and you get drift. Not because the model is less capable, but because it has less to work with.</p>
<p>I built the <a href="/codebase-readiness/">Codebase Readiness Assessment</a> to make this measurable. It scores your repo across eight dimensions on a 0-100 scale. But you don&rsquo;t need to run the assessment to understand what separates high-scoring codebases from low-scoring ones. Four dimensions account for most of the gap.</p>
<h2 id="test-foundation">Test Foundation</h2>
<p>Test foundation carries the most weight in the assessment (25%) because it&rsquo;s the single biggest lever for agent output quality.</p>
<h3 id="what-a-low-score-looks-like">What a low score looks like</h3>
<p>An agent makes a change. There are no tests covering that area, so it moves on. The change compiles, maybe even runs, but it broke an assumption three modules away. Nobody finds out until a human reviews the PR, or worse, until production.</p>
<p>I&rsquo;ve seen this repeatedly: teams with 30-40% test coverage ask an agent to refactor a service object. The agent produces clean code that looks right. But there&rsquo;s no spec for the edge case where a nil association triggers a downstream error. The agent had no way to catch it because there&rsquo;s no test to fail.</p>
<p>The other failure mode is slow tests. If your suite takes 20 minutes, the agent can&rsquo;t iterate. It makes a change, waits, discovers the failure, tries again, waits again. In a fast suite, that feedback cycle takes seconds. In a slow one, the agent burns time and money waiting for results.</p>
<h3 id="what-a-high-score-looks-like">What a high score looks like</h3>
<p>Codebases that score well here share a few characteristics:</p>
<ul>
<li><strong>Coverage above 70% on critical paths.</strong> Not 100% everywhere, but thorough coverage on the code that matters: domain logic, service objects, API endpoints. The agent can make changes and get immediate confirmation that nothing broke.</li>
<li><strong>Suite runs in under 5 minutes.</strong> Fast enough that the agent can run tests after every meaningful change, not just at the end.</li>
<li><strong>Deterministic results.</strong> No flaky tests. When the suite says green, it means green. Agents can&rsquo;t distinguish between a flaky failure and a real one, so flaky tests teach agents to ignore failures.</li>
</ul>
<h3 id="dont-stop-at-unit-tests">Don&rsquo;t stop at unit tests</h3>
<p>Unit tests on service objects and models are the foundation, but they only verify isolated behavior. An agent that passes all unit tests can still break a user-facing workflow that spans multiple components.</p>
<p>End-to-end tests give agents confidence across entire flows. A system spec that signs a user in, submits a form, and checks the result tells the agent whether the <em>feature</em> works, not just whether a method returns the right value. This is especially valuable when agents make changes that touch controllers, views, and services in the same PR.</p>
<p>Here&rsquo;s a simplified system spec from one of my Rails projects. It covers the core user journey: signing in and submitting a video idea for validation.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># spec/system/idea_submission_spec.rb</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f9e2af">RSpec</span><span style="color:#89dceb;font-weight:bold">.</span>describe <span style="color:#a6e3a1">&#34;Idea submission&#34;</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  it <span style="color:#a6e3a1">&#34;allows a signed-in user to submit a video idea&#34;</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>    user <span style="color:#89dceb;font-weight:bold">=</span> create(<span style="color:#a6e3a1">:user</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    sign_in_as(user, <span style="color:#a6e3a1">path</span>: new_idea_path)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">select</span> user<span style="color:#89dceb;font-weight:bold">.</span>channels<span style="color:#89dceb;font-weight:bold">.</span>first<span style="color:#89dceb;font-weight:bold">.</span>name, <span style="color:#a6e3a1">from</span>: <span style="color:#a6e3a1">&#34;Channel&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Title&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;Building a Rails AI Agent from Scratch&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Description&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;Step-by-step tutorial on building an AI agent&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Category&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;AI Coding&#34;</span>
</span></span><span style="display:flex;"><span>    click_button <span style="color:#a6e3a1">&#34;Validate Idea&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    expect(page)<span style="color:#89dceb;font-weight:bold">.</span>to have_content(<span style="color:#a6e3a1">&#34;Building a Rails AI Agent from Scratch&#34;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">end</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>This test touches authentication, the form UI, the controller, the background job, and the results page. If an agent breaks any part of that chain, this spec catches it.</p>
<p>The tradeoff is speed. End-to-end tests are slower and more brittle than unit tests. You don&rsquo;t need full E2E coverage, but having system specs on your critical user journeys (signup, checkout, the core action your product is built around) gives agents a safety net that unit tests alone can&rsquo;t provide.</p>
<h3 id="the-smallest-change-that-moves-the-needle">The smallest change that moves the needle</h3>
<p>Add coverage to your critical paths first. Don&rsquo;t chase a coverage number. Instead, identify the three or four service objects or domain models where bugs would hurt the most, and write specs for those. Then add one or two system specs covering your most important user journeys end-to-end. If your suite is slow, add parallel test execution. In a Rails app, that might be as simple as adding the <code>parallel_tests</code> gem. A suite that goes from 15 minutes to 4 minutes fundamentally changes how an agent can work with your code. If you&rsquo;re running multiple agents in parallel, you&rsquo;ll also need <a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">database isolation per worktree</a> to prevent test data collisions.</p>
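<p>A minimal setup with the <code>parallel_tests</code> gem might look like this. This is a sketch assuming a Rails app with RSpec and the gem&rsquo;s standard rake tasks; check the gem&rsquo;s README for your stack:</p>

```shell
# Hypothetical setup for parallel_tests in a Rails + RSpec app
bundle add parallel_tests --group "development,test"

# Create and migrate one test database per CPU core
bin/rails parallel:create parallel:prepare

# Run the spec suite across all cores
bundle exec parallel_rspec spec/
```

<p>The gem shards spec files across processes, each with its own database, which is where most of the wall-time win comes from.</p>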
<p>If you want to accelerate the process, tools like <a href="https://github.com/uditgoenka/autoresearch">autoresearch</a> apply this pattern as an autonomous loop: give the agent a measurable goal (like a coverage target), and it iterates, verifies, keeps what works, and discards what doesn&rsquo;t.</p>
<h2 id="documentation-as-code">Documentation as Code</h2>
<p>Documentation carries 15% of the assessment weight, but in practice it&rsquo;s the dimension where I see the biggest gap between teams that get good agent output and teams that don&rsquo;t.</p>
<h3 id="what-a-low-score-looks-like-1">What a low score looks like</h3>
<p>Without an agent-facing entry point (a <code>CLAUDE.md</code>, <code>AGENTS.md</code>, or equivalent), an agent has to reverse-engineer your conventions from the code itself. It reads your files, infers patterns, and guesses at intent. Sometimes it guesses right. Often it doesn&rsquo;t.</p>
<p>Here&rsquo;s a concrete example. A Rails app uses service objects for all business logic. Controllers call a service, the service does the work, and the result gets rendered. There&rsquo;s nothing enforcing this in the framework. It&rsquo;s a team convention. An agent that doesn&rsquo;t know about this convention puts the logic directly in the controller action. The code works. The tests pass. But it breaks the team&rsquo;s pattern, and now there&rsquo;s a 50-line controller action that should have been a service object.</p>
<p>The agent wasn&rsquo;t wrong. It had no way to know.</p>
<h3 id="what-a-high-score-looks-like-1">What a high score looks like</h3>
<p>The key insight is that this entry point file should be a map, not a manual. OpenAI&rsquo;s Harness Engineering team <a href="https://openai.com/index/harness-engineering/">learned this the hard way</a>: they tried a single large instruction file and it failed because &ldquo;context is a scarce resource&rdquo; and &ldquo;too much guidance becomes non-guidance.&rdquo; When everything is marked important, agents pattern-match locally instead of navigating intentionally.</p>
<p>Their solution: keep the entry file short (roughly 100 lines) and treat it as a table of contents that points to deeper sources of truth in a structured <code>docs/</code> directory. The entry file gives agents quick commands and a documentation map. The detail lives in dedicated files the agent reads when it needs them. Whether you call it <code>CLAUDE.md</code>, <code>AGENTS.md</code>, or <code>CURSOR.md</code>, the pattern is the same.</p>
<p>Here&rsquo;s what this looks like in practice from one of my Rails projects:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Quick Commands
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>bin/dev                                # Start dev server
</span></span><span style="display:flex;"><span>bin/rails spec                         # All tests
</span></span><span style="display:flex;"><span>bin/ci                                 # Full CI: lint + security + tests
</span></span><span style="display:flex;"><span>bin/rubocop                            # Lint
</span></span><span style="display:flex;"><span>bin/brakeman                           # Security scan
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Documentation Map
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>| Topic | Document |
</span></span><span style="display:flex;"><span>|-------|----------|
</span></span><span style="display:flex;"><span>| Stack, patterns, domain model | docs/ARCHITECTURE.md |
</span></span><span style="display:flex;"><span>| Testing patterns and stack | docs/TESTING.md |
</span></span><span style="display:flex;"><span>| Credentials, env vars, API keys | docs/CONFIGURATION.md |
</span></span><span style="display:flex;"><span>| Engineering principles | docs/design-docs/core-beliefs.md |
</span></span><span style="display:flex;"><span>| Architecture decision records | docs/design-docs/ |
</span></span></code></pre></div><p>The agent gets commands and a map up front. When it needs to understand the domain model or testing conventions, it follows the pointer. This is progressive disclosure: the agent starts with what it needs immediately and loads deeper context on demand.</p>
<p>Here&rsquo;s a trimmed excerpt from the <code>ARCHITECTURE.md</code> behind that pointer:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Domain Model
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>CreatorSignal validates YouTube video ideas. The core flow:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">1.</span> User submits a video <span style="font-weight:bold">**Idea**</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">2.</span> A <span style="font-weight:bold">**Validation**</span> job is enqueued
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">3.</span> The <span style="font-weight:bold">**ResearchAgent**</span> runs tools against YouTube, Reddit, X, and HN
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">4.</span> Results are synthesized into a scored <span style="font-weight:bold">**Go / Refine / Kill**</span> verdict
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### Key Models
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>| Model | Responsibility |
</span></span><span style="display:flex;"><span>|-------|---------------|
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`User`</span> | Authentication, subscription plan |
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`Idea`</span> | A video idea submitted for validation |
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`Validation`</span> | One run of the research agent against an idea |
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### Project Structure
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>app/
</span></span><span style="display:flex;"><span>├── components/       # ViewComponent components
</span></span><span style="display:flex;"><span>├── controllers/
</span></span><span style="display:flex;"><span>├── jobs/             # ActiveJob jobs (async validation)
</span></span><span style="display:flex;"><span>├── models/
</span></span><span style="display:flex;"><span>├── services/         # Research agent, tool orchestration
</span></span><span style="display:flex;"><span>└── views/            # Hotwire (Turbo frames/streams)
</span></span></code></pre></div><p>An agent reading this knows what an <code>Idea</code> is, that validation is async through a job, and that orchestration logic lives in <code>app/services/</code>. Those are the conventions that prevent drift.</p>
<p>ADRs (Architecture Decision Records) add a layer that documentation alone can&rsquo;t. An agent that understands <em>why</em> a particular pattern was chosen can make better decisions when extending it. If your ADR says &ldquo;we chose event sourcing for the billing domain because of auditability requirements,&rdquo; the agent won&rsquo;t try to refactor billing into simple CRUD.</p>
<h3 id="the-smallest-change-that-moves-the-needle-1">The smallest change that moves the needle</h3>
<p>Create an <code>AGENTS.md</code> in your project root with two things: commands (build, test, lint) and a documentation map pointing to deeper files. <a href="https://agents.md/"><code>AGENTS.md</code></a> is an emerging standard supported by Codex, Cursor, Gemini CLI, GitHub Copilot, Windsurf, Devin, and <a href="https://agents.md/">many others</a>. If you&rsquo;re using Claude Code, symlink <code>CLAUDE.md</code> to it so both resolve to the same file. Then create an <code>ARCHITECTURE.md</code> covering your stack, domain model, and key conventions. This can take an hour and the effect on agent output is immediate. If you want to automate the scaffolding, the <a href="https://github.com/dgalarza/claude-code-workflows">agent-ready plugin</a> generates a starting point based on your existing codebase.</p>
<h2 id="architecture-clarity">Architecture Clarity</h2>
<p>Architecture clarity carries 15% of the assessment weight. It measures whether an agent can understand where code belongs and how components relate to each other.</p>
<h3 id="what-a-low-score-looks-like-2">What a low score looks like</h3>
<p>Agents replicate patterns they find in the codebase. If your codebase has clear boundaries (controllers handle HTTP, services handle business logic, models handle persistence), the agent follows those boundaries. If your codebase mixes concerns, the agent mixes concerns.</p>
<p>The most common failure I see: a controller that does everything. It validates input, calls the database, sends emails, enqueues jobs. An agent asked to add a new feature looks at the existing controller, sees that&rsquo;s where logic goes, and adds more logic to the controller. The agent is doing exactly what the codebase taught it to do.</p>
<p>The subtler version is dependency direction. In a well-layered app, dependencies point inward: controllers depend on services, services depend on models. When that direction is inconsistent (models importing from controllers, services reaching into HTTP request objects), agents produce code with the same tangled dependencies.</p>
<h3 id="what-a-high-score-looks-like-2">What a high score looks like</h3>
<ul>
<li><strong>Clear layering.</strong> Each layer has a single responsibility, and the codebase is consistent about which layer owns what.</li>
<li><strong>Domain namespacing.</strong> Related functionality is grouped by business domain, not just by technical layer. Instead of a flat <code>app/services/</code> with 40 files, you have <code>app/services/billing/</code>, <code>app/services/onboarding/</code>, <code>app/services/research/</code>. When an agent needs to add billing logic, the namespace tells it exactly where to look and what patterns to follow.</li>
<li><strong>Predictable file organization.</strong> A new developer (or agent) can guess where a piece of code lives based on what it does.</li>
<li><strong>Dependency direction is consistent.</strong> Inner layers don&rsquo;t reach outward. You don&rsquo;t see models importing controller concerns.</li>
</ul>
<p>Domain namespacing is especially powerful for agents because it constrains the search space. An agent working on a billing feature only needs to understand the billing namespace, not the entire codebase. It finds the existing patterns in that namespace and replicates them. Without namespacing, the agent has to scan the whole codebase to figure out where billing logic lives, and it might find three different patterns in three different places.</p>
<h3 id="the-smallest-change-that-moves-the-needle-2">The smallest change that moves the needle</h3>
<p>If you have fat controllers, extract one. Pick your most complex controller action, pull the business logic into a service object, and write a spec for it. The agent will start using that service object pattern for new features. One well-structured example teaches the agent more than any documentation, because it&rsquo;s a pattern it can directly replicate.</p>
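<p>A minimal sketch of that extraction, using this site&rsquo;s idea-validation domain (class and method names are illustrative, not from a real codebase):</p>

```ruby
# Hypothetical extraction: the body of a fat controller action, moved
# into a plain-Ruby service object an agent can find and replicate.
class SubmitIdea
  def initialize(user:, params:)
    @user = user
    @params = params
  end

  # Creates the idea and enqueues async validation, mirroring the
  # controller action it replaces.
  def call
    idea = @user.ideas.create!(@params)
    ValidationJob.perform_later(idea)
    idea
  end
end
```

<p>The controller action shrinks to something like <code>SubmitIdea.new(user: current_user, params: idea_params).call</code>, and the spec for <code>SubmitIdea</code> becomes the pattern new features copy.</p>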
<p>If your codebase has grown past a handful of services, start namespacing by domain. Group related services, jobs, and models under a shared namespace. This compounds quickly: once you have three or four service objects under <code>Billing::</code>, agents start producing new billing code in the same namespace by default. The codebase becomes self-reinforcing.</p>
<h2 id="feedback-loops">Feedback Loops</h2>
<p>Feedback loops carry 10% of the assessment weight, but their impact is multiplicative. Good feedback loops make everything else work better. Poor ones make everything else work worse.</p>
<h3 id="what-a-low-score-looks-like-3">What a low score looks like</h3>
<p>Agents learn from the signals they get back. When the only signal is &ldquo;tests passed,&rdquo; the agent has no way to know it introduced a style violation, broke a naming convention, or used a deprecated API. It moves on, confident the change is correct.</p>
<p>Two things make feedback loops weak: <strong>narrow signals</strong> and <strong>slow signals</strong>.</p>
<p>Narrow signals mean the agent only hears from one source. Tests tell the agent whether the code works. They don&rsquo;t tell it whether the code follows your conventions, whether it introduced a security vulnerability, or whether the UI actually renders correctly. Each missing signal is a category of problems the agent can&rsquo;t self-correct.</p>
<p>Slow signals are just as damaging. If the agent has to wait 20 minutes for a CI run to discover a linting error, it&rsquo;s already moved on. It&rsquo;s built three more features on top of code that doesn&rsquo;t pass lint. Now you&rsquo;re unwinding multiple changes instead of catching the first one. The closer the feedback is to the moment of the change, the cheaper it is to fix.</p>
<p>There&rsquo;s also a hierarchy to how you enforce conventions. Anything that can be checked deterministically by a linter should be a lint rule, not a line in your <code>CLAUDE.md</code>. A lint rule catches every violation, every time. A documentation rule depends on the agent reading it and choosing to follow it. If your convention is &ldquo;methods must be under 20 lines&rdquo; or &ldquo;always use <code>frozen_string_literal</code>,&rdquo; encode it in RuboCop, ESLint, or whatever linter your stack uses. Save documentation for the things that can&rsquo;t be mechanically enforced: architectural decisions, domain context, workflow conventions.</p>
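<p>For the two conventions above, the encoding is a few lines of <code>.rubocop.yml</code>:</p>

```yaml
# .rubocop.yml -- encode the convention so the linter enforces it on every run
Metrics/MethodLength:
  Max: 20

Style/FrozenStringLiteralComment:
  Enabled: true
```

<p>A violation now fails the check with the exact line and limit in the message, whether a human or an agent wrote the code.</p>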
<h3 id="what-a-high-score-looks-like-3">What a high score looks like</h3>
<ul>
<li><strong>Pre-commit hooks for immediate feedback.</strong> The agent discovers formatting issues, type errors, or lint violations before it even commits.</li>
<li><strong>CI that runs in under 10 minutes.</strong> Fast enough that the agent can push, get feedback, and iterate without burning excessive context.</li>
<li><strong>Rich error messages.</strong> Linting output that says &ldquo;method too long (25 lines, max 20)&rdquo; is actionable. A generic &ldquo;style violation&rdquo; is not.</li>
</ul>
<p>Here&rsquo;s what a CI script looks like when it goes beyond just running tests. This is the <code>bin/ci</code> from the same Rails project:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># config/ci.rb - run with bin/ci</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f9e2af">CI</span><span style="color:#89dceb;font-weight:bold">.</span>run <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Setup&#34;</span>, <span style="color:#a6e3a1">&#34;bin/setup --skip-server&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Style: Ruby&#34;</span>, <span style="color:#a6e3a1">&#34;bin/rubocop&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Gem audit&#34;</span>, <span style="color:#a6e3a1">&#34;bin/bundler-audit&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Importmap vulnerability audit&#34;</span>, <span style="color:#a6e3a1">&#34;bin/importmap audit&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Brakeman code analysis&#34;</span>, <span style="color:#a6e3a1">&#34;bin/brakeman --quiet --no-pager --exit-on-warn --exit-on-error&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>Five steps, each giving the agent a different kind of feedback. RuboCop catches style violations. Bundler-audit catches vulnerable gems. Brakeman catches security issues in the code itself. An agent that runs <code>bin/ci</code> gets five signals instead of one.</p>
<h3 id="browser-access-as-a-feedback-loop">Browser access as a feedback loop</h3>
<p>For web applications, there&rsquo;s a feedback loop that most teams overlook: giving agents the ability to see what they built.</p>
<p>An agent that can only run tests is working blind on anything visual. It can verify that a controller returns 200, but it can&rsquo;t tell whether the page actually renders correctly, whether a modal opens, or whether a form submits without errors. Cursor&rsquo;s team <a href="https://cursor.com/blog/agent-computer-use">wrote about this</a>: once they gave agents browser access via cloud sandboxes, agents could &ldquo;iterate until they&rsquo;ve validated their output rather than handing off the first attempt.&rdquo; More than 30% of their merged PRs are now created by agents operating autonomously in cloud sandboxes.</p>
<p>You don&rsquo;t need a full cloud sandbox to get value from this. Claude Code has <a href="https://code.claude.com/docs/en/chrome">built-in Chrome support</a> via <code>claude --chrome</code>, and tools like Playwright MCP give agents browser control locally. The agent can navigate to a page, take a snapshot of the DOM, fill in a form, and verify the result. That&rsquo;s a feedback loop that catches an entire class of issues that unit tests and linters never will.</p>
<h3 id="the-smallest-change-that-moves-the-needle-3">The smallest change that moves the needle</h3>
<p>Add a linter to your CI pipeline. For a Ruby project, that&rsquo;s RuboCop. For JavaScript/TypeScript, ESLint. For Python, Ruff. One config file, one CI step. The agent immediately starts getting feedback on style and conventions that it wouldn&rsquo;t otherwise know about.</p>
<p>If you want faster feedback, add pre-commit hooks. The agent runs into the linter before it even pushes, which means it fixes issues in the same context window where it created them. That&rsquo;s cheaper, faster, and produces cleaner commits.</p>
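<p>A minimal version of that hook is a few lines of shell. This is a sketch, assuming your project runs its linter via <code>bin/rubocop</code>; swap in whatever linter your stack uses:</p>

```shell
#!/bin/sh
# .git/hooks/pre-commit -- a minimal sketch; assumes bin/rubocop exists.
# Lint only the staged Ruby files so feedback arrives before the commit lands.
staged=$(git diff --cached --name-only --diff-filter=ACM 2>/dev/null | grep '\.rb$' || true)

if [ -n "$staged" ]; then
  # A non-zero exit here aborts the commit; the lint output is the feedback.
  bin/rubocop --force-exclusion $staged
fi
```

<p>Make it executable with <code>chmod +x .git/hooks/pre-commit</code> and every commit, agent-made or not, goes through the linter first.</p>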
<p>For web projects, consider adding browser access through Playwright MCP or a similar tool. The agent starts verifying its own UI changes instead of relying on you to catch visual issues in review.</p>
<h2 id="where-to-start">Where to Start</h2>
<p>If you&rsquo;re looking at your codebase and wondering where to start, here&rsquo;s how I think about prioritization:</p>
<ol>
<li><strong>Fix your test foundation first.</strong> Without reliable tests, every other improvement is hard to verify. An agent can&rsquo;t confidently refactor your architecture if there&rsquo;s no test suite to catch regressions.</li>
<li><strong>Add an AGENTS.md.</strong> This is 30 minutes of work that immediately changes agent behavior. It&rsquo;s the highest-ROI improvement you can make.</li>
<li><strong>Add a linter to CI.</strong> This closes the feedback gap with minimal effort. The agent starts learning your conventions from automated feedback instead of guessing from code patterns.</li>
</ol>
<p>These three changes don&rsquo;t require a major initiative. They&rsquo;re individual tasks that compound. A codebase with strong tests, clear documentation, and fast feedback loops creates a reinforcing cycle: agents produce better code, which maintains the patterns, which makes future agent output even better.</p>
<p>If you want to see where your codebase stands across all eight dimensions, run the <a href="/codebase-readiness/">Codebase Readiness Assessment</a>. It takes 60 seconds and gives you a score, a per-dimension breakdown, and a prioritized roadmap.</p>
<p>If your team wants hands-on help closing these gaps, that&rsquo;s what a <a href="/services/#retainer">Production AI Retainer</a> is built for. Or if you just want to talk through your results, <a href="/pages/meet/">book a free intro call</a>.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="/codebase-readiness/">Codebase Readiness Assessment</a> - Run the free assessment on your repo</li>
<li><a href="https://openai.com/index/harness-engineering/">Harness Engineering: Leveraging Codex in an Agent-First World</a> - OpenAI&rsquo;s deep dive on building a million-line codebase entirely with agents</li>
<li><a href="https://cursor.com/blog/agent-computer-use">Agent Computer Use</a> - How Cursor gives agents browser access to verify their own work</li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a> - How codebase structure impacts agent output quality</li>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills</a> - Architecture decisions that shape how agents interact with your codebase</li>
</ul>
]]></content:encoded></item><item><title>Automate Everything With Claude Code Channels and Webhooks</title><link>https://www.damiangalarza.com/videos/2026-03-23-automate-everything-with-claude-code-channels-and-webhooks/</link><pubDate>Mon, 23 Mar 2026 14:00:06 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-03-23-automate-everything-with-claude-code-channels-and-webhooks/</guid><description>Claude Code Channels let you connect your coding sessions to Telegram, Discord, webhooks, or any custom UI — here's how they work.</description><content:encoded><![CDATA[<p>Claude Code Channels let you drive your coding sessions from Telegram, Discord, webhooks, or a custom UI — here&rsquo;s the full breakdown.</p>
<p>I walk through how channels work under the hood, including the Fake Chat plugin for testing, and a practical example building a webhook channel that processes Cal.com bookings via an MCP server bridge. If you want to automate Claude Code or expose it to external services, this covers everything from installation to writing your own custom channel.</p>
]]></content:encoded></item><item><title>What Claude Code Does in Your Terminal</title><link>https://www.damiangalarza.com/posts/2026-03-16-what-claude-code-does-in-your-terminal/</link><pubDate>Mon, 16 Mar 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-16-what-claude-code-does-in-your-terminal/</guid><description>Claude Code runs terminal commands and asks you to approve them. This explains what those commands mean and when to pause before saying yes.</description><content:encoded><![CDATA[<p>Every time you use Claude Code to work on a project, it runs commands in your terminal. You&rsquo;ve probably seen the prompt: &ldquo;I&rsquo;m going to run this command. Approve?&rdquo; If you&rsquo;ve come across commands like these and wondered what they actually mean, you&rsquo;re in the right place.</p>
<p>This guide is for people who are building real things with Claude Code but haven&rsquo;t spent much time in the terminal. Designers, product managers, founders who have picked up coding recently. If the terminal feels like reading another language, this is the starting point.</p>
<h2 id="what-the-terminal-actually-is">What the Terminal Actually Is</h2>
<p>There are two ways to interact with your computer.</p>
<p>The first is what you&rsquo;re used to: icons, windows, buttons. You click a file to open it. You drag it to the trash to delete it. The computer shows you what&rsquo;s happening visually.</p>
<p>The second is the terminal. Instead of clicking, you type. Instead of visual feedback, you get text. It&rsquo;s the same computer doing the same things, just through a different interface.</p>
<p>The terminal can do everything the graphical interface (sometimes called a <strong>GUI</strong>, or graphical user interface) can do, and usually faster. When Claude Code searches your entire codebase for a function name, it does that in the terminal because searching thousands of files through a visual interface would be painfully slow.</p>
<p>The terminal comes pre-installed on Mac (it&rsquo;s called Terminal). Tools like VS Code and Cursor have one built in. When you see Claude Code working, it&rsquo;s using that built-in terminal.</p>
<h2 id="files-folders-and-where-you-are">Files, Folders, and Where You Are</h2>
<p>Everything on your computer is either a file or a folder. Files have content (code, images, documents). Folders contain files and other folders.</p>
<p>The terminal uses slightly different words. Folders are called <strong>directories</strong>. Your project is a directory (usually containing many sub-directories). Same concept, different name.</p>
<p>The terminal always has a <strong>current location</strong>. Think of it like Finder or Windows Explorer: you&rsquo;re always looking at a specific folder. In the terminal, that location is called the <strong>working directory</strong>.</p>
<p>When Claude Code opens in your project, the working directory is your project folder. Every command it runs starts from there.</p>
<p>File paths can be <strong>relative</strong> or <strong>absolute</strong>. An absolute path starts from the root of your computer, like <code>/Users/yourname/Projects/my-app/src/App.tsx</code>. A relative path starts from wherever you currently are. If your working directory is <code>/Users/yourname/Projects/my-app</code>, then <code>src/App.tsx</code> points to the same file. Most commands Claude Code runs use relative paths, so knowing your current location matters.</p>
<p>If you&rsquo;re ever unsure which directory you&rsquo;re in, you can check by running:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#89dceb">pwd</span>
</span></span></code></pre></div><p>This prints the full path. Something like <code>/Users/yourname/Projects/my-app</code>. Read-only, nothing changes.</p>
<p>To move to a different directory, use <code>cd</code> (change directory):</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#89dceb">cd</span> ~/Projects/my-app
</span></span></code></pre></div><p>The <code>~/</code> is a shortcut that means your home folder (usually <code>/Users/yourname</code> on Mac or <code>/home/yourname</code> on Linux). So <code>~/Projects/my-app</code> expands to the full path automatically.</p>
<p>You can also move up one level with <code>cd ../</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#89dceb">cd</span> ../
</span></span></code></pre></div><p>The <code>../</code> means &ldquo;the parent directory,&rdquo; or one folder up from where you currently are. If you&rsquo;re in <code>/Users/yourname/Projects/my-app/src</code>, running <code>cd ../</code> takes you back to <code>/Users/yourname/Projects/my-app</code>.</p>
<p>This is how you navigate to your project before starting Claude Code. If Claude Code isn&rsquo;t finding the files you expect, it might be running from the wrong directory. Use <code>pwd</code> to check, and <code>cd</code> to move to the right place.</p>
<h2 id="looking-around-what-claude-code-does-first">Looking Around: What Claude Code Does First</h2>
<p>Before Claude Code makes any changes, it looks around. It needs to understand what exists before it can modify anything. The commands it uses for this are all read-only — they cannot break anything.</p>
<p><strong><code>ls</code></strong> lists the files and folders in the current directory:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>ls
</span></span><span style="display:flex;"><span>ls src/components
</span></span></code></pre></div><p>Claude Code might run <code>ls</code> before creating a file to confirm it doesn&rsquo;t already exist. Or look inside a specific folder to understand what&rsquo;s there.</p>
<p><strong><code>cat</code></strong> prints the contents of a file:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cat package.json
</span></span></code></pre></div><p>When Claude Code reads your <code>package.json</code> to understand your project&rsquo;s dependencies, this is what it&rsquo;s doing. Reading, not changing.</p>
<p>One thing to keep in mind: <code>cat</code> is read-only, but be cautious if Claude Code tries to <code>cat</code> files that contain secrets, like <code>.env</code> files with API keys or passwords. Those contents will be visible in your terminal session and sent to Claude as context. Treat secret files the same way you&rsquo;d treat a password: don&rsquo;t share them unless you understand where they&rsquo;re going.</p>
<p><strong><code>grep</code></strong> searches for text inside files:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>grep -r <span style="color:#a6e3a1">&#34;handleSubmit&#34;</span> src/
</span></span></code></pre></div><p>The <code>-r</code> here is a <strong>flag</strong>. Flags modify how a command behaves. They usually start with a dash (<code>-</code>) followed by a letter. In this case, <code>-r</code> tells <code>grep</code> to search <strong>recursively</strong>, meaning it looks through every file in the <code>src/</code> folder and all its sub-folders. Without <code>-r</code>, it would only search a single file.</p>
<p>You&rsquo;ll see flags on many commands. <code>ls -l</code> shows files in a detailed list. <code>rm -r</code> deletes directories. Each flag changes the command&rsquo;s behavior in a specific way. When you see one you don&rsquo;t recognize, that&rsquo;s a good time to ask Claude what it does.</p>
<p>This <code>grep</code> command finds every file in <code>src/</code> that contains the text <code>handleSubmit</code>. Claude Code uses this constantly to find where things are defined and where they&rsquo;re used.</p>
<p>All of these commands are exploratory. Nothing gets created, moved, or deleted. If you&rsquo;re unsure whether to approve a command, <code>ls</code>, <code>cat</code>, and <code>grep</code> are safe.</p>
<h2 id="when-claude-code-actually-changes-things">When Claude Code Actually Changes Things</h2>
<p>Reading is safe. Everything else deserves attention.</p>
<p><strong>Creating files and directories:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>mkdir src/components/forms
</span></span><span style="display:flex;"><span>touch src/components/forms/LoginForm.tsx
</span></span></code></pre></div><p><code>mkdir</code> creates a directory. <code>touch</code> creates an empty file. Low-risk, but worth understanding what&rsquo;s being created and where.</p>
<p><strong>Moving and renaming:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>mv LoginForm.tsx AuthForm.tsx
</span></span></code></pre></div><p><code>mv</code> moves or renames a file. After this runs, <code>LoginForm.tsx</code> no longer exists under that name. Renaming in the terminal is just a move with a new name.</p>
<p><strong>Deleting:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>rm old-config.ts
</span></span><span style="display:flex;"><span>rm -r deprecated/
</span></span></code></pre></div><p><code>rm</code> deletes files. <code>rm -r</code> deletes entire directories and everything inside them. There is no trash bin. No undo. The file is gone.</p>
<p>Claude Code shouldn&rsquo;t be deleting things without telling you what and why. If you see <code>rm -rf</code> (the <code>-f</code> flag forces the deletion, suppressing warnings and prompts), verify that you actually want those files removed before approving. If you are unsure, ask Claude what it&rsquo;s deleting and why.</p>
<p><strong>Running your project&rsquo;s tools:</strong></p>
<p>Depending on your project, you&rsquo;ll see Claude Code use different <strong>package managers</strong> and tools. A package manager handles your project&rsquo;s dependencies: the external libraries and code your project relies on so you don&rsquo;t have to build everything from scratch.</p>
<p>The most common ones are <strong>npm</strong> (for JavaScript/Node.js projects), <strong>pip</strong> (for Python), <strong>gem</strong> (for Ruby), and <strong>yarn</strong> (an alternative to npm). Claude Code will use whichever one your project is set up with. When Claude Code runs <code>npm install</code> or <code>pip install</code>, it&rsquo;s downloading packages. <code>npm run build</code> tells npm to run a build script defined in your project.</p>
<p><strong>git</strong> is version control. It tracks every change to your code over time, like a detailed save history. <code>git add</code> tells git which files to include in your next save. <code>git commit</code> takes those staged files and saves them as a snapshot in the project&rsquo;s history.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>npm install
</span></span><span style="display:flex;"><span>npm run build
</span></span><span style="display:flex;"><span>git add . <span style="color:#89dceb;font-weight:bold">&amp;&amp;</span> git commit -m <span style="color:#a6e3a1">&#34;Add login form&#34;</span>
</span></span></code></pre></div><p>These commands do real things. <code>npm install</code> downloads packages to your project. <code>git commit</code> saves your changes to version history. When Claude Code runs these, it&rsquo;s making moves that affect your project&rsquo;s state.</p>
<h2 id="reading-a-command-before-you-approve">Reading a Command Before You Approve</h2>
<p>You might see more complicated commands from Claude Code. Take the following example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>find . -name <span style="color:#a6e3a1">&#34;*.test.ts&#34;</span> | grep <span style="color:#a6e3a1">&#34;auth&#34;</span>
</span></span></code></pre></div><p>This is actually a sequence of simpler commands chained together, an example of one of the terminal&rsquo;s most powerful features: composing small tools into bigger ones. To understand it, break it down from left to right:</p>
<ul>
<li><code>find .</code> — search starting from the current directory (<code>.</code> means &ldquo;here&rdquo;)</li>
<li><code>-name &quot;*.test.ts&quot;</code> — find files that end in <code>.test.ts</code></li>
<li><code>|</code> — take the results and feed them to the next command</li>
<li><code>grep &quot;auth&quot;</code> — filter those results to only lines containing &ldquo;auth&rdquo;</li>
</ul>
<p>The full command: &ldquo;Find all test files and show me the ones related to auth.&rdquo; That&rsquo;s read-only. Safe to approve.</p>
<p>The <code>|</code> character is called a <strong>pipe</strong>. It chains commands together: the output of the first becomes the input of the second. You&rsquo;ll see it often. When you do, read each side separately, then understand what connects them.</p>
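<p>You can watch a pipe work without touching your project at all. Here <code>printf</code> stands in for a command that produces file names:</p>

```shell
# printf emits three lines; grep keeps only the line containing "auth";
# wc -l counts what survives. Prints: 1
printf 'user.test.ts\nauth.test.ts\napp.ts\n' | grep "auth" | wc -l
```

<p>Swap <code>printf</code> for <code>find</code> or <code>ls</code> and you have the real thing: each command does one small job, and the pipe glues them together.</p>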
<h2 id="commands-to-think-twice-about">Commands to Think Twice About</h2>
<p>Most of what Claude Code runs is safe. A few patterns are worth pausing on.</p>
<p><strong><code>rm -rf</code> on anything important:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>rm -rf src/
</span></span></code></pre></div><p>This deletes the entire <code>src</code> directory instantly. Claude Code will ask you to approve this command, but once you do, there&rsquo;s no additional safety net from the operating system. No trash bin, no undo. Be sure about this before approving.</p>
<p><strong><code>sudo</code>:</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sudo npm install -g some-package
</span></span></code></pre></div><p><code>sudo</code> runs a command as an administrator. It will prompt you for your computer&rsquo;s password. Claude Code rarely needs this. If you see it, ask Claude why it&rsquo;s necessary. And don&rsquo;t type your password into Claude Code&rsquo;s prompt. If <code>sudo</code> is truly needed, deny the command, then run it yourself directly in your own terminal after Claude explains what it does.</p>
<p><strong><code>curl</code> piped into bash:</strong></p>
<p><code>curl</code> is a command that downloads content from the internet. On its own, it&rsquo;s harmless. It just fetches a file or a web page. But when you see it piped into <code>bash</code>, it becomes something different:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>curl https://example.com/script.sh | bash
</span></span></code></pre></div><p>This downloads a script from the internet and runs it immediately. The <code>|</code> sends whatever <code>curl</code> fetches straight into <code>bash</code>, which executes it as commands on your computer. Uncommon, but when you see it, understand exactly what you&rsquo;re downloading before approving.</p>
<p><strong>Pro tip:</strong> If you&rsquo;re not sure what a script contains or whether it&rsquo;s safe, ask Claude to review it first. Deny the <code>curl | bash</code> command and say something like: &ldquo;What is this script you are trying to run in bash? Before we run this, can you download the contents and evaluate if it&rsquo;s safe or not? Help me understand what it does.&rdquo; Claude can fetch the file, display its contents, and walk you through what each part does so you can make an informed decision.</p>
<h2 id="when-a-command-fails">When a Command Fails</h2>
<p>Claude Code commands fail regularly. That&rsquo;s normal. Here&rsquo;s what the common errors mean:</p>
<ul>
<li><strong><code>command not found</code></strong> — the tool isn&rsquo;t installed. Claude Code will usually try to install it, or tell you what you need.</li>
<li><strong><code>No such file or directory</code></strong> — the path is wrong. Something about the file name or location doesn&rsquo;t match.</li>
<li><strong><code>Permission denied</code></strong> — you don&rsquo;t have access to that file or directory.</li>
</ul>
<p>When something fails, Claude Code will diagnose it and try a different approach. If it gets stuck, copy the error message and paste it back. Errors in the terminal are almost always actionable once you know what they&rsquo;re saying.</p>
<h2 id="getting-help-with-commands">Getting Help with Commands</h2>
<p>You don&rsquo;t need to memorize what every command does. There are two quick ways to get answers.</p>
<p><strong>Ask the command itself.</strong> Most terminal commands have a built-in help option. Add <code>--help</code> after the command name:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>grep --help
</span></span><span style="display:flex;"><span>git --help
</span></span><span style="display:flex;"><span>npm --help
</span></span></code></pre></div><p>This prints a summary of what the command does and what flags it accepts. The output can be dense, but scanning it for the flag you&rsquo;re curious about usually answers your question.</p>
<p><strong>Ask Claude.</strong> If a command Claude Code wants to run looks unfamiliar, deny it and ask Claude to explain. Something as simple as &ldquo;What does this command do?&rdquo; will get you a plain-language breakdown. Claude can explain the command, each flag, and what will happen when it runs. This is one of the most practical ways to learn the terminal while using Claude Code.</p>
<h2 id="youre-supervising-not-executing">You&rsquo;re Supervising, Not Executing</h2>
<p>You don&rsquo;t need to be a terminal expert to use Claude Code. Your job is to understand enough to supervise what it&rsquo;s doing.</p>
<p>Supervisors know enough to recognize when something looks right and when something needs a question. That&rsquo;s the skill worth developing here.</p>
<p>Most of what Claude Code runs is safe. The unsafe things are usually obvious once you know what <code>rm -rf</code> means and when <code>sudo</code> is suspicious. You now know both.</p>
<h2 id="cheat-sheet--glossary">Cheat Sheet &amp; Glossary</h2>
<p>I put together a 2-page reference card with every command and term from this post, including safety ratings for each one. Keep it next to your keyboard while you work, or print it out.</p>
<p><a href="/downloads/terminal-cheat-sheet/">Get the free Terminal Cheat Sheet (PDF) →</a></p>
<p>If you&rsquo;re new to the terminal and want to use Claude Code confidently, I offer coaching to help you get there. From project setup to understanding what&rsquo;s happening under the hood, we can work through it together. <a href="/coaching/#claude-code-basics">Book a Claude Code Basics session</a>.</p>
]]></content:encoded></item><item><title>Ruby LSP Now Has Official Claude Code Support</title><link>https://www.damiangalarza.com/posts/2026-03-13-ruby-lsp-claude-code/</link><pubDate>Fri, 13 Mar 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-13-ruby-lsp-claude-code/</guid><description>Ruby LSP now has official Claude Code support. Install the plugin to give Claude go to definition, find references, and call hierarchy across your Ruby project.</description><content:encoded><![CDATA[<p>Up until now, Claude Code has understood your Ruby codebase the same way grep does. It reads files, searches for patterns, and infers structure from text. That works surprisingly well for many tasks. But it has a ceiling.</p>
<p>Ask Claude to explain a Rails model and it reads through the file, piecing together what the associations mean, inferring what the enum values map to, and hoping it catches all the scopes. It&rsquo;s reading text and making educated guesses about structure. Ask it to find all callers of a private method before you delete it and it&rsquo;s doing a best-effort text scan that can miss dynamic calls or alias references.</p>
<p>That ceiling just got raised. Ruby LSP now has official support in Claude Code.</p>
<h2 id="what-is-a-language-server-protocol">What Is the Language Server Protocol</h2>
<p>LSP is a standard that separates language intelligence from the editor. Before LSP, every editor had to implement its own understanding of every language. After LSP, you write the intelligence once as a language server, and any editor that speaks the protocol gets the benefits.</p>
<p>The language server sits alongside your code and maintains a live semantic model of your project. It tracks definitions, references, and symbol relationships across your entire project. When your editor asks &ldquo;where is this method defined?&rdquo; or &ldquo;what are all the callers of this function?&rdquo;, the server answers in milliseconds with accurate, complete information.</p>
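<p>Concretely, each of those questions is a JSON-RPC request. A &ldquo;go to definition&rdquo; query looks roughly like this (the file path and position here are illustrative, not taken from a real project):</p>
<pre tabindex="0"><code>{
  &#34;jsonrpc&#34;: &#34;2.0&#34;,
  &#34;id&#34;: 1,
  &#34;method&#34;: &#34;textDocument/definition&#34;,
  &#34;params&#34;: {
    &#34;textDocument&#34;: { &#34;uri&#34;: &#34;file:///app/models/membership.rb&#34; },
    &#34;position&#34;: { &#34;line&#34;: 4, &#34;character&#34;: 10 }
  }
}
</code></pre><p>The server responds with the exact file and character range where the symbol is defined, which is why the answer is precise instead of a best text match.</p>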
<p><a href="https://github.com/Shopify/ruby-lsp">Ruby LSP</a> is Shopify&rsquo;s language server for Ruby. It integrates with Standard Ruby for diagnostics. In codebases that use <a href="https://sorbet.org/">Sorbet</a> or <a href="https://github.com/soutaro/steep">Steep</a>, those tools run as separate language servers and provide type information independently of Ruby LSP. Beyond the core server, there&rsquo;s a growing community of add-on developers building extensions for it. <a href="https://github.com/johansenja/awesome-ruby-lsp">awesome-ruby-lsp</a> is a good starting point if you want to see what&rsquo;s available.</p>
<h2 id="what-you-get">What You Get</h2>
<p>With Ruby LSP connected, Claude gains semantic code intelligence instead of relying on text search. Here&rsquo;s what that means in practice.</p>
<p><strong>Go to definition.</strong> Claude can jump directly to where a method, class, or constant is defined, including inside gems. No more manually tracking down gem source paths.</p>
<p><strong>Find all references.</strong> Before a refactor, Claude can ask the language server for every reference to a symbol across your entire project. Dynamic calls included, because the LSP understands Ruby&rsquo;s metaprogramming patterns well enough to handle common cases.</p>
<p><strong>Call hierarchy.</strong> Claude can trace the full call chain for a method, both incoming callers and outgoing calls. That&rsquo;s useful when you need to understand the impact of a change before making it.</p>
<p><strong>Hover documentation.</strong> Claude can pull the YARD documentation for any method without having to search for it manually.</p>
<p>The difference shows up most in refactoring. When you ask Claude to rename a method or change a signature, it can now ask the LSP for the complete reference list before making a single change. That&rsquo;s the difference between a confident refactor and a hopeful one.</p>
<h2 id="installing-ruby-lsp-in-claude-code">Installing Ruby LSP in Claude Code</h2>
<p>You&rsquo;ll need Ruby LSP installed as a gem first:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>gem install ruby-lsp
</span></span></code></pre></div><p>If you&rsquo;re working in a bundled project, add it to your <code>Gemfile</code> in the development group:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span>group <span style="color:#a6e3a1">:development</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  gem <span style="color:#a6e3a1">&#34;ruby-lsp&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>Then install the plugin from inside Claude Code:</p>
<pre tabindex="0"><code>/plugin install ruby-lsp@claude-plugins-official
</code></pre><p>That&rsquo;s it. The next time you start a Claude Code session in a Ruby project, the LSP server starts alongside it. You don&rsquo;t need to configure anything. Install it once and it works.</p>
<h2 id="what-changes-in-practice">What Changes in Practice</h2>
<p>Here&rsquo;s what it looks like in a real session. I asked Claude about a model in a Rails app:</p>
<pre tabindex="0"><code>❯ Use ruby-lsp and tell me about the membership model

⏺ Searched for 1 pattern (ctrl+o to expand)

⏺ LSP(operation: &#34;documentSymbol&#34;, file: &#34;app/models/membership.rb&#34;)
  ⎿  Found 6 symbols (ctrl+o to expand)

⏺ Read 1 file (ctrl+o to expand)

⏺ Now let me get hover info on a few key symbols:

⏺ LSP(operation: &#34;hover&#34;, symbol: &#34;enum&#34;, in: &#34;app/models/membership.rb&#34;)
  ⎿  Hover info available (ctrl+o to expand)

⏺ LSP(operation: &#34;findReferences&#34;, symbol: &#34;Membership&#34;, in: &#34;app/models/membership.rb&#34;)
  ⎿  Found 24 references across 12 files (ctrl+o to expand)

⏺ Here&#39;s a summary of the Membership model:

  app/models/membership.rb

  A join model representing a user&#39;s membership in a tenant (organization).

  Key traits:
  - acts_as_tenant :tenant — multi-tenant scoped via ActsAsTenant
  - belongs_to :tenant, belongs_to :user
  - Roles (integer enum): member (1), admin (2), owner (3)
  - scope :owners — filters members with the owner role
  - Validations: user_id must be unique per tenant (one membership per user per org), role must be present

  Used in 12 files across:
  - Controllers: invitations, memberships, organization settings
  - Services: AcceptInvitation, ConfirmUserEmail
  - Specs: model spec, tenant isolation spec, system specs

  The model is minimal — it&#39;s purely a pivot between User and Tenant with role-based access control baked in.
</code></pre><p>That&rsquo;s a thorough summary. But I was curious what actually came from the LSP versus what Claude found by reading the file. So I asked:</p>
<pre tabindex="0"><code>❯ What information did you get from the lsp vs your own search

⏺ Good question. Here&#39;s the breakdown:

  From LSP:
  - documentSymbol — gave me the class name, method names, and line numbers without reading the file
    (associations, scope, validations)
  - hover on enum — confirmed it&#39;s ActiveRecord::Enum#enum with a link to the gem source definition
  - findReferences — found all 24 references across 12 files without me doing any grep search

  From my own file read (Read tool):
  - The actual source code content — the enum values (member: 1, admin: 2, owner: 3), the acts_as_tenant
    call, the uniqueness scope on the validation (scope: :tenant_id)

  In practice:
  - LSP gave me structure and cross-references (what symbols exist, where they&#39;re used project-wide)
  - The file read gave me content (the actual values and implementation details)

  The documentSymbol result notably omitted the acts_as_tenant call and the enum values — it only surfaced
  Rails macro calls it recognized as meaningful symbols. So LSP alone would have left me with an incomplete
  picture of the model.
</code></pre><p>That second turn is the honest answer. LSP and file reads are complementary. The LSP gives Claude structure and cross-references instantly: what symbols exist, where they&rsquo;re called across the project, without spending context window tokens on file searches. The file read fills in the content: actual values, implementation details, anything the symbol index doesn&rsquo;t capture. Together they produce a more complete picture than either alone.</p>
<p>That structural understanding also changes how refactoring works. Ask Claude to rename a class or change how something is called across the project, and it checks every reference with the LSP before touching a file. It isn&rsquo;t guessing at usage patterns. It knows them.</p>
<p>One thing worth knowing: Ruby LSP works well without a type system. In my testing without Sorbet or Steep, the heuristic inference still produced accurate results for associations, method definitions, and references. If your codebase does use Sorbet or Steep, those type checkers can run alongside the LSP to improve accuracy further, but they&rsquo;re not required to get value from this integration.</p>
<p>Text search got Claude surprisingly far. Semantic understanding gets it the rest of the way.</p>
<p>If you want help getting Claude Code dialed in for your Ruby workflow, whether that&rsquo;s 1:1 or across your team, <a href="/claude-code/">take a look at how I work with developers and teams</a>.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://github.com/Shopify/ruby-lsp">Ruby LSP on GitHub</a></li>
<li><a href="https://shopify.github.io/ruby-lsp/">Ruby LSP documentation</a></li>
<li><a href="https://github.com/johansenja/awesome-ruby-lsp">awesome-ruby-lsp</a></li>
<li><a href="https://microsoft.github.io/language-server-protocol/">Language Server Protocol specification</a></li>
<li><a href="https://code.claude.com/docs/en/discover-plugins">Claude Code plugins documentation</a></li>
</ul>
]]></content:encoded></item><item><title>Extending Claude Code Worktrees for True Database Isolation</title><link>https://www.damiangalarza.com/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/</link><pubDate>Tue, 10 Mar 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/</guid><description>Claude Code's native worktree support handles file isolation, but Rails apps need database isolation too. Here's how to extend it with the WorktreeCreate hook.</description><content:encoded><![CDATA[<p>If you&rsquo;re running multiple Claude Code sessions on a Rails app, you need isolation. Without it, agents edit the same files, collide on the same branch, and corrupt each other&rsquo;s state.</p>
<p>Git worktrees solve the file side of this. Each agent gets its own working directory with its own branch. Anthropic recently shipped native worktree support in Claude Code, which handles the Git mechanics automatically. No external scripts to maintain.</p>
<p>But file isolation is only half the problem. A Rails app running in a worktree still points at the same development and test databases as your main checkout. Two agents running specs in parallel will insert conflicting test data, and you&rsquo;ll get flaky tests with no obvious cause.</p>
<p>Native worktrees handle file isolation. The <code>WorktreeCreate</code> hook handles everything else.</p>
<h2 id="what-native-worktree-support-does">What Native Worktree Support Does</h2>
<p>Claude Code now manages worktrees directly. Start a session with the <code>--worktree</code> flag:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>claude --worktree my-feature-branch
</span></span></code></pre></div><p>This creates a new branch and checks it out in a dedicated directory under <code>.claude/worktrees/</code>. The branch name argument is optional. Skip it and Claude generates a random name.</p>
<p>The <code>--worktree</code> flag isn&rsquo;t the only entry point. You can also:</p>
<ul>
<li>Ask Claude mid-session to work in a worktree</li>
<li>Spawn sub-agents that each get their own worktree automatically</li>
<li>Set <code>isolation: &quot;worktree&quot;</code> on custom agent definitions so they always run isolated</li>
</ul>
<p>That last option is worth calling out. If you have an agent that runs specs or does anything destructive, setting isolation at the agent level means you never have to remember the flag. It&rsquo;s just how that agent works.</p>
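<p>As a sketch, an agent definition with isolation baked in might look like this. The agent name, description, and tool list are hypothetical; the <code>isolation</code> field is the part that matters:</p>
<pre tabindex="0"><code># .claude/agents/spec-runner.md
---
name: spec-runner
description: Runs the spec suite for the current change and reports failures
tools: Bash, Read, Grep
isolation: worktree
---

Run the relevant specs and summarize any failures with file and line references.
</code></pre>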
<p>One thing to note: <code>.claude/worktrees/</code> is not added to <code>.gitignore</code> by default. Add it early so worktree directories don&rsquo;t end up in your commit history.</p>
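<p>The entry is a single line:</p>
<pre tabindex="0"><code># .gitignore
.claude/worktrees/
</code></pre>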
<h2 id="the-worktreeinclude-file">The <code>.worktreeinclude</code> File</h2>
<p>Git worktrees only duplicate tracked files. Anything in your <code>.gitignore</code> won&rsquo;t carry over. For most Rails apps, that means your <code>.env</code> file, <code>config/master.key</code>, and any other credentials are missing in every new worktree.</p>
<p>Claude Code solves this with <code>.worktreeinclude</code>. It works like a reverse <code>.gitignore</code>. You list the gitignored files that should be copied into each new worktree:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-text" data-lang="text"><span style="display:flex;"><span># .worktreeinclude
</span></span><span style="display:flex;"><span>.env
</span></span><span style="display:flex;"><span>config/master.key
</span></span><span style="display:flex;"><span>config/credentials/development.key
</span></span></code></pre></div><p>When Claude creates a worktree, it reads this file and copies each listed file from the main checkout into the new directory. No more agents failing silently because their credentials never made it into the worktree.</p>
<h2 id="where-native-worktrees-fall-short">Where Native Worktrees Fall Short</h2>
<p>File isolation is necessary, but it&rsquo;s not sufficient. Consider this scenario: two agents running your test suite at the same time. Both point at the same test database. Agent A inserts test data. Agent B&rsquo;s assertions fail because the data doesn&rsquo;t match what it expected.</p>
<p>You get flaky tests, data collisions, and corrupted state. The root cause is invisible because each agent&rsquo;s test run passes when run alone.</p>
<p>True isolation for a Rails app requires three things:</p>
<ol>
<li><strong>A separate database per worktree</strong> so test runs don&rsquo;t collide</li>
<li><strong>Environment config</strong> (credentials, <code>.env</code> files) available in each worktree</li>
<li><strong>Dependencies installed</strong> so the app can actually boot</li>
</ol>
<p>The <code>.worktreeinclude</code> file handles environment config. Native worktrees handle the Git mechanics. But nothing handles the database. That&rsquo;s the gap.</p>
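<p>Closing that gap means the app has to read its database names from the environment. Here&rsquo;s a minimal <code>config/database.yml</code> sketch showing the shape, assuming Postgres and dotenv-style env loading. The <code>tracewell</code> names and the <code>DB_DATABASE</code>/<code>DB_TEST_DATABASE</code> variables match the hook later in this post; adapt them to your app:</p>
<pre tabindex="0"><code># config/database.yml (excerpt)
default: &amp;default
  adapter: postgresql
  pool: 5

development:
  &lt;&lt;: *default
  database: &lt;%= ENV.fetch(&#34;DB_DATABASE&#34;, &#34;tracewell_development&#34;) %&gt;

test:
  &lt;&lt;: *default
  database: &lt;%= ENV.fetch(&#34;DB_TEST_DATABASE&#34;, &#34;tracewell_test&#34;) %&gt;
</code></pre><p>With that in place, a worktree&rsquo;s <code>.env.local</code> can override the database names and each agent gets its own pair of databases.</p>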
<h2 id="the-worktreecreate-hook">The <code>WorktreeCreate</code> Hook</h2>
<p>Claude Code supports lifecycle hooks that run at specific moments during a session. The one we care about is <code>WorktreeCreate</code>. It fires every time a worktree is created, whether from the CLI flag, a mid-session request, or a sub-agent spawn.</p>
<p>This hook was originally designed for teams using version control systems other than Git, like SVN or Mercurial. It&rsquo;s an extension point that lets you replace the default Git worktree behavior entirely with your own setup logic.</p>
<p>We can use that same extension point to bootstrap a full Rails environment.</p>
<h3 id="hook-configuration">Hook Configuration</h3>
<p>Add the hook to your Claude Code settings at <code>.claude/settings.json</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">&#34;hooks&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;WorktreeCreate&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;command&#34;</span>: <span style="color:#a6e3a1">&#34;.claude/hooks/worktree-create.sh&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;timeout&#34;</span>: <span style="color:#fab387">60000</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The timeout is set to 60 seconds. <code>bin/setup</code> can take a while, especially if it&rsquo;s running database migrations.</p>
<p>When the hook fires, Claude passes JSON to stdin with the session context, including the working directory and session ID. Your script reads that input, does the setup, and prints the path to the created worktree to stdout. That&rsquo;s the contract: print the path, exit zero, and Claude uses that directory.</p>
<h3 id="the-setup-script">The Setup Script</h3>
<p>Here&rsquo;s the full script. I&rsquo;ll walk through each section below.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#!/usr/bin/env bash
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#89dceb">set</span> -euo pipefail
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># WorktreeCreate hook: creates a git worktree, symlinks shared files,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># configures worktree-specific databases, and runs bin/setup.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Input (JSON on stdin): { &#34;name&#34;: &#34;&lt;slug&gt;&#34;, &#34;cwd&#34;: &#34;&lt;project-root&gt;&#34;, ... }</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Output (stdout): absolute path to the created worktree directory</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># All other output goes to stderr so it doesn&#39;t interfere with the path.</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">INPUT</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>cat<span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">NAME</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INPUT</span><span style="color:#a6e3a1">&#34;</span> | jq -r <span style="color:#a6e3a1">&#39;.name&#39;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">PROJECT_DIR</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INPUT</span><span style="color:#a6e3a1">&#34;</span> | jq -r <span style="color:#a6e3a1">&#39;.cwd&#39;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">WORKTREE_DIR</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">/.claude/worktrees/</span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 1. Create the git worktree</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">[</span> -d <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">]</span>; <span style="color:#cba6f7">then</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Worktree directory already exists: </span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">exit</span> <span style="color:#fab387">1</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Creating git worktree at </span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1"> ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>git -C <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">&#34;</span> worktree add -b <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> HEAD &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 2. Symlink entries from .worktreeinclude</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">INCLUDE_FILE</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">/.worktreeinclude&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">[</span> -f <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INCLUDE_FILE</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">]</span>; <span style="color:#cba6f7">then</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Symlinking entries from .worktreeinclude ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">while</span> <span style="color:#f5e0dc">IFS</span><span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#89dceb">read</span> -r entry <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb;font-weight:bold">[</span> -n <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">]</span>; <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Skip blank lines and comments</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f5e0dc">entry</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span> | sed <span style="color:#a6e3a1">&#39;s/#.*//&#39;</span> | xargs<span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb;font-weight:bold">[</span> -z <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">]</span> <span style="color:#89dceb;font-weight:bold">&amp;&amp;</span> <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#f5e0dc">SOURCE</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">/</span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f5e0dc">TARGET</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">/</span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">[</span> ! -e <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$SOURCE</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">]</span>; <span style="color:#cba6f7">then</span>
</span></span><span style="display:flex;"><span>      <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;  SKIP (not found): </span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Ensure parent directory exists in the worktree</span>
</span></span><span style="display:flex;"><span>    mkdir -p <span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">$(</span>dirname <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$TARGET</span><span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Remove the file/dir that git checkout placed there (if any)</span>
</span></span><span style="display:flex;"><span>    rm -rf <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$TARGET</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    ln -s <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$SOURCE</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$TARGET</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;  Linked: </span><span style="color:#f5e0dc">$entry</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">done</span> &lt;<span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INCLUDE_FILE</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">else</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;No .worktreeinclude file found, skipping symlinks.&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 3. Configure worktree-specific databases via .env.local</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Sanitize the name for use in database names (replace dashes with underscores)</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">DB_SLUG</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#34;</span> | tr <span style="color:#a6e3a1">&#39;-&#39;</span> <span style="color:#a6e3a1">&#39;_&#39;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">ENV_LOCAL</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">/.env.local&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Writing worktree-specific database config to .env.local ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>cat &gt;<span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$ENV_LOCAL</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#a6e3a1">&lt;&lt;EOF
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1"># Auto-generated for worktree: $NAME
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">DB_DATABASE=tracewell_development_${DB_SLUG}
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">DB_TEST_DATABASE=tracewell_test_${DB_SLUG}
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">EOF</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;  Development DB: tracewell_development_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">DB_SLUG</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;  Test DB:        tracewell_test_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">DB_SLUG</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Output: print the worktree path for Claude Code (must happen before</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># any step that could fail so Claude Code can track the session)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 4. Run bin/setup (skip starting the dev server)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Running bin/setup --skip-server ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb;font-weight:bold">(</span><span style="color:#89dceb">cd</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">&amp;&amp;</span> bin/setup --skip-server<span style="color:#89dceb;font-weight:bold">)</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span></code></pre></div><p>The script does four things:</p>
<p><strong>Creates the Git worktree.</strong> This replicates what Claude does natively, giving us a new branch and working directory.</p>
<p><strong>Copies <code>.worktreeinclude</code> files.</strong> We re-implement this behavior since we&rsquo;re replacing the default worktree creation. Each listed file gets copied from the main checkout into the new worktree.</p>
<p><strong>Writes a <code>.env.local</code> with unique database names.</strong> This is the key part. The branch name gets transformed into a Postgres-safe format (hyphens become underscores) and appended as a suffix to both the development and test database names. Your <code>database.yml</code> needs to read from these environment variables, falling back to defaults when they&rsquo;re not set:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># config/database.yml</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">development</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">database</span>: &lt;%= ENV.fetch(&#34;DB_DATABASE&#34;, &#34;myapp_development&#34;) %&gt;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">test</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">database</span>: &lt;%= ENV.fetch(&#34;DB_TEST_DATABASE&#34;, &#34;myapp_test&#34;) %&gt;
</span></span></code></pre></div><p>When <code>.env.local</code> exists in the worktree, the app picks up the unique database names. In your main checkout, the defaults apply as usual.</p>
<p><strong>Runs <code>bin/setup</code>.</strong> This creates the databases, runs migrations, and installs dependencies. The worktree is fully bootable when the hook finishes.</p>
<p>Each worktree now points at its own development and test database. Two agents can run specs simultaneously without interference.</p>
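<p>The fallback behavior that makes this work can be sketched in plain shell terms. This is only an illustration of the fetch-with-default semantics; the function and default names are placeholders:</p>

```shell
# A worktree's .env.local sets DB_DATABASE, so that value wins;
# in the main checkout the variable is unset and the default applies.
db_name() {
  echo "${DB_DATABASE:-myapp_development}"
}

db_name                                        # main checkout
DB_DATABASE=myapp_development_feat_x db_name   # inside a worktree
```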
<h2 id="cleaning-up-with-worktreeremove">Cleaning Up with <code>WorktreeRemove</code></h2>
<p>Worktrees accumulate, and so do the databases they create. Claude Code automatically cleans up the worktree directory when you exit a session with no changes, but the databases stick around.</p>
<p>The <code>WorktreeRemove</code> hook mirrors the creation logic in reverse:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">&#34;hooks&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;WorktreeCreate&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;command&#34;</span>: <span style="color:#a6e3a1">&#34;.claude/hooks/worktree-create.sh&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;timeout&#34;</span>: <span style="color:#fab387">60000</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;WorktreeRemove&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;command&#34;</span>: <span style="color:#a6e3a1">&#34;.claude/hooks/worktree-remove.sh&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;timeout&#34;</span>: <span style="color:#fab387">60000</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The removal script reads the worktree path from the hook input and drops the databases by running <code>bin/rails db:drop</code> inside the worktree for both environments; the <code>.env.local</code> written at creation supplies the names:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#!/usr/bin/env bash
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#89dceb">set</span> -euo pipefail
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># WorktreeRemove hook: drops worktree-specific databases, deregisters the</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># git worktree, and deletes the associated branch.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Input (JSON on stdin): { &#34;name&#34;: &#34;&lt;slug&gt;&#34;, &#34;cwd&#34;: &#34;&lt;project-root&gt;&#34;, ... }</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># All output goes to stderr.</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">INPUT</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>cat<span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">NAME</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INPUT</span><span style="color:#a6e3a1">&#34;</span> | jq -r <span style="color:#a6e3a1">&#39;.name&#39;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">PROJECT_DIR</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INPUT</span><span style="color:#a6e3a1">&#34;</span> | jq -r <span style="color:#a6e3a1">&#39;.cwd&#39;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span><span style="color:#f5e0dc">WORKTREE_DIR</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$INPUT</span><span style="color:#a6e3a1">&#34;</span> | jq -r <span style="color:#a6e3a1">&#39;.worktree_path&#39;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">[</span> ! -d <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">]</span>; <span style="color:#cba6f7">then</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Worktree directory not found: </span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">exit</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Move to main repo before removing the worktree so the process cwd stays valid</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">cd</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 1. Drop worktree-specific databases via Rails</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Dropping databases for worktree: </span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1"> ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb;font-weight:bold">(</span><span style="color:#89dceb">cd</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">&amp;&amp;</span> <span style="color:#f5e0dc">RAILS_ENV</span><span style="color:#89dceb;font-weight:bold">=</span>development bin/rails db:drop <span style="color:#f5e0dc">DISABLE_DATABASE_ENVIRONMENT_CHECK</span><span style="color:#89dceb;font-weight:bold">=</span>1<span style="color:#89dceb;font-weight:bold">)</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb;font-weight:bold">(</span><span style="color:#89dceb">cd</span> <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">&amp;&amp;</span> <span style="color:#f5e0dc">RAILS_ENV</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#89dceb">test</span> bin/rails db:drop <span style="color:#f5e0dc">DISABLE_DATABASE_ENVIRONMENT_CHECK</span><span style="color:#89dceb;font-weight:bold">=</span>1<span style="color:#89dceb;font-weight:bold">)</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 2. Deregister the git worktree (required before branch can be deleted)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Deregistering git worktree at </span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1"> ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>git -C <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">&#34;</span> worktree remove --force <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$WORKTREE_DIR</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># 3. Delete the branch</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># -------------------------------------------------------------------</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">if</span> git -C <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">&#34;</span> rev-parse --verify <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#34;</span> &gt;/dev/null 2&gt;&amp;1; <span style="color:#cba6f7">then</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Deleting branch: </span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1"> ...&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span>  git -C <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROJECT_DIR</span><span style="color:#a6e3a1">&#34;</span> branch -D <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">else</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Branch not found, skipping: </span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">fi</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;Worktree &#39;</span><span style="color:#f5e0dc">$NAME</span><span style="color:#a6e3a1">&#39; removed successfully.&#34;</span> &gt;&amp;<span style="color:#fab387">2</span>
</span></span></code></pre></div><p>Alternatively, you could skip the hook and run a periodic cleanup script that finds orphaned databases and drops them. The hook approach is cleaner if you want everything automated.</p>
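<p>A sketch of that periodic cleanup, assuming the naming scheme used by the create hook (the <code>tracewell_</code> prefixes and the helper name are illustrative). The core is a pure function that compares database names against live worktree slugs; you would feed it <code>psql</code> and <code>git worktree list</code> output and pipe the result to <code>dropdb</code>:</p>

```shell
# Given a newline-separated list of databases and a newline-separated list
# of live worktree slugs, print the databases whose worktree is gone.
orphaned_dbs() {
  local dbs="$1" slugs="$2" db slug
  while IFS= read -r db; do
    case "$db" in
      tracewell_development_*) slug="${db#tracewell_development_}" ;;
      tracewell_test_*)        slug="${db#tracewell_test_}" ;;
      *) continue ;;  # not a worktree database; never touch it
    esac
    printf '%s\n' "$slugs" | grep -qx "$slug" || printf '%s\n' "$db"
  done <<<"$dbs"
}

# Wiring it up (requires psql/dropdb; run from the main checkout):
#   dbs=$(psql -Atc "SELECT datname FROM pg_database")
#   slugs=$(git worktree list --porcelain | awk '/^worktree /{print $2}' \
#           | xargs -n1 basename | tr '-' '_')
#   orphaned_dbs "$dbs" "$slugs" | xargs -r -n1 dropdb
```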
<h2 id="beyond-rails">Beyond Rails</h2>
<p>These patterns aren&rsquo;t Rails-specific. Rails conventions make the setup straightforward, but any application with external state has the same gap. Django, Phoenix, Laravel: if your app talks to a database, a cache, or a message queue, Git worktrees alone won&rsquo;t isolate those resources.</p>
<p>The worktree gives you file isolation. The hook gives you everything else.</p>
<p>The mental model is simple: the <code>WorktreeCreate</code> hook is your opportunity to bootstrap whatever environment your application needs, and the <code>WorktreeRemove</code> hook is your opportunity to tear it down. What you put in those scripts depends entirely on your stack.</p>
<hr>
<p>I covered the full setup process in a video walkthrough. If you want to see the hook in action, including the database creation and a parallel test run, <a href="https://youtu.be/hEE0mc-3D_c">watch it here</a>.</p>
<p>If you&rsquo;re looking to set up isolated agent workflows for your team or need help integrating Claude Code into your development process, <a href="https://www.damiangalarza.com/claude-code">let&rsquo;s talk</a>.</p>
]]></content:encoded></item><item><title>Building a Linear-Driven Agent Loop with Claude Code</title><link>https://www.damiangalarza.com/posts/2026-02-13-linear-agent-loop/</link><pubDate>Fri, 13 Feb 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-02-13-linear-agent-loop/</guid><description>How I built a bash-based agent loop that pulls work from Linear, implements features, runs code review, and opens pull requests autonomously.</description><content:encoded><![CDATA[<p>In December, the developer community on X was buzzing about Ralph Wiggum. If you missed it, Anthropic&rsquo;s Claude Code plugins had a plugin called <a href="https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum">Ralph Wiggum</a>. In the README it&rsquo;s described as:</p>
<blockquote>
<p>Ralph is a development methodology based on continuous AI agent loops. As Geoffrey Huntley describes it: &ldquo;Ralph is a Bash loop&rdquo; - a simple while true that repeatedly feeds an AI agent a prompt file, allowing it to iteratively improve its work until completion.</p></blockquote>
<p>This was used in a variety of ways. Two common ones were:</p>
<ol>
<li>Unleash an agent to work on a single task on its own until it was done.</li>
<li>Unleash an agent to iterate through a backlog of work until it had completed all of it.</li>
</ol>
<p>Today we&rsquo;re going to explore the second one, using an agent loop to iterate through a project backlog.</p>
<h2 id="where-ralph-wiggum-falls-flat">Where Ralph Wiggum Falls Flat</h2>
<p>The Ralph Wiggum plugin provides a command you call inside Claude Code. The session continues until a set of requirements have been met, at which point the loop exits. For example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>/ralph-loop <span style="color:#a6e3a1">&#34;Build a REST API for todos. Requirements: CRUD operations, input validation, bin/rails test and bin/rails lint must pass. Output &lt;promise&gt;COMPLETE&lt;/promise&gt; when done.&#34;</span>
</span></span></code></pre></div><p>There is a drawback to this approach, though. Running the loop inside a Claude Code session means every iteration eats into the same context window. If you&rsquo;ve read my blog post on <a href="/posts/2025-12-08-understanding-claude-code-context-window">Understanding Claude Code&rsquo;s Context Window</a>, you know this degrades results over time. It gets worse still when the loop spans multiple pieces of work: the context window accumulates residue from each task, and context rot sets in.</p>
<p>There is a solution though.</p>
<h2 id="bash-loops">Bash Loops</h2>
<p>Instead of running a Ralph Wiggum loop inside the Claude Code instance, we can loop in bash. In this version every iteration starts with a fresh context window, avoiding context rot. Two flags make this possible: <code>-p</code> runs Claude Code non-interactively with a single prompt, and <code>--dangerously-skip-permissions</code> lets it use tools without stopping for approval. An example loop looks something like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#cba6f7">while</span> true; <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">SESSION</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$((</span>SESSION <span style="color:#89dceb;font-weight:bold">+</span> <span style="color:#fab387">1</span><span style="color:#cba6f7">))</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">TIMESTAMP</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>date +%Y%m%d_%H%M%S<span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">COMMIT</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>git rev-parse --short<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">6</span> HEAD 2&gt;/dev/null <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;no-git&#34;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">LOGFILE</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">LOG_DIR</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">/</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">AGENT_NAME</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">TIMESTAMP</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">COMMIT</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">.log&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;--- Session #</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">SESSION</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> starting at </span><span style="color:#cba6f7">$(</span>date<span style="color:#cba6f7">)</span><span style="color:#a6e3a1"> ---&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;    Log: </span><span style="color:#f5e0dc">$LOGFILE</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  claude --dangerously-skip-permissions <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    -p <span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">$(</span>cat <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROMPT_FILE</span><span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    --model <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$MODEL</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    &amp;&gt;<span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$LOGFILE</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;    Session #</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">SESSION</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> ended at </span><span style="color:#cba6f7">$(</span>date<span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#6c7086;font-style:italic"># Brief pause between sessions to avoid hammering if something is broken</span>
</span></span><span style="display:flex;"><span>  sleep <span style="color:#fab387">5</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">done</span>
</span></span></code></pre></div><p>The <code>$PROMPT_FILE</code> is where the real work gets defined. It&rsquo;s a markdown file that tells the agent exactly what to do during each session. Mine walks the agent through a full lifecycle: orient itself on the project, pick up the next issue from Linear, build the feature, run a code review with subagents, and open a pull request. It also includes guardrails like one issue per session, never break main, and what to do if blocked or stuck for more than 15 minutes.</p>
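<p>To make that concrete, here is a compressed sketch of what such a prompt file can look like. The structure below is illustrative; mine is longer and project-specific:</p>

```markdown
# Agent Session

## Orient
- Read PROGRESS.md to understand what has been done so far.

## Pick up work
- Find the highest-priority Linear issue in "Todo" (fall back to the backlog).
- Move it to "In Progress" and create a branch. One issue per session.

## Build and verify
- Implement the issue. The task is not done until tests and linters pass.
- Never break main.

## Review and ship
- Spawn a reviewer subagent and address its feedback.
- Open a pull request, move the issue to "Done", and update PROGRESS.md.

## If blocked
- If stuck for more than 15 minutes, comment on the issue and exit cleanly.
```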
<p>Let&rsquo;s walk through how each of these pieces works in practice.</p>
<h2 id="how-it-all-fits-together">How It All Fits Together</h2>
<p>I decided to give this a try on CreatorSignal, a recent project I&rsquo;ve been building during my <a href="https://www.youtube.com/@damian.galarza/streams">live streams</a>. While I&rsquo;ve seen many people maintain their backlogs in markdown files or custom Kanban boards inside Claude Code, I prefer using <a href="https://linear.app/">Linear</a>, and I didn&rsquo;t want to recreate a task management system just for the agent loop. With the <a href="https://linear.app/docs/mcp">Linear MCP</a> in hand, here&rsquo;s how I set it up.</p>
<h3 id="progressmd">PROGRESS.md</h3>
<p>One of the core pieces is the <code>PROGRESS.md</code> file. While individual tasks are tracked and maintained in Linear, this file serves as a sort of &ldquo;memory&rdquo; that gives the agents a holistic view of what has been accomplished. At the start of each loop, the agent reads <code>PROGRESS.md</code>; at the end, it appends a summary of what it accomplished.</p>
<p>Example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Progress
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## 2026-02-13
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### PRX-27: Billing portal (Stripe Customer Portal integration) — DONE
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span><span style="color:#cba6f7">-</span> Created <span style="color:#a6e3a1">`BillingPortalController`</span> with <span style="color:#a6e3a1">`show`</span> and <span style="color:#a6e3a1">`create`</span> actions
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Billing page displays current plan, price, next billing date
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Manage Subscription&#34; button creates Stripe BillingPortal::Session and redirects
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Free users see upgrade CTA; former subscribers can still access portal for invoices
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Cancellation pending state shown with reactivation option
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> 11 request specs + 6 system specs, all passing (266 total)
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> PR: https://github.com/dgalarza/CreatorSignal/pull/31
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Branch based on PRX-25 (chain: PRX-17 → PRX-23 → PRX-24 → PRX-25 → PRX-27)
</span></span></code></pre></div><h3 id="implementing-an-issue">Implementing an Issue</h3>
<p>Using the Linear MCP, the agent finds the next highest-priority issue to work on. It looks at the &ldquo;Todo&rdquo; column first and picks up the next issue; if there&rsquo;s nothing in Todo, it checks the backlog instead. From there it reads the issue&rsquo;s details to understand the work to be done. For the loop to work well, issues need to be spec&rsquo;d out thoroughly; this gives the agent the best chance of producing quality work without human supervision.</p>
<p>With an issue selected, the agent moves it to &ldquo;In Progress&rdquo;, creates a branch, and starts building. A task is not considered &ldquo;done&rdquo; unless the test suite and linters both pass. This is another critical piece for a successful agent loop. The agent must have solid ways of verifying its own work. Without automated checks, it&rsquo;s difficult for the agent to understand success, and quality drops.</p>
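<p>That verification gate can be expressed as a tiny shell helper. The function below is illustrative; in the real prompt the checks are the project&rsquo;s own commands:</p>

```shell
# Run each check command in order; fail fast and report the first failure.
verify() {
  local check
  for check in "$@"; do
    if ! $check; then
      echo "FAILED: $check" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# In a Rails project this might be:
#   verify "bin/rails test" "bin/rails lint"
```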
<p>When the agent believes its work is ready, it comments on the Linear issue with a summary of what it built and moves the issue to &ldquo;In Review&rdquo;.</p>
<h3 id="code-review">Code Review</h3>
<p>Similar to my workflow described in <a href="/posts/2025-11-25-how-i-use-claude-code">How I Use Claude Code</a>, the next step is to spawn subagents to perform code review. The agent uses the <code>Task</code> tool to spin up a reviewer that evaluates the diff against the issue requirements, checking for correctness, test quality, Rails conventions, security, and performance.</p>
<p>The review is posted as a comment on the Linear issue. This provides visibility into the full lifecycle of the work. I can see the main agent&rsquo;s implementation summary alongside the code review feedback. The agent then resolves any feedback it received and posts a final comment on the Linear issue summarizing its decisions.</p>
<h3 id="pull-request">Pull Request</h3>
<p>After the code review process is complete and feedback is addressed, the agent commits the work and opens a pull request. The Linear issue is moved to &ldquo;Done&rdquo;, and the agent writes its progress update to the PROGRESS.md file.</p>
<h3 id="clean-up">Clean Up</h3>
<p>With everything complete, the agent&rsquo;s last instructions are to check out the main branch and rebase against origin/main so that the next loop starts in a fresh state. The loop then exits cleanly. There&rsquo;s a built-in pause after each iteration before the next one starts.</p>
<h3 id="visibility">Visibility</h3>
<p>This loop worked well. I connected Slack to my Linear project so I could see notifications coming in as the agent worked through issues. Each time an issue&rsquo;s status changed, each time the agent completed its work, and each time it received and addressed review feedback, I could see the progress in real time.</p>
<h2 id="improving-on-the-workflow">Improving on the Workflow</h2>
<p>While this initial pass at the loop was working well, there were things I wanted to improve. First, given the speed at which new features were landing, some open pull requests would go stale with merge conflicts before they could merge. Second, I wanted to be able to leave feedback on a pull request as if I were working with a team member, and have the agent address it as part of the loop.</p>
<p>I solved both problems by adding a new step to the loop.</p>
<p>Before picking up a new task, the agent runs <code>bin/pr_check</code>. This script looks through my open pull requests for any with the &ldquo;needs-revision&rdquo; label. If none have review feedback waiting, it checks for any that have gone stale with merge conflicts.</p>
<p>If such a PR is found, the loop addresses that one PR, leaving any others for subsequent iterations. Whenever I had feedback I wanted addressed on a PR, I would leave comments on it and add the &ldquo;needs-revision&rdquo; label. The next time the loop ran, the agent would pick it up and address the feedback.</p>
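<p>At its core, <code>bin/pr_check</code> only needs to filter the JSON that the GitHub CLI can emit. Here is a runnable sketch of that selection logic using jq; the sample data is invented, and a real script would pipe in something like <code>gh pr list --author @me --state open --json number,title,headRefName,url,mergeable,labels</code>:</p>
```shell
# Invented sample of what `gh pr list --json ...` might return.
prs='[
  {"number": 41, "title": "PRX-6: Feeds", "headRefName": "damian/prx-6",
   "mergeable": "MERGEABLE", "labels": []},
  {"number": 42, "title": "PRX-7: Exa research tools",
   "headRefName": "damian/prx-7-exa-research-tools",
   "mergeable": "CONFLICTING", "labels": [{"name": "needs-revision"}]}
]'

# Pick the first PR that has merge conflicts or carries the
# needs-revision label.
echo "$prs" | jq -c '
  map(select(.mergeable == "CONFLICTING"
             or (.labels | any(.name == "needs-revision"))))
  | first // empty
  | {number, branch: .headRefName, title,
     conflicting: (.mergeable == "CONFLICTING"),
     has_feedback: (.labels | any(.name == "needs-revision"))}'
```
<p>Here PR 42 would be selected because it is both conflicting and labeled. The <code>mergeable</code> field name follows the gh CLI&rsquo;s JSON output; check <code>gh pr list --json</code> against your gh version.</p>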
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># bin/pr_check</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Finds the first open PR that needs attention.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Returns JSON with PR details if one needs work, or empty output if all clean.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># A PR &#34;needs attention&#34; if:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   1. It has merge conflicts (mergeableStatus == CONFLICTING)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   2. It has the &#34;needs-revision&#34; label</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Usage:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   bin/pr_check           # returns JSON or empty</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   bin/pr_check --quiet   # exit code only (0 = needs attention, 1 = all clean)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Output format:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   {</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;number&#34;: 42,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;branch&#34;: &#34;damian/prx-7-exa-research-tools&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;title&#34;: &#34;PRX-7: Exa research tools&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;url&#34;: &#34;https://github.com/...&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;reason&#34;: &#34;has_feedback&#34;,    # or &#34;conflicting&#34; or &#34;conflicting,has_feedback&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;conflicting&#34;: true,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;has_feedback&#34;: true</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   }</span>
</span></span></code></pre></div><p>The loop itself is a roughly 100-line bash script. I&rsquo;ll be adding it to my Claude Code workflows this week and sharing it with my newsletter.</p>
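<p>Condensed to its skeleton, the loop looks something like this. This is a hedged sketch: the variable names are invented, the real script adds logging and error handling, and <code>AGENT_CMD</code> stands in for a headless Claude Code invocation such as <code>claude -p &quot;$PROMPT&quot;</code> (stubbed with an echo here so the sketch runs as-is):</p>
```shell
# Skeleton of the agent loop (illustrative, not the real ~100-line script).
AGENT_CMD=${AGENT_CMD:-"echo agent-run"}   # stand-in for the real agent command
MAX_ITERATIONS=3
PAUSE_SECONDS=${PAUSE_SECONDS:-0}          # built-in pause between iterations

for i in $(seq 1 "$MAX_ITERATIONS"); do
  echo "--- iteration $i: fresh context ---"

  # Stale or flagged PRs take priority over new work.
  if pr_json=$(bin/pr_check 2>/dev/null) && [ -n "$pr_json" ]; then
    echo "addressing PR first: $pr_json"
  fi

  # One task per invocation: each run gets a clean context window.
  $AGENT_CMD

  sleep "$PAUSE_SECONDS"
done
```
<p>The key property is that each iteration is a separate process invocation, which is what guarantees a fresh context window per task.</p>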
<h2 id="what-makes-this-work">What Makes This Work</h2>
<p>After running this loop across several sessions, a few things stand out as critical to getting quality results:</p>
<ol>
<li><strong>Fresh context per iteration.</strong> Running the loop in bash instead of inside a Claude Code session means each task gets a clean context window. This is the single biggest difference from the Ralph Wiggum approach.</li>
<li><strong>Well-spec&rsquo;d issues.</strong> The agent is only as good as the instructions it receives. Vague issues produce vague results. Detailed acceptance criteria and clear scope make all the difference.</li>
<li><strong>Automated verification.</strong> Requiring passing tests and linters before a task is considered &ldquo;done&rdquo; gives the agent a concrete definition of success. Without this, quality drops fast.</li>
<li><strong>Linear as the source of truth.</strong> Using an existing project management tool instead of reinventing one means I can see the full lifecycle of every issue, from backlog to done, with comments and status updates along the way.</li>
</ol>
<p>The combination of these pieces turns what could be a chaotic autonomous loop into something that produces reviewable, mergeable work. It&rsquo;s not perfect, and I still review every pull request before merging, but the amount of ground it covers between review cycles is significant.</p>
<h2 id="additional-reading">Additional Reading</h2>
<ul>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills: Understanding the Difference</a> - The agent loop relies on the Linear MCP as its backbone. This post covers how MCPs and skills serve different roles in your workflow.</li>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a> - A deep dive into how the context window works and why fresh context per iteration is so important.</li>
<li><a href="https://youtu.be/Seu7nksZ_4k">How AI Agents Remember Things</a> - The PROGRESS.md pattern is essentially agent memory between loop iterations. This video covers how agents persist context across sessions.</li>
<li><a href="https://youtu.be/tO_Larrawfg">MCPs vs Skills: The Mental Model You&rsquo;re Missing</a> - The video companion to the blog post above, covering the architectural distinction between MCPs and skills.</li>
<li><a href="https://www.youtube.com/playlist?list=PLeevcUmnIRCy8XirmTSbHz71hs31idVC3">Building CreatorSignal</a> - The livestream series where I&rsquo;m building CreatorSignal, the project this agent loop runs against.</li>
</ul>
<p>If you haven&rsquo;t already, sign up for my newsletter for weekly emails on AI Engineering and agentic development workflows.</p>
<hr>
<p>If you&rsquo;re building agent loops or autonomous workflows and want help getting the architecture right, I work with teams on exactly this. <a href="/ai-agents/">Let&rsquo;s talk</a>.</p>
]]></content:encoded></item><item><title>1. Building CreatorSignal: Rails Auth &amp; Deploy from Scratch (LIVE)</title><link>https://www.damiangalarza.com/videos/2026-02-05-1-building-creatorsignal-rails-auth-deploy-from-scratch-live/</link><pubDate>Thu, 05 Feb 2026 15:03:33 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-02-05-1-building-creatorsignal-rails-auth-deploy-from-scratch-live/</guid><description>I built a Rails app from scratch using only Claude Code as my AI coding partner. Here's the full, unedited session showing my real workflow for AI-augmented</description><content:encoded><![CDATA[<p>I built a Rails app from scratch using only Claude Code as my AI coding partner. Here&rsquo;s the full, unedited session showing my real workflow for AI-augmented development.</p>
<p>In this live coding session, I start a brand new Ruby on Rails SaaS app and build user authentication from zero using Claude Code. But this isn&rsquo;t a &ldquo;how to build auth in Rails&rdquo; tutorial. It&rsquo;s a deep look at how I actually use Claude Code to drive development: plan mode, custom skills, TDD workflows, iterating on Claude&rsquo;s memory, sub-agent code reviews, and more.</p>
]]></content:encoded></item><item><title>MCPs vs Agent Skills: Understanding the Difference</title><link>https://www.damiangalarza.com/posts/2026-02-05-mcps-vs-agent-skills/</link><pubDate>Thu, 05 Feb 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-02-05-mcps-vs-agent-skills/</guid><description>MCPs give Claude capabilities. Skills teach Claude workflows. Here's the mental model I use to decide which one I need.</description><content:encoded><![CDATA[<p>&ldquo;Should I build a skill or an MCP for this?&rdquo;</p>
<p>I&rsquo;ve been asked this question a lot since Anthropic announced Agent Skills back in October 2025. And honestly, the confusion makes sense. Both extend Claude Code&rsquo;s capabilities. Both can connect to external services. Skills can even run scripts, which sounds a lot like what MCPs do.</p>
<p>But once you understand the mental model, the distinction becomes obvious. Let&rsquo;s break it down.</p>
<h2 id="what-mcps-actually-do">What MCPs Actually Do</h2>
<p>Model Context Protocol is an open standard for connecting AI applications to external systems. It&rsquo;s the plumbing that connects Claude to the outside world by exposing tools that can read data, execute actions, and interact with external services.</p>
<p>For example, you can add the Linear MCP and give Claude the ability to read and create issues, or add the Sentry MCP so it can query errors. These are capabilities Claude didn&rsquo;t have before. MCPs extend what Claude can do.</p>
<p>There&rsquo;s something you need to consider when adding MCPs though: every MCP you add to Claude Code takes up space in your context window just by being available. Not just when it&rsquo;s used, but constantly. If you&rsquo;ve read my post on <a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a>, you know this matters a lot.</p>
<h3 id="the-anatomy-of-an-mcp-tool">The Anatomy of an MCP Tool</h3>
<p>Every MCP tool exposes information to the LLM so it knows when and how to use it. Here&rsquo;s what Claude sees when the Linear MCP is configured:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>│ get_issue <span style="color:#89dceb;font-weight:bold">(</span>linear-server<span style="color:#89dceb;font-weight:bold">)</span> <span style="color:#89dceb;font-weight:bold">[</span>read-only<span style="color:#89dceb;font-weight:bold">]</span>                                        │
</span></span><span style="display:flex;"><span>│ Tool name: get_issue                                                         │
</span></span><span style="display:flex;"><span>│ Full name: mcp__linear-server__get_issue                                     │
</span></span><span style="display:flex;"><span>│                                                                              │
</span></span><span style="display:flex;"><span>│ Description:                                                                 │
</span></span><span style="display:flex;"><span>│ Retrieve detailed information about an issue by ID, including attachments   │
</span></span><span style="display:flex;"><span>│ and git branch name                                                          │
</span></span><span style="display:flex;"><span>│                                                                              │
</span></span><span style="display:flex;"><span>│ Parameters:                                                                  │
</span></span><span style="display:flex;"><span>│   • id <span style="color:#89dceb;font-weight:bold">(</span>required<span style="color:#89dceb;font-weight:bold">)</span>: string - The issue ID                                     │
</span></span><span style="display:flex;"><span>│   • includeRelations: boolean - Whether to include blocking, related,        │
</span></span><span style="display:flex;"><span>│     and duplicate relations in the response                                  │
</span></span></code></pre></div><p>The description tells the LLM when and why to use the tool. Some descriptions are verbose, which means they consume more tokens on every single message. The parameter schema is typically JSON that defines the tool&rsquo;s inputs. And the tool name is what the LLM calls to invoke it.</p>
<p>Here&rsquo;s why this matters: in <a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a>, I showed how a Claude Code session can have 24% or more of the context window consumed by MCP tool definitions before you&rsquo;ve even started a conversation. Add a few feature-rich MCP servers and you&rsquo;ve got precious little space left for actual work.</p>
<p>This used to create a hard practical limit. Too many MCPs and the model would get confused and become more likely to pick the wrong tool. Anthropic addressed this in January 2026 with <a href="https://x.com/trq212/status/2011523109871108570">MCP Tool Search</a>, which dynamically loads MCP tools on-demand when they would consume more than 10% of context. This helps, but the underlying tension remains: MCP tool definitions compete for context space, which is why skills use a different approach entirely.</p>
<h3 id="the-key-characteristics">The Key Characteristics</h3>
<p>MCPs are:</p>
<ul>
<li><strong>Single-purpose tools</strong> - Each tool does one specific thing</li>
<li><strong>Autonomous</strong> - Claude can call them directly without any instruction from you</li>
<li><strong>Always loaded</strong> - Tool descriptions are in context on every message (or dynamically loaded via MCP Tool Search)</li>
<li><strong>Bidirectional</strong> - Can read from and write to external systems</li>
</ul>
<p>When you ask Claude &ldquo;What&rsquo;s the status of issue TRA-123?&rdquo;, it can autonomously decide to call the Linear MCP to fetch that information. No skill needed, no special invocation. The capability is just there.</p>
<h2 id="what-agent-skills-actually-do">What Agent Skills Actually Do</h2>
<p>Since the original announcement, Anthropic has released Agent Skills as an open standard, and other tools like GitHub Copilot and Cursor now support them as well.</p>
<p>At first glance, skills look simple. They&rsquo;re essentially a folder with some markdown files and optionally some scripts:</p>
<pre tabindex="0"><code>my-skill/
├── SKILL.md           # Main instructions (required)
├── reference.md       # Detailed docs (loaded as needed)
├── examples.md        # Usage examples (loaded as needed)
└── scripts/
    └── helper.py      # Executable scripts (run, not loaded)
</code></pre><p>Skills typically live in <code>.claude/skills/</code> within your project or <code>~/.claude/skills/</code> for global availability.</p>
<p>Skills can execute code. But that&rsquo;s not what makes them special. What makes them special is orchestration. They compose multiple capabilities into a defined workflow.</p>
<p>A tool lets Claude query your database. A skill teaches Claude your company&rsquo;s specific data model, your naming conventions, your rollback procedures. MCPs are verbs. Skills are playbooks.</p>
<h3 id="the-four-flavors-of-skills">The Four Flavors of Skills</h3>
<p>In my experience, skills tend to fall into four categories:</p>
<p><strong>Specialized workflows</strong> are multi-step procedures for specific domains. Things like a TDD workflow, a PR review process, or a deployment checklist. These are the skills I use most often.</p>
<p><strong>Tool integrations</strong> are instructions for working with specific file formats or APIs. Maybe you need Claude to know how to process DOCX files, manipulate PDFs, or query BigQuery a specific way.</p>
<p><strong>Domain expertise</strong> captures company-specific knowledge. Your data model, your naming conventions, your rollback procedures. The stuff that lives in tribal knowledge.</p>
<p><strong>Knowledge retrieval</strong> bundles reference documentation that Claude can access on demand. API specs, style guides, architectural decision records. Rather than stuffing everything into CLAUDE.md, you package it into a skill that loads only when relevant.</p>
<h3 id="why-skills-exist-progressive-disclosure">Why Skills Exist: Progressive Disclosure</h3>
<p>The key design principle behind skills is progressive disclosure. Unlike MCPs where tool definitions are always present, skills only load their full content when invoked.</p>
<p>The most basic skill is a folder with a SKILL.md file. This file contains YAML frontmatter with metadata (name and description) followed by the actual instructions. For any given skill, only the metadata is persistently available. The description tells the LLM when to invoke the skill, so you need to capture the right semantics for the agent to pick it up appropriately.</p>
<p>Once the skill is invoked, the LLM loads the rest of the SKILL.md file into context and follows its instructions. You can also break skills into separate resource files for different scenarios or workflows. This lets you keep context lean by loading only what&rsquo;s needed for the current task.</p>
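<p>A minimal SKILL.md has exactly this shape. The name, description, and steps below are invented for illustration:</p>
```markdown
---
name: tdd-implement
description: Implement a feature following the team's TDD workflow. Use when
  asked to implement, build, or fix an issue with tests.
---

# TDD Implement

1. Restate the issue's acceptance criteria before writing any code.
2. Write a failing test first, then implement until it passes.
3. Run the full test suite and linters before declaring the task done.

See [reference.md](reference.md) for edge cases.
```
<p>Only the frontmatter is persistently in context; the body and <code>reference.md</code> load when the skill is invoked.</p>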
<h3 id="what-this-looks-like-in-practice">What This Looks Like in Practice</h3>
<p>In <a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a>, I described my <code>linear-implement</code> skill that takes a Linear issue and implements a solution following TDD. Here&rsquo;s how the pieces fit together:</p>
<pre tabindex="0"><code>┌──────────────────────────────────────────────────────────────┐
│                        SKILL                                 │
│                 (orchestration layer)                        │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Bundled: scripts/ │ references/ │ assets/               │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│    ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐       │
│    │  MCP    │  │  Bash   │  │  File   │  │  Web    │       │
│    │ (Linear)│  │ (tests) │  │ (write) │  │ (fetch) │       │
│    └─────────┘  └─────────┘  └─────────┘  └─────────┘       │
└──────────────────────────────────────────────────────────────┘
</code></pre><p>The skill sits on top and orchestrates everything. It calls the Linear MCP to fetch issue details. It runs bash commands to execute tests. It writes code files following TDD. It creates PRs via the GitHub CLI.</p>
<p>Without a skill, Claude can do all these things individually. But you have to orchestrate each step manually. Every session, you re-explain the workflow. With a skill, one command triggers the entire workflow. Consistent process every time. Your expertise encoded into Claude&rsquo;s behavior.</p>
<h2 id="the-claudemd-vs-skills-question">The CLAUDE.md vs Skills Question</h2>
<p>A common point of confusion: when should something go in CLAUDE.md versus a skill?</p>
<p>Here&rsquo;s how I think about it:</p>
<p><strong>CLAUDE.md</strong> is for declarative knowledge. What and why. Background context that Claude should just know. &ldquo;This is Rails 7 with RSpec.&rdquo; &ldquo;We use JSON:API format.&rdquo; &ldquo;Run tests with <code>bin/rspec</code>.&rdquo;</p>
<p><strong>Skills</strong> are for procedural knowledge. How. Multi-step workflows with defined steps. &ldquo;When implementing a feature, follow this TDD workflow&hellip;&rdquo; &ldquo;To deploy, run these 5 steps&hellip;&rdquo;</p>
<p>The analogy that works for me: CLAUDE.md is like an employee handbook (background context). Skills are like training modules (specific procedures).</p>
<p>If you&rsquo;re copy-pasting the same multi-step instructions into chat repeatedly, that&rsquo;s a skill waiting to be created. If it&rsquo;s background context Claude should just know, it belongs in CLAUDE.md.</p>
<p>There&rsquo;s a practical difference too. CLAUDE.md is always loaded in context, so it should stay lean. Skills use progressive disclosure, so they can be extensive without penalty when not in use.</p>
<h2 id="putting-it-together">Putting It Together</h2>
<p>Now that we&rsquo;ve covered what each one does separately, let me show you what it looks like when they work together.</p>
<pre tabindex="0"><code>┌─────────────────────────────────────────────────────────────────┐
│  Prompt: &#34;Help me implement Linear TRA-123&#34;                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  SKILL activates (context match)                                │
│  → Loads bundled resources, defines workflow                    │
└─────────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    ▼                   ▼
              ┌──────────┐       ┌──────────┐
              │   MCP    │       │  Native  │
              │ (Linear) │       │  Tools   │
              │          │       │          │
              │ Fetches  │       │ Bash,    │
              │ issue    │       │ File ops │
              │ details  │       │ for TDD  │
              └──────────┘       └──────────┘
                    │                   │
                    └─────────┬─────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Result: Feature implemented following TDD workflow             │
└─────────────────────────────────────────────────────────────────┘
</code></pre><p>The MCP gives access to Linear (the capability). The skill orchestrates the workflow (the recipe). Each has its role.</p>
<h2 id="the-two-questions">The Two Questions</h2>
<p>You might be thinking there&rsquo;s overlap here. Since skills can have scripts, can&rsquo;t they also connect to external services? Yes, they can. Skills can include scripts that hit APIs, run curl commands, whatever you need. The difference is that these scripts only run in the context of the skill itself. If you need something more general purpose that Claude can call from any context, you want an MCP instead.</p>
<p>When Claude needs to check Linear issues, it can do that anytime, in any context, without any special setup. That&rsquo;s an MCP&rsquo;s job. But when you say &ldquo;implement this feature,&rdquo; you want a specific sequence of steps followed in a specific order. That&rsquo;s a skill&rsquo;s job.</p>
<p>When I need to decide which to use, I ask myself two questions:</p>
<h3 id="question-1-should-claude-be-able-to-call-this-capability-anytime-across-any-context">Question 1: Should Claude be able to call this capability anytime, across any context?</h3>
<p>If yes, you need an MCP.</p>
<p>If only during a specific workflow, a skill with scripts is fine.</p>
<p>Checking Linear issues? That&rsquo;s something Claude might need to do in many different contexts. MCP makes sense. Deploying to staging? That&rsquo;s a specific workflow with defined steps. Skill makes sense.</p>
<h3 id="question-2-is-this-a-repeatable-workflow-with-defined-steps">Question 2: Is this a repeatable workflow with defined steps?</h3>
<p>If yes, build a skill (with or without MCPs).</p>
<p>If no, you might not need either. Just ask Claude directly.</p>
<p>If you find yourself explaining the same multi-step process to Claude repeatedly, that&rsquo;s your signal. That&rsquo;s when you build a skill.</p>
<h3 id="common-patterns">Common Patterns</h3>
<p>Here&rsquo;s how this plays out in practice:</p>
<p><strong>MCP alone</strong>: &ldquo;Check my Linear issues.&rdquo; Claude decides to call it autonomously.</p>
<p><strong>Skill using MCP</strong>: &ldquo;Implement TRA-123.&rdquo; The skill orchestrates the workflow, calling the Linear MCP as one step among many.</p>
<p><strong>Skill with scripts</strong>: &ldquo;Deploy to staging.&rdquo; The workflow runs deploy scripts that hit external services.</p>
<p><strong>Skill without external calls</strong>: &ldquo;Follow our TDD process.&rdquo; Pure internal workflow, no external systems needed.</p>
<h2 id="the-mental-model">The Mental Model</h2>
<p>That&rsquo;s the mental model. MCPs give Claude capabilities. Skills give Claude orchestration. Or to put it another way:</p>
<p><strong>MCPs</strong> = The tools in the toolbox<br>
<strong>Skills</strong> = The recipes that coordinate those tools</p>
<p>MCPs are the plumbing connecting Claude to the outside world. Skills are the playbook of procedural knowledge.</p>
<p>MCPs answer &ldquo;what can Claude access?&rdquo; Skills answer &ldquo;how should Claude approach this task?&rdquo;</p>
<h2 id="getting-started">Getting Started</h2>
<p>If you&rsquo;re just getting started, here&rsquo;s my recommendation:</p>
<p><strong>Start with MCPs.</strong> Find one that connects to a tool you already use. Linear, Sentry, your database, whatever. Install it and start calling it. Get a feel for how Claude uses capabilities autonomously.</p>
<p><strong>Watch for patterns.</strong> When you notice you&rsquo;re asking Claude the same multi-step sequence over and over, that&rsquo;s your signal. That&rsquo;s when you build a skill.</p>
<p><strong>Keep it simple.</strong> Your first skill doesn&rsquo;t need to be complex. Start with a workflow you repeat weekly, document the steps, and let Claude follow them consistently.</p>
<p>If you want to build your own skill from scratch, check out my video <a href="https://youtu.be/7fNOpyke2kw?si=ZotNFdJ8NyKvc4J-">Claude Code Tutorial: Build your first skill in 10 minutes</a> where I walk through creating a TDD workflow skill step by step.</p>
<p>For more examples, sign up for my newsletter. You&rsquo;ll get access to my claude-code-workflows repo on GitHub, which includes several skills I use daily, including the linear-implement workflow that ties everything together.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://youtu.be/tO_Larrawfg?si=o1qMFo5DiC8xz8oA">MCPs vs Skills</a></li>
<li><a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">Agent Skills - Anthropic Engineering Blog</a></li>
<li><a href="https://agentskills.io/specification">Agent Skills Specification</a></li>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a></li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a></li>
<li><a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a></li>
</ul>
<hr>
<p>Working through how to structure your Claude Code setup with the right mix of MCPs and skills? I help engineers and teams design workflows that stick. <a href="/claude-code/">Learn more</a>.</p>
]]></content:encoded></item><item><title>Claude Code Git Worktree Setup: Run Multiple Agents in Parallel</title><link>https://www.damiangalarza.com/videos/2026-01-27-claude-code-git-worktree-setup-run-multiple-agents-in-parallel/</link><pubDate>Tue, 27 Jan 2026 17:01:21 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-01-27-claude-code-git-worktree-setup-run-multiple-agents-in-parallel/</guid><description>Git worktrees + Claude Code = parallel AI agent workflows without conflicts.</description><content:encoded><![CDATA[<p>Git worktrees + Claude Code = parallel AI agent workflows without conflicts.</p>
<p>In this tutorial, I&rsquo;ll show you my exact Claude Code git worktree setup for running multiple agents on the same repo. No file collisions, no database locks, no branch chaos.</p>
]]></content:encoded></item><item><title>Claude Code Tutorial: Build Your First Skill in 10 Minutes</title><link>https://www.damiangalarza.com/videos/2026-01-20-claude-code-tutorial-build-your-first-skill-in-10-minutes/</link><pubDate>Tue, 20 Jan 2026 13:15:01 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-01-20-claude-code-tutorial-build-your-first-skill-in-10-minutes/</guid><description>Claude Code tutorial: Learn how to build your first Claude Code skill—a reusable workflow that automates test-driven development. One markdown file, and Claude</description><content:encoded><![CDATA[<p>Claude Code tutorial: Learn how to build your first Claude Code skill—a reusable workflow that automates test-driven development. One markdown file, and Claude follows your TDD process every time.</p>
<p>In this step-by-step tutorial, I show you how to create a Claude Code skill from scratch. You&rsquo;ll learn how Claude Code skills work, where they live, and how to write instructions that make Claude follow proper red-green-refactor TDD.</p>
]]></content:encoded></item><item><title>MCPs vs Skills: The Framework That Ends the Confusion</title><link>https://www.damiangalarza.com/videos/2026-01-14-mcps-vs-skills-the-framework-that-ends-the-confusion/</link><pubDate>Wed, 14 Jan 2026 16:52:37 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-01-14-mcps-vs-skills-the-framework-that-ends-the-confusion/</guid><description>MCPs and Skills both extend Claude Code—but when do you use each?</description><content:encoded><![CDATA[<p>MCPs and Skills both extend Claude Code—but when do you use each?</p>
<p>🔑 The simple framework:</p>
<ul>
<li>MCPs = Capabilities (tools that give Claude new powers)</li>
<li>Skills = Orchestration (workflows that teach Claude processes)</li>
</ul>
<p>⏱️ Timestamps:</p>
<p>0:00 The Confusion Everyone Has
0:21 Model Context Protocol
2:51 Agent Skills
6:42 CLAUDE.md Confusion
8:03 The Framework</p>
<p>📚 Resources mentioned:</p>
<ul>
<li>Understanding Claude Code&rsquo;s Context Window: <a href="https://www.damiangalarza.com/posts/2025-12-08-understanding-claude-code-context-window/">https://www.damiangalarza.com/posts/2025-12-08-understanding-claude-code-context-window/</a></li>
<li>Claude Code Skills docs: <a href="https://agentskills.io/">https://agentskills.io/</a></li>
<li>MCP documentation: <a href="https://modelcontextprotocol.io/">https://modelcontextprotocol.io/</a></li>
</ul>
<p><em>Get my Claude Code workflows + weekly insights:</em>
→ <a href="https://www.damiangalarza.com/newsletter?utm_source=youtube&amp;utm_medium=video&amp;utm_campaign=parallel-agents">https://www.damiangalarza.com/newsletter?utm_source=youtube&amp;utm_medium=video&amp;utm_campaign=parallel-agents</a></p>
<p><em>→ Want to level up your Claude Code setup?</em>
<a href="https://www.damiangalarza.com/claude-code">www.damiangalarza.com/claude-code</a></p>
<p><em>Affiliate links — I may earn a commission or credits at no cost to you:</em>
🚂 Railway (where I deploy): <a href="https://railway.com?referralCode=dgalarza">https://railway.com?referralCode=dgalarza</a></p>
<p><em>About me:</em>
I&rsquo;m Damian Galarza, former CTO with 15+ years building SaaS products. I make practical AI tutorials for developers who want to work smarter with tools like Claude Code.</p>
<p><em>Work with me:</em></p>
<ul>
<li>1:1 Coaching: <a href="https://www.damiangalarza.com/coaching">https://www.damiangalarza.com/coaching</a></li>
<li>Newsletter: <a href="https://www.damiangalarza.com/newsletter">https://www.damiangalarza.com/newsletter</a></li>
<li>Blog: <a href="https://www.damiangalarza.com">https://www.damiangalarza.com</a></li>
</ul>
<p>#ClaudeCode #claudecodetutorial #AI #DeveloperTools #MCP #Programming #AgentSkills</p>
]]></content:encoded></item><item><title>Debug Production Errors in 5 Minutes with Claude AI</title><link>https://www.damiangalarza.com/videos/2026-01-06-debug-production-errors-in-5-minutes-with-claude-ai/</link><pubDate>Tue, 06 Jan 2026 14:00:28 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-01-06-debug-production-errors-in-5-minutes-with-claude-ai/</guid><description>Today I'm showing you exactly how I debug production errors using Claude Code and the Sentry MCP—in a real Rails application.</description><content:encoded><![CDATA[<p>Today I&rsquo;m showing you exactly how I debug production errors using Claude Code and the Sentry MCP—in a real Rails application.</p>
<p>🎯 Want help setting up MCPs and debugging workflows for your team? Book a 1:1 session:
→ <a href="https://www.damiangalarza.com/coaching/?utm_source=youtube&amp;utm_medium=video&amp;utm_campaign=sentry-mcp">https://www.damiangalarza.com/coaching/?utm_source=youtube&amp;utm_medium=video&amp;utm_campaign=sentry-mcp</a></p>
]]></content:encoded></item><item><title>Getting Started with Claude Code</title><link>https://www.damiangalarza.com/videos/2025-12-29-getting-started-with-claude-code/</link><pubDate>Mon, 29 Dec 2025 18:18:03 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2025-12-29-getting-started-with-claude-code/</guid><description>Learn how to set up Claude Code and start using it on your existing projects in under 15 minutes. No more vibe coding - get real AI-assisted development that</description><content:encoded><![CDATA[<p>Learn how to set up Claude Code and start using it on your existing projects in under 15 minutes. No more vibe coding - get real AI-assisted development that actually understands your codebase.</p>
<p>Claude Code is Anthropic&rsquo;s official CLI tool that lives in your terminal, reads your code, makes edits, and runs commands. In this video, I&rsquo;ll take you from zero to productive with a real project.</p>
]]></content:encoded></item><item><title>Understanding Claude Code's Context Window</title><link>https://www.damiangalarza.com/posts/2025-12-08-understanding-claude-code-context-window/</link><pubDate>Mon, 08 Dec 2025 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2025-12-08-understanding-claude-code-context-window/</guid><description>How Claude Code's context window works: what consumes tokens (MCP servers, tools, messages), why it matters, and how to manage context effectively.</description><content:encoded><![CDATA[<p>I&rsquo;ve been using Claude Code for some time now, and as I discussed in <a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a>, using AI coding tools effectively is a skill in itself. One of the most important parts of getting value from your AI coding assistant is managing context.</p>
<p>In this post we&rsquo;ll look at how you can make the most of your available context window in Claude Code, as well as some common pitfalls to avoid.</p>
<h2 id="understanding-the-context-window">Understanding the Context Window</h2>
<p>Before we can optimize our developer workflow, we need to understand what the context window is and how it gets filled. The context window is how much content a large language model can hold onto at one time. Each model has a predefined limit on the size of its context window. For example, Claude Sonnet 4.5&rsquo;s context window is about 200,000 tokens.</p>
<h3 id="what-is-a-token">What is a Token?</h3>
<p>When you send text to an LLM, it doesn&rsquo;t process words one at a time. Instead, text is broken into <strong>tokens</strong>—the fundamental units that language models read and generate. A token typically represents 3-4 characters, or roughly 0.75 words in English.</p>
<p>For example, the phrase <code>&quot;Hello world&quot;</code> becomes 2-3 tokens, while a compound word like <code>authentication_middleware</code> might be split into 5-7 tokens despite being a single identifier. Code tends to be more token-dense than prose because of special characters, naming conventions, and syntax. This is why reading source files consumes context faster than you might expect.</p>
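<p>If you want a quick sense of usage before reaching for a real tokenizer, the heuristic above can be sketched in a few lines. This is an approximation only; the divisor of 4 is a rule of thumb, not Claude&rsquo;s actual tokenizer, which splits on learned subword boundaries:</p>

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic."""
    return max(1, round(len(text) / chars_per_token))

# "Hello world" (11 chars) lands in the 2-3 token range mentioned above;
# a dense identifier like authentication_middleware comes out higher.
print(estimate_tokens("Hello world"))
print(estimate_tokens("authentication_middleware"))
```

<p>Real tokenizers will disagree on exact counts, especially for code, but this is good enough for back-of-the-envelope context budgeting.</p>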
<h3 id="why-token-efficiency-matters">Why Token Efficiency Matters</h3>
<p>Context windows have limited space, and filling them with code happens fast. But running out of room isn&rsquo;t the only concern. LLMs suffer from a &ldquo;lost in the middle&rdquo; problem. Content at the start and end of the context window gets prioritized, while information in the middle tends to get overlooked. This mirrors how human memory works (we remember beginnings and endings better than middles).</p>
<p><img src="/images/posts/lost-in-the-middle.png" alt="Diagram showing how LLMs prioritize content at the beginning and end of context windows while missing information in the middle"></p>
<p>Additionally, our code isn&rsquo;t the only thing consuming context window space. Our context window is going to be filled by:</p>
<p><strong>MCP Servers</strong></p>
<p>Every MCP server you add takes up space in your context window just by being present. Every MCP tool definition comes with:</p>
<ol>
<li><strong>Tool name</strong> (e.g., mcp__ynab__get_transactions)</li>
<li><strong>Description</strong> - an explanation of what the tool does so the LLM can understand when it might be needed.</li>
<li><strong>Parameter Schema</strong> - JSON schema definition of all the parameters, types, descriptions and constraints.</li>
<li><strong>Usage notes</strong> - additional instructions and potentially examples to guide the LLM during its tool choice.</li>
</ol>
<p>Let&rsquo;s take a look at an example from the YNAB MCP I built and discussed in <a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;name&#34;</span>: <span style="color:#a6e3a1">&#34;mcp__ynab__get_transactions&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Get transactions from YNAB budget.\n\n    Retrieves transactions with optional filtering by date
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">   range, account, or category.\n    Returns transaction details including date, amount, payee, category, and
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  memo.\n\n    Use this tool when you need to:\n    - View recent transactions\n    - Find transactions in a
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  specific date range\n    - Filter transactions by account or category\n    - Check transaction details for
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  reconciliation\n\n    Args:\n        budget_id: Budget ID or &#39;last-used&#39; for default budget\n        since_date:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  Optional start date (YYYY-MM-DD format)\n        until_date: Optional end date (YYYY-MM-DD format)\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  account_id: Optional account ID to filter by specific account\n        category_id: Optional category ID to filter
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">   by category\n        type: Optional transaction type (&#39;uncategorized&#39;, &#39;unapproved&#39;)\n\n    Returns:\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  JSON array of transactions with:\n        - id: Transaction ID\n        - date: Transaction date\n        -
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  amount: Amount in milliunits (divide by 1000 for dollars)\n        - memo: Transaction memo\n        - cleared:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  Cleared status\n        - approved: Approval status\n        - payee_id: Payee ID\n        - payee_name: Payee
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  name\n        - category_id: Category ID\n        - category_name: Category name\n        - account_id: Account
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  ID\n        - account_name: Account name\n\n    Example usage:\n        Get all transactions from November 2024:\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">          since_date=&#39;2024-11-01&#39;, until_date=&#39;2024-11-30&#39;\n\n        Get recent uncategorized transactions:\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    type=&#39;uncategorized&#39;\n\n    Note: Amounts are returned in milliunits. Divide by 1000 to get dollar amounts.\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">   &#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;budget_id&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Budget ID or &#39;last-used&#39; for default budget&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;since_date&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional start date (YYYY-MM-DD format)&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;format&#34;</span>: <span style="color:#a6e3a1">&#34;date&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;until_date&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional end date (YYYY-MM-DD format)&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;format&#34;</span>: <span style="color:#a6e3a1">&#34;date&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;account_id&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional account ID to filter by&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;category_id&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional category ID to filter by&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;type&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;enum&#34;</span>: [<span style="color:#a6e3a1">&#34;uncategorized&#34;</span>, <span style="color:#a6e3a1">&#34;unapproved&#34;</span>],
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional transaction type filter&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;required&#34;</span>: [<span style="color:#a6e3a1">&#34;budget_id&#34;</span>],
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;title&#34;</span>: <span style="color:#a6e3a1">&#34;GetTransactionsArguments&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p><strong>Token Breakdown</strong></p>
<table>
  <thead>
      <tr>
          <th>Component</th>
          <th>Tokens</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Tool name</td>
          <td>8</td>
      </tr>
      <tr>
          <td>Description (entire string)</td>
          <td>430</td>
      </tr>
      <tr>
          <td>Parameters schema</td>
          <td>225</td>
      </tr>
      <tr>
          <td>TOTAL</td>
          <td>~663 tokens</td>
      </tr>
  </tbody>
</table>
<p>This one tool definition takes up about 663 tokens. Not terrible on its own, but my YNAB MCP has about 15 tools. As you add more MCP servers to your stack, tool definitions alone consume more and more of your context window, so it&rsquo;s important not to overload your coding assistant with too many MCP servers.</p>
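<p>To see how quickly definitions add up, you can tally them the same way <code>/context</code> does. The per-tool numbers below are illustrative stand-ins, not measured values from a real server:</p>

```python
# Hypothetical per-tool token costs, modeled on /context output.
tools = {
    "mcp__ynab__get_transactions": 663,
    "mcp__ynab__get_accounts": 540,
    "mcp__ynab__get_budgets": 480,
}

CONTEXT_WINDOW = 200_000  # approximate Claude Sonnet 4.5 window

total = sum(tools.values())
print(f"{total} tokens ({total / CONTEXT_WINDOW:.2%} of the window)")
```

<p>Scale that to 15 tools per server across several servers and the definitions alone can reach tens of thousands of tokens before a single message is sent.</p>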
<p>The community has been exploring ways to make MCP servers more context efficient. One approach Anthropic has written about is allowing code execution within MCP servers. You can learn more in <a href="https://www.anthropic.com/engineering/code-execution-with-mcp">Code execution with MCP: Building more efficient agents</a>, but the short version is that instead of exposing lots of different tools, an MCP server exposes a single tool (or a small handful) that executes its own code in a sandboxed environment to produce results. Anthropic also recently announced a beta feature for <a href="https://www.anthropic.com/engineering/advanced-tool-use">advanced tool use</a> in Claude. One of the standout updates is moving away from a static tool list to lazily loading tool definitions via a tool search tool.</p>
<p>Both of these are in their early stages, so for now we still need to be careful about how many MCP servers we add to our coding agents and how much of the context window they consume. With that out of the way, let&rsquo;s look at a real-world context window in a development environment and how to make the most of it.</p>
<h2 id="a-view-into-your-context-window">A View Into Your Context Window</h2>
<p>Claude Code provides us with a command we can run within a session called <code>/context</code>. This command will report back the current state of your context window including how much space everything is taking up. Let&rsquo;s take a look at the output of <code>/context</code> within <a href="http://www.tracewell.ai">Tracewell AI</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>Context Usage
</span></span><span style="display:flex;"><span>⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-sonnet-4-5-20250929 · 101k/200k tokens <span style="color:#89dceb;font-weight:bold">(</span>51%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁
</span></span><span style="display:flex;"><span>⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛀ ⛶   ⛁ System prompt: 3.1k tokens <span style="color:#89dceb;font-weight:bold">(</span>1.6%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ System tools: 19.8k tokens <span style="color:#89dceb;font-weight:bold">(</span>9.9%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ MCP tools: 26.5k tokens <span style="color:#89dceb;font-weight:bold">(</span>13.3%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Custom agents: 2.8k tokens <span style="color:#89dceb;font-weight:bold">(</span>1.4%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Memory files: 4.0k tokens <span style="color:#89dceb;font-weight:bold">(</span>2.0%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛝ ⛝ ⛝   ⛁ Messages: <span style="color:#fab387">8</span> tokens <span style="color:#89dceb;font-weight:bold">(</span>0.0%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛶ Free space: 99k <span style="color:#89dceb;font-weight:bold">(</span>49.4%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛝ Autocompact buffer: 45.0k tokens <span style="color:#89dceb;font-weight:bold">(</span>22.5%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>MCP tools · /mcp
</span></span><span style="display:flex;"><span>└ mcp__memory__create_entities <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">686</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__create_relations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">689</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__add_observations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">668</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__delete_entities <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">612</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__delete_observations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">666</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__delete_relations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">690</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__read_graph <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">568</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__search_nodes <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">607</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__open_nodes <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">609</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__whoami <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">602</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_organizations <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">735</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_teams <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.0k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_projects <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">999</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_releases <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.2k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_issue_details <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.4k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_trace_details <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.3k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_event_attachment <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.3k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__update_issue <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.5k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__search_events <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.5k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_dsns <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.0k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__analyze_issue_with_seer <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.3k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__search_docs <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.8k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_doc <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">768</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__search_issues <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.5k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__use_sentry <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">968</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__context7__resolve-library-id <span style="color:#89dceb;font-weight:bold">(</span>context7<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">887</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__context7__get-library-docs <span style="color:#89dceb;font-weight:bold">(</span>context7<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">957</span> tokens
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Custom agents · /agents
</span></span><span style="display:flex;"><span>└ rails-backend-expert <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">444</span> tokens
</span></span><span style="display:flex;"><span>└ cybersecurity-expert <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">287</span> tokens
</span></span><span style="display:flex;"><span>└ prompt-engineer <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">609</span> tokens
</span></span><span style="display:flex;"><span>└ tailwind-viewcomponent-expert <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">417</span> tokens
</span></span><span style="display:flex;"><span>└ product-strategy-advisor <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">608</span> tokens
</span></span><span style="display:flex;"><span>└ regulatory-510k-consultant <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">459</span> tokens
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Memory files · /memory
</span></span><span style="display:flex;"><span>└ User <span style="color:#89dceb;font-weight:bold">(</span>/home/dgalarza/.claude/CLAUDE.md<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">10</span> tokens
</span></span><span style="display:flex;"><span>└ Project <span style="color:#89dceb;font-weight:bold">(</span>/home/dgalarza/Code/tracewell.ai/CLAUDE.md<span style="color:#89dceb;font-weight:bold">)</span>: 4.0k tokens
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>SlashCommand Tool · <span style="color:#fab387">16</span> commands
</span></span><span style="display:flex;"><span>└ Total: 2.7k tokens
</span></span></code></pre></div><p>This gives us a detailed view of our context window: what percentage is currently available and a breakdown of what is taking up space. From it we can see that my MCP tools take up 26.5k tokens, about 13.3% of the Claude Sonnet 4.5 context window. Beyond that, the custom agents defined and available take up about 2.8k tokens, my project&rsquo;s CLAUDE.md is 4k tokens, and about 22% of the context window is reserved for autocompacting.</p>
<h3 id="what-is-autocompacting">What is autocompacting?</h3>
<p>To understand autocompacting, we first need to understand how a typical conversation flows through the Anthropic API. By default, every call to the Claude API has no recollection of earlier parts of a conversation. Instead, as the consumer of the API, we need to maintain that conversation history and provide it with each request. Take a look at the following diagram:</p>
<p><img src="/images/posts/claude-conversation.png" alt="Diagram illustrating how Claude Code maintains conversation history across API requests, showing message flow and context accumulation"></p>
<p>The first request kicks off the conversation with &ldquo;Add error handling to the auth module&rdquo;, and we get a response back from the LLM with the result of what it did. When the user continues the conversation in request 2 with &ldquo;Now add tests for those changes&rdquo;, we actually end up sending the full conversation history: our first message, the LLM&rsquo;s response, and now our new message. This is a simplified example that doesn&rsquo;t include tool calling; any tool call requests, along with their results, would be part of this history as well. As your conversation gets longer, more and more of the context window is taken up. As you approach the limit, space must be freed up, and one way to do this is compaction.</p>
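<p>The growth pattern in the diagram can be sketched in a few lines. <code>call_model</code> below is a stand-in for a real API call (not the actual Anthropic SDK), so the shape of the history is visible without a network call:</p>

```python
def call_model(messages):
    # Stand-in for the real API: the model sees the FULL list every time.
    return {"role": "assistant",
            "content": f"(reply given {len(messages)} prior messages)"}

history = []

def send(user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the entire history goes over the wire
    history.append(reply)        # and the reply is resent next turn too
    return reply

send("Add error handling to the auth module")
send("Now add tests for those changes")
print(len(history))  # grows by two entries per turn
```

<p>Every turn appends a user message and an assistant reply, and the whole list rides along on the next request, which is why long sessions eat the context window.</p>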
<p>Compacting the context window is a context engineering technique that compresses a long-running conversation or session by summarizing it to free up space. The summarization is typically handled by an LLM, and the generated summary becomes the basis of the remainder of the conversation. Compaction can work well, but it&rsquo;s not an exact science: you are beholden to the LLM to identify the right things to include in the summary. If you&rsquo;ve ever had a long-running conversation with Claude Code and felt like things started to go off the rails, you might have experienced this. In a long conversation you can end up with multiple autocompact calls, where the LLM is now summarizing a summarization along with the rest of the conversation.</p>
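<p>Mechanically, compaction looks something like the sketch below; this is not Claude Code&rsquo;s implementation, just the general pattern. The <code>summarize</code> callback stands in for an LLM call, and a size threshold triggers replacing older turns with a single summary message:</p>

```python
def compact(history, budget, summarize):
    """Replace all but the most recent turns with one summary message
    when the (crudely estimated) size exceeds `budget`."""
    size = sum(len(m["content"]) for m in history)
    if size <= budget:
        return history
    older, recent = history[:-2], history[-2:]
    summary = {"role": "user", "content": summarize(older)}
    return [summary] + recent

history = [{"role": "user", "content": "x" * 50} for _ in range(6)]
compacted = compact(history, budget=100,
                    summarize=lambda msgs: f"summary of {len(msgs)} messages")
print(len(compacted))  # summary message + the two most recent turns
```

<p>Whatever the summary omits is gone for good, which is exactly why repeated compaction degrades long sessions.</p>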
<p><strong>Warning signs of problematic autocompact:</strong></p>
<ul>
<li>Claude forgets decisions you made earlier in the conversation</li>
<li>Claude repeats work it already completed</li>
<li>Claude asks questions you already answered</li>
<li>Solutions start contradicting earlier approaches</li>
</ul>
<p>When you notice these symptoms, it&rsquo;s usually time for a <code>/clear</code> and a fresh start rather than continuing to fight against a degraded context.</p>
<h2 id="managing-your-context-window">Managing Your Context Window</h2>
<p>Now that we understand what the context window is and how conversation history accumulates within it, let&rsquo;s explore different ways to manage the context window to make the most of it.</p>
<h3 id="delegating-to-subagents">Delegating to Subagents</h3>
<p>Claude Code can spin off &ldquo;subagents&rdquo; while it works. Each subagent has its own context window, separate from the main conversation. This gives us two advantages. First, the subagent&rsquo;s context window isn&rsquo;t cluttered with our previous conversation history. Second, and this is the flip side, our main conversation isn&rsquo;t cluttered with the details of whatever the subagent was instructed to work on; instead, it reports back its results. You can see this in action with Claude Opus 4.5 whenever you plan something: it typically delegates exploration tasks to subagents to help build the plan.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>⏺ <span style="color:#fab387">3</span> Explore agents finished <span style="color:#89dceb;font-weight:bold">(</span>ctrl+o to expand<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>   ├─ Explore Tracewell Agent workflow · <span style="color:#fab387">23</span> tool uses · 104.5k tokens
</span></span><span style="display:flex;"><span>   │  ⎿ Done
</span></span><span style="display:flex;"><span>   ├─ Explore Tracewell DHF extractions · <span style="color:#fab387">28</span> tool uses · 108.0k tokens
</span></span><span style="display:flex;"><span>   │  ⎿ Done
</span></span><span style="display:flex;"><span>   └─ Explore <span style="color:#89dceb">eval</span> framework · <span style="color:#fab387">24</span> tool uses · 101.8k tokens
</span></span><span style="display:flex;"><span>      ⎿ Done
</span></span></code></pre></div><p>You can also instruct Claude to invoke a subagent explicitly. Some examples:</p>
<ul>
<li>&ldquo;Have a subagent do a code review of this branch against main&rdquo;</li>
<li>&ldquo;Use a subagent to explore how authentication works in this codebase&rdquo;</li>
<li>&ldquo;Spawn a subagent to research different caching strategies for this use case&rdquo;</li>
</ul>
<p>When the subagent completes, you&rsquo;ll see a summary like this in your main conversation:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>⏺ Task agent finished · <span style="color:#fab387">15</span> tool uses · 52.3k tokens
</span></span><span style="display:flex;"><span>  ⎿ The code review found <span style="color:#fab387">3</span> issues: <span style="color:#89dceb;font-weight:bold">[</span>summary of findings...<span style="color:#89dceb;font-weight:bold">]</span>
</span></span></code></pre></div><p>Notice that the subagent used 52k tokens of its own context, but your main conversation only receives the summary. This is the key benefit: the detailed work happens in isolation.</p>
<p>I&rsquo;ve found subagents work best for self-contained tasks that require reading lots of files. Code reviews are a natural fit. The subagent can dig through diffs without polluting your main context. The same goes for codebase exploration when you&rsquo;re trying to understand how an unfamiliar feature works across multiple modules. Research tasks also work well here; you can have a subagent investigate implementation options and report back before you commit to an approach.</p>
<h3 id="using-custom-agents">Using Custom Agents</h3>
<p>Custom agents take subagents to another level. They let us define an agent with a persona and an area of expertise, built on the same mechanism as subagents: each has its own context window. Additionally, we can define which tools the agent has access to. This is useful when you know an agent doesn&rsquo;t need specific tools, so their definitions don&rsquo;t take up space in its context window.</p>
<p>An agent is a markdown file which lives in either <code>~/.claude/agents</code> or <code>.claude/agents</code>. You can provide it a name, a description, a model, and tools which it is allowed to use. This is all handled via YAML frontmatter. After the frontmatter you define the agent itself.</p>
<p>Let&rsquo;s take a look at a practical example.</p>
<p>In Tracewell I have defined a few subagents that you can see in the earlier <code>/context</code> output. The <code>rails-backend-expert</code> doesn&rsquo;t need access to the Linear MCP so I can choose not to give the agent access to it or any of its tools. This is handled by setting an allow list of what tools you want to give the model access to:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#fab387">---</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">name</span>: rails-backend-expert
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">description</span>: Use this agent when working on Ruby on Rails backend code, including models, controllers, services, jobs, database migrations, API endpoints, background processing, or any server-side Ruby logic.
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">tools</span>: Bash, Glob, Grep, Read, Edit, Write, NotebookEdit, WebFetch, TodoWrite, WebSearch, BashOutput, KillShell, AskUserQuestion, Skill, SlashCommand, mcp__memory__create_entities, mcp__memory__create_relations, mcp__memory__add_observations, mcp__memory__delete_entities, mcp__memory__delete_observations, mcp__memory__delete_relations, mcp__memory__read_graph, mcp__memory__search_nodes, mcp__memory__open_nodes, mcp__context7__resolve-library-id, mcp__context7__get-library-docs
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">model</span>: sonnet
</span></span><span style="display:flex;"><span><span style="color:#fab387">---</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>You are a Ruby on Rails backend expert. Your role is to help with...
</span></span></code></pre></div><p>The agent&rsquo;s full persona and instructions follow after the frontmatter. I recommend using the <code>/agents</code> command to get started; Claude will walk you through creating your first agent. During setup it asks whether you want Claude to generate the agent or configure it manually. I recommend the generated approach: provide a high-level prompt and Claude produces the full agent description for you. The wizard also asks which tools the agent should have access to.</p>
<h3 id="claude-skills">Claude Skills</h3>
<p>In October 2025 Anthropic announced <a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">Agent Skills</a>, a way of encapsulating domain expertise or workflows for agents to follow. Skills are organized in folders around a core <code>SKILL.md</code> file whose frontmatter holds required metadata such as the name and description. The body of the <code>SKILL.md</code> contains the instruction set of the skill itself.</p>
<p>When Claude Code starts, it loads the name and description of every skill available into its context via the system prompt. This allows Claude to use progressive disclosure in determining when to use a skill without loading the entire thing into its context window. This can be a powerful tool that can even potentially replace some MCP servers. This is possible because skills can also contain scripts that Claude can run. Instead of having to always expose tools into the context window via an MCP server you can provide a skill which has scripts that it can run that only get added to the agent&rsquo;s context window when it is useful.</p>
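<p>A simplified sketch of progressive disclosure: only the frontmatter fields enter context at startup, while the body stays on disk until the skill is invoked. The frontmatter parsing here is deliberately naive (real <code>SKILL.md</code> files use full YAML), and the skill content is made up for illustration.</p>

```python
# Sketch of progressive disclosure for skills: at startup only each
# SKILL.md's name and description enter the context window; the body
# loads on demand. Frontmatter parsing is simplified (not full YAML).

def parse_skill(text):
    _, frontmatter, body = text.split("---", 2)
    meta = dict(
        line.split(":", 1) for line in frontmatter.strip().splitlines()
    )
    return {k.strip(): v.strip() for k, v in meta.items()}, body.strip()

skill_md = """---
name: playwright
description: Drive a browser with Playwright scripts
---
Full instructions, examples, and helper scripts go here...
"""

meta, body = parse_skill(skill_md)
# Only this small string costs tokens in every session:
startup_context = f"{meta['name']}: {meta['description']}"
# `body` (potentially thousands of tokens) is loaded only when invoked
print(startup_context)
```

<p>The startup index stays tiny no matter how elaborate the skill body gets, which is where the token savings over an always-loaded MCP server come from.</p>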
<p>A great example of this is the <a href="https://github.com/lackeyjb/playwright-skill">Playwright Skill for Claude Code</a> by Bryan Lackey. Previously, if you wanted to add Playwright to Claude Code for interfacing with your web application, you&rsquo;d add the <a href="https://github.com/microsoft/playwright-mcp">playwright-mcp</a>. It adds 22 tools that consume about 14.3k tokens, roughly 7.2% of your context window, just by being available.</p>
<p>The skill, by contrast, only adds about 200 tokens at startup for its name and description. The full SKILL.md (around 4-5k tokens) only loads when you actually invoke the skill. If you use Playwright in maybe one out of every five sessions, you&rsquo;re saving roughly 10k tokens in the sessions where you don&rsquo;t need it.</p>
<h3 id="using-clear">Using clear</h3>
<p>Another tool at your disposal is the <code>/clear</code> command. It empties the context window, providing a fresh start. I highly recommend doing this often, especially when you have completed a distinct task and are moving on to a new one where the previous conversation history is no longer needed or useful.</p>
<h3 id="compacting-the-conversation-manually">Compacting the conversation manually</h3>
<p>Along with autocompact, you can also choose when to compact a conversation manually by running <code>/compact</code>. It takes an optional argument: instructions for how Claude should perform the compaction. You can guide Claude to capture certain information while generating its summary and compacting the conversation. I recommend this when you have made significant progress on your work and are moving on to some related work. Perhaps Claude broke the work into multiple phases and you just completed phase 1. You could:</p>
<ol>
<li>Use <code>/clear</code> to reset the context window. However, if you didn&rsquo;t persist the plan / TODO list somewhere you&rsquo;ll start from scratch.</li>
<li>Continue until autocompact kicks in and you let the LLM do the heavy lifting of summarizing / compacting the conversation.</li>
</ol>
<p>Instead, I&rsquo;d recommend using the <code>/compact</code> command, instructing Claude to summarize the progress you&rsquo;ve made so far so you can start the next phase of work with a &ldquo;fresh&rdquo; context window. I say &ldquo;fresh&rdquo; because we aren&rsquo;t fully clearing the context window; we&rsquo;re compressing the previous conversation.</p>
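<p>For instance, the optional argument might look something like this (the wording is illustrative, not a required format):</p>

```
/compact Preserve the phase 1 outcome: files changed, key decisions made,
and the remaining TODO items for phase 2. Drop the detailed diffs and
exploratory dead ends.
```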
<h3 id="being-strategic-about-file-reads">Being Strategic About File Reads</h3>
<p>It&rsquo;s easy to overlook how quickly file reads consume context. Every time Claude reads a file, that content gets added to the conversation history. Large files, broad grep results, or reading several files in sequence can eat through your available context faster than you&rsquo;d expect.</p>
<p>A few things I&rsquo;ve learned to do:</p>
<ul>
<li>When I know roughly where something is, I&rsquo;ll point Claude to specific line ranges rather than having it read entire files. For example: &ldquo;Look at the <code>authenticate</code> method in <code>app/services/auth_service.rb</code> around lines 45-80&rdquo; instead of just &ldquo;check the auth service&rdquo;</li>
<li>I try to use targeted grep patterns before asking Claude to read files. Narrowing down candidates first means fewer files loaded into context. For example, instead of &ldquo;find where we handle webhook failures&rdquo;, I might say &ldquo;grep for <code>webhook.*fail</code> or <code>handle.*webhook</code> in app/services/ and show me the matches before reading any files.&rdquo; This way Claude identifies the 2-3 relevant files first rather than speculatively reading 10 service files looking for the right one.</li>
<li>For orientation questions like &ldquo;what does this module do?&rdquo;, asking Claude to summarize rather than read the whole thing can save significant tokens</li>
</ul>
<p>This becomes especially important in larger codebases where a single exploration session can involve dozens of file reads.</p>
<h3 id="optimizing-your-claudemd">Optimizing Your CLAUDE.md</h3>
<p>Your project&rsquo;s <code>CLAUDE.md</code> file loads into every conversation, so it&rsquo;s worth keeping it lean. Looking back at my <code>/context</code> output, my project&rsquo;s CLAUDE.md takes up 4k tokens, which is 2% of my context window before I&rsquo;ve even started working.</p>
<p>A few things to keep in mind:</p>
<ul>
<li>Bullet points tend to be more token-efficient than prose</li>
<li>Put the most critical instructions at the beginning since Claude pays more attention to the start and end of content (that &ldquo;lost in the middle&rdquo; problem again)</li>
<li>Consider whether instructions belong at the project level or could live in your user-level <code>~/.claude/CLAUDE.md</code> instead</li>
<li>Periodically audit for outdated instructions that no longer apply</li>
</ul>
<p>It&rsquo;s a balancing act. You want enough context for Claude to understand your project&rsquo;s conventions, but not so much that you&rsquo;re burning tokens on rarely-relevant details.</p>
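<p>If you want a quick sense of your <code>CLAUDE.md</code> budget between <code>/context</code> runs, a rough character-count heuristic works. The 4-characters-per-token ratio below is a common rule of thumb, not an exact tokenizer, and the sample content is made up.</p>

```python
# Rough sketch for auditing CLAUDE.md size. The 4-chars-per-token
# ratio is only a heuristic; `/context` reports the real number.

def estimate_tokens(text):
    return len(text) // 4

# Stand-in for reading your actual CLAUDE.md from disk:
claude_md = "- Use RSpec for tests\n- Follow Result pattern\n" * 100
tokens = estimate_tokens(claude_md)
print(tokens)  # worth trimming if this creeps past a few thousand
```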
<h2 id="best-practices-for-context-window-management">Best Practices for Context Window Management</h2>
<ol>
<li><strong>Monitor regularly</strong> - Run <code>/context</code> at the start of each session to understand your baseline usage</li>
<li><strong>Audit your MCP servers</strong> - Remove any MCP servers you haven&rsquo;t used recently; each one consumes tokens just by existing</li>
<li><strong>Prefer skills over MCP servers</strong> - When building new functionality, consider skills first for better context efficiency through progressive disclosure</li>
<li><strong>Clear between tasks</strong> - Use <code>/clear</code> liberally when switching between unrelated work</li>
<li><strong>Strategic compacting</strong> - Use <code>/compact</code> with custom instructions when transitioning between related phases of work</li>
<li><strong>Delegate complex work</strong> - Use subagents for self-contained tasks to keep their context isolated from your main conversation</li>
</ol>
<blockquote>
<p><strong>Running a team on Claude Code?</strong> Context window management gets
harder with 5+ engineers making different choices about MCP servers,
CLAUDE.md conventions, and workflow patterns. A
<a href="/services/#retainer">Production AI Retainer</a>
standardizes this across your team.</p></blockquote>
<h2 id="conclusion">Conclusion</h2>
<p>Context management isn&rsquo;t just about avoiding limits; it&rsquo;s about keeping your conversations focused and effective. A cluttered context window leads to degraded responses, just like a cluttered desk makes it harder to find what you need.</p>
<p>The key takeaways: monitor your usage with <code>/context</code>, delegate to subagents for isolated work, and use <code>/clear</code> liberally between tasks. When possible, prefer skills over MCP servers for better context efficiency through progressive disclosure.</p>
<p>Start by running <code>/context</code> in your next Claude Code session to see where your tokens are going. You might be surprised by what you find.</p>
<p><strong>Update:</strong> With Claude Opus 4.7&rsquo;s 1M token context window, the fundamentals here still apply but the ceiling and controls have changed. See <a href="/posts/2026-04-30-claude-opus-4-7-claude-code-tips-extended-context/">7 Practical Tips for Maximizing Extended Context</a> for workflow adjustments specific to the larger window.</p>
<blockquote>
<p>If this post was the explanation, the cheat sheet is the reference.
Two sides: token costs for common MCPs on one, the <code>/clear</code> /
<code>/compact</code> / subagent decision tree on the other.</p>
<p><a href="/context-window-cheat-sheet/">Get the Context Window Cheat Sheet →</a></p></blockquote>
]]></content:encoded></item><item><title>How I Use Claude Code: My Complete Development Workflow</title><link>https://www.damiangalarza.com/posts/2025-11-25-how-i-use-claude-code/</link><pubDate>Tue, 25 Nov 2025 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2025-11-25-how-i-use-claude-code/</guid><description>After 8 months with Claude Code, here's my complete workflow. Learn how I combine Linear, MCP servers, and Obsidian for AI-assisted development that works.</description><content:encoded><![CDATA[<p>For the past 8 months I&rsquo;ve been using Claude Code as my daily driver. It&rsquo;s become a core part of my development workflow. Before this I tried Cursor for a few months and enjoyed it. However, I&rsquo;ve been a terminal/Vim user for quite a while, so moving to an IDE was a significant change. I found myself drawn to Claude Code&rsquo;s agentic workflow rather than autocomplete or chat panels.</p>
<p>During this time my workflow has evolved significantly. This is partly from learning how to get the most out of it, but also from the Anthropic team&rsquo;s continuous improvements to the product. New features are coming out at a rapid pace.</p>
<p>There&rsquo;s an ongoing debate in our industry. Some developers swear by AI assistants while others remain skeptical. You hear stories about companies claiming developers are no longer needed, alongside dismissals that AI-generated code is always garbage. I find myself somewhere in the middle.</p>
<p>I believe using these tools is a skill in and of itself. When people tell me &ldquo;it takes longer to get the LLM to do it right&rdquo; or &ldquo;I can do it faster myself,&rdquo; I understand where they&rsquo;re coming from. When I first started programming, I was slow too. But I got faster with practice. The same applies to working with agentic development tools.</p>
<p>This post walks through my typical Claude Code workflow. While I focus on Claude Code specifically, these concepts apply to many agentic coding tools.</p>
<h2 id="context-is-king">Context is King</h2>
<p>One of the biggest complaints I hear from developers goes something like this: &ldquo;I tried using an AI assistant but it just wouldn&rsquo;t get it right. I spent so much time trying to get it to do what I wanted and eventually gave up.&rdquo;</p>
<p>I&rsquo;ve written about <a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">how MCP tools consume the context window</a> before. But context matters in other ways too.</p>
<p>When I dig deeper into these frustrations, I typically ask how they prompted the LLM. The answer is usually a fairly vague prompt. In a smaller codebase this might work fine, but in an established codebase it often falls short. We need to give the LLM a well-structured problem.</p>
<p>One thing I&rsquo;ve learned is that developers who have experience managing or delegating tasks tend to adapt quickly. They already understand how to break down problems into small pieces for someone else to work on. This is why I spend time breaking down problems into bite-sized chunks—a common practice in agile development.</p>
<p>For example, while building <a href="https://www.tracewell.ai">Tracewell AI</a> I work with Linear for issue tracking. Even though I&rsquo;m typically working alone, being disciplined about creating issues pays off. I often use Claude via the desktop app or terminal to scope out work, break down problems, and create Linear issues. This upfront work makes implementation much smoother.</p>
<h2 id="tools-i-use">Tools I Use</h2>
<p>My Claude Code setup relies on a few key tools that work together to provide rich context. Each addresses a different aspect of development—project management, error tracking, version control, and memory—creating a network of information that Claude can draw from when planning and implementing features.</p>
<h3 id="linear-mcp-server">Linear MCP Server</h3>
<p>The <a href="https://linear.app/docs/mcp">Linear MCP Server</a> is a backbone of my workflow. It gives Claude direct access to project issues, enabling both the creation of backlog items and the delegation of implementation tasks.</p>
<h3 id="sentry-mcp-server">Sentry MCP Server</h3>
<p>I use Sentry for error tracking, so the <a href="https://docs.sentry.io/product/sentry-mcp/">Sentry MCP</a> is a natural addition. It allows me to point Claude at an exception for triaging or fixing. While I have Sentry connected to Linear for automatic issue creation, the MCP integration adds another layer of context when investigating errors. If you want to see this workflow in action, check out my video on <a href="https://youtu.be/GfDczm2xJ1M">Debugging Production Issues with AI</a>.</p>
<h3 id="github-cli">GitHub CLI</h3>
<p>This one is critical. If you&rsquo;ve read the <a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code Best Practices</a> you&rsquo;ve likely seen the recommendation to install the <a href="https://cli.github.com/">GitHub CLI</a> (<code>gh</code>). If you haven&rsquo;t read that guide, I highly recommend starting there.</p>
<p>Claude Code excels at using the GitHub CLI for tasks like:</p>
<ol>
<li>Opening pull requests</li>
<li>Investigating GitHub issues</li>
<li>Debugging GitHub Action runs</li>
<li>Reviewing PR feedback</li>
<li>Performing code review on others&rsquo; pull requests</li>
</ol>
<h3 id="memory-mcp-server">Memory MCP Server</h3>
<p>The <a href="https://github.com/modelcontextprotocol/servers/tree/main/src/memory">Memory MCP Server</a> provides Claude with persistent memory across conversations. In my workflow, I use it to store implementation plans so Claude can track progress and maintain context throughout a feature&rsquo;s development. When Claude creates a plan for a Linear issue, it saves it to the memory graph. This becomes especially useful when work spans multiple sessions.</p>
<p>With these tools in place, let&rsquo;s look at another core part of my workflow.</p>
<h2 id="obsidian-notes">Obsidian Notes</h2>
<p>I&rsquo;ve been using Obsidian for notes for over a year, but it never occurred to me to connect it to Claude Code until I heard the Every podcast episode with Noah Brier: <a href="https://every.to/podcast/how-to-use-claude-code-as-a-thinking-partner">How to Use Claude Code as a Second Brain</a>. This significantly changed how I provide context to my development work.</p>
<p>Why is this connection so important? When I&rsquo;m working on a project, I&rsquo;m taking notes. At a project kick-off I&rsquo;m capturing potential solutions, key pieces of code, and product knowledge. These notes go into my vault under paths like <code>01-Projects/DHF Extraction/2025-11-01-Pairing Session.md</code>. Meeting transcripts end up in the same project folder.</p>
<p>When it&rsquo;s time to implement, I put Claude in plan mode and instruct it to &ldquo;Review my notes in 01-Projects/DHF Extraction and help me implement X.&rdquo; Claude can now gather all the context I&rsquo;ve assembled to inform its implementation plan.</p>
<blockquote>
<p>If you want to learn more about how I process meeting notes in Obsidian, check out my <a href="https://github.com/dgalarza/claude-code-workflows/tree/main/.claude/skills/process-meeting-transcript">Process Meeting Transcript Skill</a> on GitHub.</p></blockquote>
<p>To make the most of this, use the <code>/add-dir</code> command to add your Obsidian vault path. This allows Claude Code to reference files in your vault without permission issues.</p>
<h2 id="putting-it-all-together">Putting It All Together</h2>
<p>With MCP servers handling project management, error tracking, and memory, plus Obsidian providing my accumulated notes and research, I have all the pieces needed for a comprehensive workflow. All of these tools come together in a <a href="https://www.claude.com/blog/skills">Claude Agent Skill</a> that takes a Linear issue by ID and implements a solution. Let me break down this skill.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Overview
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>This skill provides a comprehensive workflow for implementing Linear issues with professional software engineering practices. It automates the entire development lifecycle from issue analysis through PR creation, ensuring quality through test-driven development, parallel code reviews, and systematic validation.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## When to Use This Skill
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>Use this skill when:
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User provides a Linear issue ID (format: <span style="color:#a6e3a1">`TRA-9`</span>, <span style="color:#a6e3a1">`DEV-123`</span>, etc.)
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User requests implementation of a Linear issue
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User wants a structured TDD approach with code review
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User needs automated workflow from issue to PR
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Examples:
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Implement TRA-142&#34;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Help me build the feature in DEV-89&#34;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Work on Linear issue ABC-456&#34;
</span></span></code></pre></div><p>This sets the stage for what the skill does and when to invoke it. Now let&rsquo;s look at the core workflow.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Core Workflow
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>The skill follows a 14-step process:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">1.</span> <span style="font-weight:bold">**Fetch Linear Issue**</span> - Retrieve complete issue details via Linear MCP
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">2.</span> <span style="font-weight:bold">**Gather Additional Context**</span> - Search Obsidian, Sentry, and GitHub for related information
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">3.</span> <span style="font-weight:bold">**Move to In Progress**</span> - Update issue status to indicate active work
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">4.</span> <span style="font-weight:bold">**Create Feature Branch**</span> - Use Linear&#39;s suggested git branch naming
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">5.</span> <span style="font-weight:bold">**Analyze &amp; Plan**</span> - Break down requirements and create implementation plan
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">6.</span> <span style="font-weight:bold">**Save to Memory**</span> - Store plan in memory graph for tracking
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">7.</span> <span style="font-weight:bold">**Review Plan**</span> - Present plan for user confirmation
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">8.</span> <span style="font-weight:bold">**TDD Implementation**</span> - Invoke <span style="color:#a6e3a1">`tdd-workflow`</span> skill for test-driven development
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">9.</span> <span style="font-weight:bold">**Parallel Code Reviews**</span> - Invoke <span style="color:#a6e3a1">`parallel-code-review`</span> skill for comprehensive analysis
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">10.</span> <span style="font-weight:bold">**Address Feedback**</span> - Invoke <span style="color:#a6e3a1">`code-review-implementer`</span> skill to systematically fix issues
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">11.</span> <span style="font-weight:bold">**Validation**</span> - Ensure all tests and linters pass
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">12.</span> <span style="font-weight:bold">**Logical Commits**</span> - Create meaningful commit history
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">13.</span> <span style="font-weight:bold">**Create PR**</span> - Generate comprehensive pull request with Linear linking
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">14.</span> <span style="font-weight:bold">**Final Verification**</span> - Confirm CI/CD pipeline and Linear integration
</span></span></code></pre></div><p>There&rsquo;s a lot happening here, but the goal is straightforward: build as much context as possible before implementation begins. The workflow pulls in details from the Linear issue, related Obsidian notes, Sentry exceptions if relevant, and any linked GitHub discussions. For example, a Linear issue might have been extracted from a previous pull request discussion as a follow-up task—pulling that context in gives Claude a much better starting point.</p>
<p>One thing worth highlighting: step 7 (Review Plan) is a key part of this workflow. After gathering context and creating a plan, Claude presents it and <em>waits for my approval</em> before writing any code. This human-in-the-loop checkpoint prevents runaway implementations and gives me a chance to course-correct before significant work begins.</p>
<p>You&rsquo;ll notice a few other skills referenced in the workflow. These are also available in the <a href="https://github.com/dgalarza/claude-code-workflows/tree/main/.claude/skills">claude-code-workflows</a> repo:</p>
<p><strong>tdd-workflow skill</strong>
A skill that outlines a test-driven development workflow following an outside-in testing approach.</p>
<p><strong>parallel-code-review</strong>
This workflow spins off two Claude sub-agents to perform code review in parallel. One focuses on Rails and object-oriented best practices while the other performs security analysis.</p>
<p><strong>code-review-implementer</strong>
A skill that ranks code review feedback by priority and systematically addresses it. High priority feedback is always addressed. Medium and low priority items are presented for my decision before implementation.</p>
<h2 id="getting-started">Getting Started</h2>
<p>If you want to try this workflow yourself, here&rsquo;s how to get started:</p>
<ol>
<li>
<p><strong>Install the MCP servers</strong> - Set up Linear, Sentry (if you use it), and Memory MCP servers in your Claude Code configuration.</p>
</li>
<li>
<p><strong>Copy the skills</strong> - Clone or copy the skills from my <a href="https://github.com/dgalarza/claude-code-workflows">claude-code-workflows</a> repo into your project&rsquo;s <code>.claude/skills/</code> directory. You&rsquo;ll need <code>linear-implement</code> and its dependencies (<code>tdd-workflow</code>, <code>parallel-code-review</code>, <code>code-review-implementer</code>).</p>
</li>
<li>
<p><strong>Customize for your stack</strong> - My skills are tailored to Rails projects with specific conventions (POODR principles, Result pattern, RSpec). If you&rsquo;re using Django, Node, Go, or another stack, you&rsquo;ll want to adapt the code review criteria and testing workflows to match your conventions.</p>
</li>
<li>
<p><strong>Connect your notes</strong> - Use <code>/add-dir</code> to add your Obsidian vault (or wherever you keep project notes) so Claude can reference them.</p>
</li>
<li>
<p><strong>Try it out</strong> - Once everything is set up, just type &ldquo;Implement TRA-142&rdquo; (substituting your issue ID) and the workflow kicks off automatically.</p>
</li>
</ol>
<p>Claude Code auto-discovers skills in the <code>.claude/skills/</code> directory, so there&rsquo;s no additional configuration needed beyond placing the files.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Getting value from agentic development tools requires building the right habits. By investing time upfront in breaking down problems, maintaining good notes, and connecting your tools together, you can create workflows that dramatically improve your productivity.</p>
<p>The key insight is that context matters. The more relevant information you can surface for the LLM, the better its output will be. This is true whether you&rsquo;re using Claude Code, Cursor, or any other AI-assisted development tool.</p>
<p>If you&rsquo;re interested in the full skill, you can find it in my <a href="https://github.com/dgalarza/claude-code-workflows/tree/main/.claude/skills/linear-implement">claude-code-workflows</a> repo on GitHub.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://youtu.be/Q7YR5-KtgJU">Getting Started with Claude Code</a> - Video walkthrough for those new to Claude Code</li>
<li><a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code: Best Practices for agentic coding</a></li>
<li><a href="https://every.to/podcast/how-to-use-claude-code-as-a-thinking-partner">How to Use Claude Code as a Second Brain</a></li>
<li><a href="https://www.youtube.com/watch?v=qizQkByZ4WM&amp;t=1246s">Forward Deployed, Episode 2: Claude Code Skills and the Progressive Disclosure Problem</a></li>
<li><a href="https://www.youtube.com/watch?v=-uW5-TaVXu4">Most devs don&rsquo;t understand how context windows work</a></li>
</ul>
<hr>
<p>If you&rsquo;re looking to build a workflow like this for your team, I offer coaching and workshops on Claude Code. <a href="/claude-code/">See how I can help</a>.</p>
]]></content:encoded></item><item><title>Build Efficient MCP Servers: Three Design Principles</title><link>https://www.damiangalarza.com/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/</link><pubDate>Thu, 06 Nov 2025 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/</guid><description>Three design principles for context-efficient MCP servers: filter at source, pre-aggregate data, work creatively. Real reductions: 746k→262 tokens.</description><content:encoded><![CDATA[<p>Recently I had an idea. What would it be like to interact with my YNAB budget via Claude Code using natural language? I wanted to be able to ask questions like &ldquo;How much did I spend on groceries last month?&rdquo; or &ldquo;What categories am I overspending in?&rdquo; and get accurate answers without digging through the app.</p>
<p>I found some existing YNAB MCPs, but most were inactive with limited features. This seemed like a good opportunity to learn MCP design from scratch. What followed was a deep dive into context efficiency that changed how I think about building AI tools.</p>
<h2 id="understanding-model-context-protocols-mcps">Understanding the Model Context Protocol (MCP)</h2>
<p>The Model Context Protocol (MCP) is a standardized way to extend language models with external capabilities. Unlike traditional APIs where you write code to call endpoints, MCP servers allow AI models to discover and use tools autonomously. The protocol defines how models can:</p>
<ul>
<li><strong>Call tools</strong> - Execute functions that interact with external systems</li>
<li><strong>Read resources</strong> - Access files, databases, or other data sources</li>
<li><strong>Receive prompts</strong> - Get specialized instructions for specific tasks</li>
</ul>
<p>When you build an MCP server, you&rsquo;re essentially creating a set of capabilities that any MCP-compatible AI assistant (like Claude) can use. The model sees your tool descriptions, understands what they do, and calls them as needed to fulfill user requests.</p>
<h2 id="understanding-context-windows">Understanding Context Windows</h2>
<p>Before we dive into how to make our MCPs more efficient, it&rsquo;s important to understand what we&rsquo;re trying to optimize. When working with LLMs, the context window is the amount of content that the model can &ldquo;pay attention&rdquo; to at one time. Each model has a limit to the size of its context window. For example, Claude Sonnet 4.5&rsquo;s context window is about 200,000 tokens.</p>
<h3 id="what-is-a-token">What is a Token?</h3>
<p>When you send text to an LLM, it doesn&rsquo;t process words one at a time. Instead, text is broken into <strong>tokens</strong>—the fundamental units that language models read and generate. A token typically represents 3-4 characters, or roughly 0.75 words in English.</p>
<p>For API responses and JSON data (which is what MCPs work with), tokenization looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#cba6f7">&#34;name&#34;</span>: <span style="color:#a6e3a1">&#34;Checking&#34;</span>}           <span style="color:#6c7086;font-style:italic">// ~7 tokens
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#a6e3a1">&#34;transfer_payee_id&#34;</span>            <span style="color:#6c7086;font-style:italic">// ~5 tokens
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>{<span style="color:#cba6f7">&#34;balance&#34;</span>: <span style="color:#fab387">125000</span>}            <span style="color:#6c7086;font-style:italic">// ~6 tokens
</span></span></span></code></pre></div><p>The tokenizer breaks JSON into chunks: brackets, keys, values, and punctuation all consume tokens. Every field name has a cost. Field names like <code>&quot;debt_escrow_amounts&quot;</code> and <code>&quot;direct_import_in_error&quot;</code> cost ~4-6 tokens each. When you return an API response with 18 fields per object, you&rsquo;re paying the token cost for every field name, every time—even when the model doesn&rsquo;t need them.</p>
<h3 id="why-token-efficiency-matters">Why Token Efficiency Matters</h3>
<p>Given the limited size of the context window, it&rsquo;s critical to consider how much you&rsquo;re placing inside it. Models work best when they have good context, but there&rsquo;s a balance to strike:</p>
<ul>
<li>
<p><strong>Context limits are hard boundaries</strong>: Claude Sonnet 4.5&rsquo;s 200k token limit sounds generous until you realize a naive MCP returning a year of transactions can consume 746,800 tokens—nearly 4x the entire context window. Your tool call would fail before the model could even process it.</p>
</li>
<li>
<p><strong>Real sessions are already crowded</strong>: In typical Claude Code sessions, MCP tool definitions, system prompts, and memory can consume 50-60% of the context window before you&rsquo;ve had a single conversation. Every inefficient tool response eats into precious space needed for reasoning and multi-turn conversations.</p>
</li>
<li>
<p><strong>Noise degrades performance</strong>: Providing superfluous data doesn&rsquo;t just waste tokens—it forces the model to parse irrelevant fields, increasing the chance of errors or confusion. A focused 262-token summary outperforms a noisy 4,890-token dump of raw data.</p>
</li>
</ul>
<p>The goal isn&rsquo;t to minimize tokens at all costs. It&rsquo;s to give the model exactly what it needs, nothing more, nothing less. Let&rsquo;s look at how this plays out in practice.</p>
<h3 id="the-naive-approach-direct-api-wrapping">The Naive Approach: Direct API Wrapping</h3>
<p>When I started building the YNAB MCP, I didn&rsquo;t yet appreciate how much token efficiency would matter. I started with what seemed obvious: create a thin wrapper around the existing YNAB Python SDK. Each MCP tool would correspond to one API endpoint, passing through the full response. This is a common pattern I&rsquo;ve seen in many MCP implementations.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Naive approach: Just wrap the SDK</span>
</span></span><span style="display:flex;"><span><span style="color:#89b4fa;font-weight:bold">@mcp.tool</span>()
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_budget</span>(budget_id: <span style="color:#89dceb">str</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">str</span>:
</span></span><span style="display:flex;"><span>    response <span style="color:#89dceb;font-weight:bold">=</span> ynab_client<span style="color:#89dceb;font-weight:bold">.</span>budgets<span style="color:#89dceb;font-weight:bold">.</span>get_budget(budget_id)
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> json<span style="color:#89dceb;font-weight:bold">.</span>dumps(response<span style="color:#89dceb;font-weight:bold">.</span>data<span style="color:#89dceb;font-weight:bold">.</span>budget)  <span style="color:#6c7086;font-style:italic"># Return everything</span>
</span></span></code></pre></div><p>This approach works, technically. The model gets access to the data. But there&rsquo;s a critical problem: <strong>API responses are designed for applications, not for AI context windows.</strong></p>
<p>Traditional applications can process, filter, and cache data efficiently, so APIs return comprehensive data structures optimized for completeness, not token efficiency. A single endpoint might return thousands of fields because the API designers don&rsquo;t know which specific fields your application needs.</p>
<p>AI models work differently. Every byte consumes precious context window space—space you could use for reasoning, conversation history, or additional tool calls. When you blindly pass through full API responses, you&rsquo;re asking the model to pay the &ldquo;context tax&rdquo; for data it might not even need. Worse, the model has to analyze and determine which parts of that data are actually relevant—a cognitive load that can lead to errors or missed information.</p>
<p>To put this in perspective: tool results compete with everything else for context space. In a real Claude Code session (visible via <code>/context</code>), I saw the context window at 118k/<a href="https://docs.anthropic.com/en/docs/about-claude/models">200k tokens</a> (59%)—before I&rsquo;d even started a conversation. MCP tool definitions alone consumed 47.9k tokens (24%), system tools used 17.3k tokens (9%), custom agents took 2.4k tokens, and memory files added another 2.3k tokens. That&rsquo;s 59% of the context window used just by the environment.</p>
<p>A naive MCP that returns 30k tokens for a budget overview would push that to 74% in a single tool call—leaving just 52k tokens for the actual conversation, reasoning, and additional tool calls. Every inefficient tool response eats into the space you need for multi-turn conversations.</p>
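<p>Writing that budget out as arithmetic, using the figures from my session above:</p>

```python
CONTEXT_WINDOW = 200_000  # Claude Sonnet 4.5

baseline = 118_000        # tool definitions, system prompt, agents, memory
naive_tool_call = 30_000  # one naive budget-overview response

used = baseline + naive_tool_call
remaining = CONTEXT_WINDOW - used
print(f"{used / CONTEXT_WINDOW:.0%} used, {remaining:,} tokens left")
# → 74% used, 52,000 tokens left
```

A single inefficient tool call consumes more of the window than all of my memory files and custom agents combined.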
<p>This changed how I approached the design: MCPs need to be context-aware intermediaries, not transparent proxies. The question shifted from &ldquo;How do I expose this API to the model?&rdquo; to &ldquo;What does the model actually need to help the user?&rdquo;</p>
<h2 id="three-design-principles-for-context-efficient-mcps">Three Design Principles for Context-Efficient MCPs</h2>
<p>One of the first things I wanted to do was check my budget overview: see my accounts, categories, and how I&rsquo;m tracking for the current month. A straightforward use case that any budgeting tool should support.</p>
<p>My initial thought was to create tools that directly wrapped the YNAB API endpoints. Let&rsquo;s take the accounts endpoint as an example. Here&rsquo;s what the API returns:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;accounts&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;id&#34;</span>: <span style="color:#a6e3a1">&#34;3fa85f64-5717-4562-b3fc-2c963f66afa6&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;name&#34;</span>: <span style="color:#a6e3a1">&#34;Checking Account&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;checking&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;on_budget&#34;</span>: <span style="color:#fab387">true</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;closed&#34;</span>: <span style="color:#fab387">false</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;note&#34;</span>: <span style="color:#a6e3a1">&#34;Primary checking&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;balance&#34;</span>: <span style="color:#fab387">125000</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;cleared_balance&#34;</span>: <span style="color:#fab387">120000</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;uncleared_balance&#34;</span>: <span style="color:#fab387">5000</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;transfer_payee_id&#34;</span>: <span style="color:#a6e3a1">&#34;...&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;direct_import_linked&#34;</span>: <span style="color:#fab387">true</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;direct_import_in_error&#34;</span>: <span style="color:#fab387">false</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;last_reconciled_at&#34;</span>: <span style="color:#a6e3a1">&#34;2025-11-05T18:27:20.140Z&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_original_balance&#34;</span>: <span style="color:#fab387">0</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_interest_rates&#34;</span>: {},
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_minimum_payments&#34;</span>: {},
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_escrow_amounts&#34;</span>: {},
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;deleted&#34;</span>: <span style="color:#fab387">false</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>      <span style="color:#6c7086;font-style:italic">// ... 46 more accounts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>For <strong>my 47 accounts</strong>, this API response contains <strong>18 fields per account</strong>. Many of these fields are irrelevant for typical budget questions:</p>
<ul>
<li>Debt interest rates and minimum payments (only relevant for debt accounts)</li>
<li>Direct import status (internal system state)</li>
<li>Cleared vs uncleared balance breakdown (too granular for overview)</li>
<li>Transfer payee IDs (internal references)</li>
<li>Last reconciliation date (accounting detail)</li>
</ul>
<p>When you&rsquo;re answering budget questions, these internal bookkeeping details just create noise.</p>
<p>A naive wrapper returning all 18 fields for each of the 47 accounts comes to <strong>9,960 tokens</strong>.</p>
<p>But here&rsquo;s what I actually need to answer questions like &ldquo;What&rsquo;s my checking account balance?&rdquo; or &ldquo;How much do I have across all accounts?&rdquo;:</p>
<ul>
<li>Account ID</li>
<li>Account name</li>
<li>Account type</li>
<li>Balance</li>
<li>Whether it&rsquo;s on-budget</li>
<li>Whether it&rsquo;s closed</li>
</ul>
<p>That&rsquo;s it. Just 6 fields. Here&rsquo;s the filtered implementation:</p>
<p><strong>For accounts (47 accounts):</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># src/ynab_mcp/ynab_client.py</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_accounts</span>(<span style="color:#89dceb">self</span>, budget_id: <span style="color:#89dceb">str</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]]:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Get all accounts for a budget.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    response <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>client<span style="color:#89dceb;font-weight:bold">.</span>accounts<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>    accounts <span style="color:#89dceb;font-weight:bold">=</span> []
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> account <span style="color:#89dceb;font-weight:bold">in</span> response<span style="color:#89dceb;font-weight:bold">.</span>data<span style="color:#89dceb;font-weight:bold">.</span>accounts:
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Skip deleted accounts entirely</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> account<span style="color:#89dceb;font-weight:bold">.</span>deleted:
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Return only the fields the model actually needs</span>
</span></span><span style="display:flex;"><span>        accounts<span style="color:#89dceb;font-weight:bold">.</span>append({
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;id&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>id,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;name&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>name,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;type&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>type,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;on_budget&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>on_budget,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;closed&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>closed,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;balance&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>balance <span style="color:#89dceb;font-weight:bold">/</span> <span style="color:#fab387">1000</span> <span style="color:#cba6f7">if</span> account<span style="color:#89dceb;font-weight:bold">.</span>balance <span style="color:#cba6f7">else</span> <span style="color:#fab387">0</span>,
</span></span><span style="display:flex;"><span>        })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> accounts
</span></span></code></pre></div><p><strong>Filtered approach (6 essential fields):</strong> 3,451 tokens for 47 accounts</p>
<p><strong>Reduction: 65.4%</strong> by removing 12 unnecessary fields that the model doesn&rsquo;t need for typical budget questions.</p>
<h3 id="the-full-budget-overview">The Full Budget Overview</h3>
<p>Of course, checking accounts is just one part of viewing your budget. A complete budget overview requires three tool calls:</p>
<p><strong>Naive approach (no filtering):</strong></p>
<ul>
<li>Accounts (all fields): 9,960 tokens</li>
<li>Categories (all, including hidden): 12,445 tokens</li>
<li>Monthly summary (estimated): ~8,000 tokens</li>
<li><strong>Total: ~30,405 tokens</strong></li>
</ul>
<p><strong>Context-efficient approach:</strong></p>
<ul>
<li>Accounts (filtered): 3,451 tokens</li>
<li>Categories (visible only): 8,620 tokens</li>
<li>Monthly budget summary: 6,808 tokens</li>
<li><strong>Total: ~18,879 tokens</strong></li>
</ul>
<p><strong>Workflow reduction: 38% fewer tokens</strong> for the same functionality, a complete picture of my budget for the current month. This leaves plenty of room in Claude&rsquo;s context window for conversation history, reasoning, and additional tool calls.</p>
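<p>Checking that math, using the per-call token counts measured above:</p>

```python
# Measured token costs for the three tool calls in a budget overview
naive = {"accounts": 9_960, "categories": 12_445, "monthly_summary": 8_000}
efficient = {"accounts": 3_451, "categories": 8_620, "monthly_summary": 6_808}

naive_total = sum(naive.values())          # 30,405 tokens
efficient_total = sum(efficient.values())  # 18,879 tokens
savings = 1 - efficient_total / naive_total
print(f"{savings:.0%} fewer tokens for the same overview")
# → 38% fewer tokens for the same overview
```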
<p>But it&rsquo;s not just about saving tokens. By filtering out unnecessary data, I&rsquo;m also <strong>improving model accuracy</strong>. When Claude doesn&rsquo;t see fields like <code>debt_escrow_amounts</code> or <code>direct_import_in_error</code>, it can&rsquo;t get confused by them or incorrectly incorporate them into calculations. The model focuses on exactly what matters: account names, balances, and budget status.</p>
<p>The key insight: <strong>the model doesn&rsquo;t need to see all the data to work with it effectively</strong>. In fact, it works <em>better</em> with less data. By doing the filtering in the tool layer, I kept the context window lean while maintaining full functionality and improving reliability.</p>
<p>The filtering techniques I&rsquo;d learned from accounts and categories immediately paid off when I tackled the next challenge: helping Claude categorize uncategorized transactions.</p>
<h2 id="categorizing-transactions">Categorizing Transactions</h2>
<p>One of the workflows I wanted help with was taking uncategorized transactions and suggesting categories for them. This would help me ensure that my budget was accurate and up-to-date. To do this I created a tool that would fetch all uncategorized transactions from YNAB and get a list of the categories available in the budget.</p>
<p>On the first pass I noticed something unusual. Claude was recommending categories that were hidden in my budget: old categories I no longer used but hadn&rsquo;t deleted. I quickly realized that the YNAB REST API doesn&rsquo;t provide a way to exclude hidden categories in the API call itself. That meant I had two options:</p>
<ol>
<li>Include instructions in my tool description telling Claude to ignore hidden categories</li>
<li>Filter them out in the tool code before returning data to Claude</li>
</ol>
<p>I chose option 2. Here&rsquo;s why: every instruction you add to a tool description consumes context window space, and more importantly, it puts the burden of filtering on the model. We&rsquo;d be knowingly handing the model more data than it needs, forcing it to do extra work and leaving more room for mistakes.</p>
<p>It&rsquo;s worth emphasizing: tool descriptions themselves consume context tokens. In my Claude Code session, MCP tool definitions consumed 47.9k tokens (24% of the context window) before any tools were even called. Every line of documentation, every parameter description, every usage instruction adds up. This creates a tension: you want clear, helpful descriptions, but verbose documentation eats into the space available for actual tool results and conversation.</p>
<p>The solution isn&rsquo;t to write minimal descriptions; clarity matters. Keep descriptions focused on what the tool does and its parameters, and handle behavior rules (like &ldquo;ignore hidden items&rdquo;) in your implementation code rather than in lengthy instructions. Following that principle, I implemented the filtering at the tool layer:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#cba6f7">def</span> <span style="color:#89b4fa">_filter_categories</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">self</span>, categories: <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]], include_hidden: <span style="color:#89dceb">bool</span> <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">False</span>
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]]:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Filter categories to exclude hidden/deleted ones by default.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    filtered <span style="color:#89dceb;font-weight:bold">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> category <span style="color:#89dceb;font-weight:bold">in</span> categories:
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Always skip deleted categories</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> category<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;deleted&#34;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Skip hidden categories unless explicitly included</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">not</span> include_hidden <span style="color:#89dceb;font-weight:bold">and</span> category<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;hidden&#34;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>        filtered<span style="color:#89dceb;font-weight:bold">.</span>append(category)
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> filtered
</span></span></code></pre></div><p>Then I exposed this as a parameter in the tool:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#89b4fa;font-weight:bold">@mcp.tool</span>()
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_categories</span>(budget_id: <span style="color:#89dceb">str</span>, include_hidden: <span style="color:#89dceb">bool</span> <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">False</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">str</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Get all categories for a budget.
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Args:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        budget_id: The ID of the budget (use &#39;last-used&#39; for default budget)
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        include_hidden: Include hidden categories and groups (default: False)
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Returns:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        JSON string with category groups and categories
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    &#34;&#34;&#34;</span>
</span></span></code></pre></div><p>This approach gave me the best of both worlds: by default, Claude only sees active categories, but I can still access hidden categories when needed (like when I wanted to identify old balances that needed cleanup). This saved <strong>30.7% of tokens</strong> per category list request (from 12,445 to 8,620 tokens by filtering out 69 hidden categories) while improving accuracy.</p>
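<p>The tool body is elided above, but wiring the filter in is straightforward. Here&rsquo;s a self-contained sketch of the same pattern, using plain functions rather than the MCP-decorated client methods, with a hypothetical <code>raw_categories</code> parameter standing in for the YNAB API response:</p>

```python
import json
from typing import Any

def filter_categories(categories: list[dict[str, Any]],
                      include_hidden: bool = False) -> list[dict[str, Any]]:
    """Always drop deleted categories; drop hidden ones unless asked for."""
    return [
        c for c in categories
        if not c.get("deleted") and (include_hidden or not c.get("hidden"))
    ]

def get_categories(raw_categories: list[dict[str, Any]],
                   include_hidden: bool = False) -> str:
    # Filter at the tool layer, then serialize only what the model should see
    return json.dumps(filter_categories(raw_categories, include_hidden))

categories = [
    {"name": "Groceries"},
    {"name": "Old Gym Membership", "hidden": True},
    {"name": "Closed Card", "deleted": True},
]
print(get_categories(categories))                       # Groceries only
print(get_categories(categories, include_hidden=True))  # adds the hidden one
```

By default the model never sees the hidden or deleted rows, but the <code>include_hidden</code> parameter keeps the escape hatch available for cleanup tasks.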
<h3 id="design-principle-1-filter-at-the-source">Design Principle #1: Filter at the Source</h3>
<p>Do data filtering in your tool code rather than relying on prompt instructions. This saves tokens and prevents errors.</p>
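<p>As a minimal sketch of this principle (assuming the YNAB response shape of category groups carrying <code>hidden</code> and <code>deleted</code> flags; <code>filter_categories</code> is an illustrative helper name, not the actual implementation), the filtering lives entirely in the tool layer:</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python">def filter_categories(category_groups, include_hidden=False):
    """Drop deleted items always, and hidden items unless the caller opts in."""
    visible_groups = []
    for group in category_groups:
        if group.get("deleted"):
            continue
        if group.get("hidden") and not include_hidden:
            continue
        categories = [
            cat for cat in group.get("categories", [])
            if not cat.get("deleted") and (include_hidden or not cat.get("hidden"))
        ]
        visible_groups.append({"name": group["name"], "categories": categories})
    return visible_groups
</code></pre></div>
<p>The model never has to be instructed to ignore hidden categories; it simply never sees them.</p>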
<h2 id="historical-spending-analysis">Historical Spending Analysis</h2>
<p>Next, I wanted to be able to ask questions about my historical spending. Questions like &ldquo;How much did I spend on groceries last month?&rdquo; or &ldquo;What categories am I overspending in?&rdquo; would be really useful.</p>
<p>My first instinct was to create a tool that fetched all transactions for a date range and let Claude analyze them. But I quickly realized this approach had serious problems.</p>
<p>To illustrate, let me show you the real numbers for my 2024 transactions:</p>
<ul>
<li><strong>Total transactions in 2024:</strong> 3,456</li>
<li><strong>Fields per transaction:</strong> 14 (id, date, amount, memo, account_name, payee_name, category_name, cleared, approved, etc.)</li>
<li><strong>Average per transaction:</strong> ~216 tokens</li>
</ul>
<p>Now extrapolate this to common queries:</p>
<ul>
<li><strong>1 month</strong> of transactions (~284 txns): ~61,368 tokens</li>
<li><strong>3 months</strong> of transactions (~852 txns): ~184,106 tokens</li>
<li><strong>6 months</strong> of transactions (~1,704 txns): ~368,213 tokens</li>
<li><strong>1 year</strong> of transactions (3,456 txns): <strong>~746,800 tokens</strong></li>
</ul>
<p>The problems with this approach:</p>
<ol>
<li><strong>Token usage</strong>: A full-year query would consume <strong>746,800 tokens</strong> - 3.7x the size of Claude Sonnet 4.5&rsquo;s entire 200k context window! You literally couldn&rsquo;t fit a year of transactions in a single request.</li>
<li><strong>Speed</strong>: Transferring and parsing thousands of transaction objects is slow</li>
<li><strong>Analysis burden</strong>: Claude would need to group, sum, and calculate averages on raw data</li>
<li><strong>Wasted context</strong>: Most of those 14 fields per transaction aren&rsquo;t relevant to &ldquo;how much did I spend?&rdquo;</li>
</ol>
<p>Even a modest 3-month query would consume 184k tokens - using 92% of the available context window just for raw transaction data. This leaves almost no room for Claude to maintain conversation history, reason about the results, or make additional tool calls to answer follow-up questions.</p>
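<p>The arithmetic behind those figures is easy to reproduce. The constants below come from the measurements above; <code>tokens_for</code> is just an illustrative helper:</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python">TOKENS_PER_TXN = 216      # measured average across my 2024 transactions
CONTEXT_WINDOW = 200_000  # Claude Sonnet 4.5 context window

def tokens_for(txn_count):
    """Estimate the token cost of returning raw transactions."""
    return txn_count * TOKENS_PER_TXN

print(tokens_for(852) / CONTEXT_WINDOW)   # 3 months: roughly 0.92 of the window
print(tokens_for(3456) / CONTEXT_WINDOW)  # 1 year: roughly 3.7x the window
</code></pre></div>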
<p>Instead, I realized these calculations could easily be handled in the tool layer. Here&rsquo;s what the aggregation logic looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_category_spending_summary</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">self</span>,
</span></span><span style="display:flex;"><span>    budget_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    category_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    since_date: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    until_date: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    include_graph: <span style="color:#89dceb">bool</span> <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">True</span>,
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Get spending summary for a category over a date range.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Fetch transactions from API</span>
</span></span><span style="display:flex;"><span>    result <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>_make_request_with_retry(<span style="color:#a6e3a1">&#34;get&#34;</span>, url, params<span style="color:#89dceb;font-weight:bold">=</span>params)
</span></span><span style="display:flex;"><span>    txn_data <span style="color:#89dceb;font-weight:bold">=</span> result[<span style="color:#a6e3a1">&#34;data&#34;</span>][<span style="color:#a6e3a1">&#34;transactions&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Aggregate in tool layer</span>
</span></span><span style="display:flex;"><span>    total_spent <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>    transaction_count <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>    monthly_totals <span style="color:#89dceb;font-weight:bold">=</span> {}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> txn <span style="color:#89dceb;font-weight:bold">in</span> txn_data:
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Filter by category and date range</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> txn<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;category_id&#34;</span>) <span style="color:#89dceb;font-weight:bold">!=</span> category_id:
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> txn[<span style="color:#a6e3a1">&#34;date&#34;</span>] <span style="color:#89dceb;font-weight:bold">&gt;</span> until_date:
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># YNAB stores amounts in milliunits (e.g., $125.00 = 125000)</span>
</span></span><span style="display:flex;"><span>        amount <span style="color:#89dceb;font-weight:bold">=</span> txn[<span style="color:#a6e3a1">&#34;amount&#34;</span>] <span style="color:#89dceb;font-weight:bold">/</span> <span style="color:#fab387">1000</span> <span style="color:#cba6f7">if</span> txn<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;amount&#34;</span>) <span style="color:#cba6f7">else</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>        total_spent <span style="color:#89dceb;font-weight:bold">+=</span> amount
</span></span><span style="display:flex;"><span>        transaction_count <span style="color:#89dceb;font-weight:bold">+=</span> <span style="color:#fab387">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Build monthly breakdown</span>
</span></span><span style="display:flex;"><span>        month_key <span style="color:#89dceb;font-weight:bold">=</span> txn[<span style="color:#a6e3a1">&#34;date&#34;</span>][:<span style="color:#fab387">7</span>]  <span style="color:#6c7086;font-style:italic"># YYYY-MM</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> month_key <span style="color:#89dceb;font-weight:bold">not</span> <span style="color:#89dceb;font-weight:bold">in</span> monthly_totals:
</span></span><span style="display:flex;"><span>            monthly_totals[month_key] <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>        monthly_totals[month_key] <span style="color:#89dceb;font-weight:bold">+=</span> amount
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Calculate average per month</span>
</span></span><span style="display:flex;"><span>    num_months <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#89dceb">len</span>(monthly_totals) <span style="color:#cba6f7">if</span> monthly_totals <span style="color:#cba6f7">else</span> <span style="color:#fab387">1</span>
</span></span><span style="display:flex;"><span>    average_per_month <span style="color:#89dceb;font-weight:bold">=</span> total_spent <span style="color:#89dceb;font-weight:bold">/</span> num_months <span style="color:#cba6f7">if</span> num_months <span style="color:#89dceb;font-weight:bold">&gt;</span> <span style="color:#fab387">0</span> <span style="color:#cba6f7">else</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Return only the summary</span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;category_id&#34;</span>: category_id,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;date_range&#34;</span>: {<span style="color:#a6e3a1">&#34;start&#34;</span>: since_date, <span style="color:#a6e3a1">&#34;end&#34;</span>: until_date},
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;total_spent&#34;</span>: total_spent,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;transaction_count&#34;</span>: transaction_count,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;average_per_month&#34;</span>: average_per_month,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;monthly_breakdown&#34;</span>: [
</span></span><span style="display:flex;"><span>            {<span style="color:#a6e3a1">&#34;month&#34;</span>: month, <span style="color:#a6e3a1">&#34;spent&#34;</span>: amount}
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">for</span> month, amount <span style="color:#89dceb;font-weight:bold">in</span> <span style="color:#89dceb">sorted</span>(monthly_totals<span style="color:#89dceb;font-weight:bold">.</span>items())
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>The impact was dramatic. Let me show you a real example from my implementation:</p>
<p><strong>Scenario:</strong> Analyze 6 months of spending for a single category (22 transactions)</p>
<ul>
<li><strong>Before (returning raw transactions)</strong>: 4,890 tokens</li>
<li><strong>After (pre-aggregated summary)</strong>: 262 tokens</li>
<li><strong>Reduction</strong>: 94.6%</li>
</ul>
<p>The aggregated response includes:</p>
<ul>
<li>Total spent</li>
<li>Average per month</li>
<li>Transaction count</li>
<li>Monthly breakdown (array of {month, amount} objects)</li>
</ul>
<p>That&rsquo;s everything Claude needs to answer questions like &ldquo;Am I spending more on groceries this year than last year?&rdquo; without having to receive, parse, and aggregate dozens of individual transaction records.</p>
<h3 id="design-principle-2-pre-aggregate-data">Design Principle #2: Pre-Aggregate Data</h3>
<p>Pre-calculate aggregations, summaries, and statistics in your tool code. Return insights, not raw data. This keeps your context window lean while still giving the model everything it needs to help users.</p>
<p>While filtering and aggregation solved the token efficiency problem, I ran into a different challenge: the API itself had limitations.</p>
<h2 id="building-tools-for-unsupported-actions">Building Tools for Unsupported Actions</h2>
<p>While working on the workflow to have Claude help me categorize transactions, I realized I needed a way to split a transaction across multiple categories. For example, a Costco purchase might include $150 of groceries, $50 of household items, and $30 of gas.</p>
<p>Unfortunately, the YNAB API does not provide a way to convert an existing transaction into a split transaction. The API only allows creating NEW transactions with splits. This was a real limitation - but it presented an opportunity to think creatively about tool design.</p>
<p>Instead of telling users &ldquo;sorry, the API doesn&rsquo;t support this,&rdquo; I created a tool that works within the API&rsquo;s constraints:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#89b4fa;font-weight:bold">@mcp.tool</span>()
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">prepare_split_for_matching</span>(
</span></span><span style="display:flex;"><span>    budget_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    transaction_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    subtransactions: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">str</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Prepare a split transaction to match with an existing imported transaction.
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    This tool fetches an existing transaction&#39;s details and creates a new UNAPPROVED split
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    transaction with the same date, amount, account, and payee. You can then manually match
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    them together in the YNAB web or mobile UI.
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Workflow:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        1. This tool fetches the existing transaction details
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        2. Creates a new unapproved split transaction with those details
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        3. You manually match them in the YNAB UI
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        4. YNAB merges them into one split transaction
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Note:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        - The new split is created as UNAPPROVED for manual matching
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        - The sum of subtransaction amounts should equal the original transaction amount
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    &#34;&#34;&#34;</span>
</span></span></code></pre></div><p>The implementation:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">prepare_split_for_matching</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">self</span>,
</span></span><span style="display:flex;"><span>    budget_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    transaction_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    subtransactions: <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]],
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]:
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Fetch the original transaction details</span>
</span></span><span style="display:flex;"><span>    original <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>get_transaction(budget_id, transaction_id)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Create a new split transaction with the same details but unapproved</span>
</span></span><span style="display:flex;"><span>    new_split <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>create_split_transaction(
</span></span><span style="display:flex;"><span>        budget_id<span style="color:#89dceb;font-weight:bold">=</span>budget_id,
</span></span><span style="display:flex;"><span>        account_id<span style="color:#89dceb;font-weight:bold">=</span>original[<span style="color:#a6e3a1">&#34;account_id&#34;</span>],
</span></span><span style="display:flex;"><span>        date<span style="color:#89dceb;font-weight:bold">=</span>original[<span style="color:#a6e3a1">&#34;date&#34;</span>],
</span></span><span style="display:flex;"><span>        amount<span style="color:#89dceb;font-weight:bold">=</span>original[<span style="color:#a6e3a1">&#34;amount&#34;</span>],
</span></span><span style="display:flex;"><span>        subtransactions<span style="color:#89dceb;font-weight:bold">=</span>subtransactions,
</span></span><span style="display:flex;"><span>        payee_name<span style="color:#89dceb;font-weight:bold">=</span>original<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;payee_name&#34;</span>),
</span></span><span style="display:flex;"><span>        memo<span style="color:#89dceb;font-weight:bold">=</span>original<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;memo&#34;</span>),
</span></span><span style="display:flex;"><span>        cleared<span style="color:#89dceb;font-weight:bold">=</span>original<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;cleared&#34;</span>, <span style="color:#a6e3a1">&#34;uncleared&#34;</span>),
</span></span><span style="display:flex;"><span>        approved<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">False</span>,  <span style="color:#6c7086;font-style:italic"># Key: create as unapproved for matching</span>
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;original_transaction&#34;</span>: original,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;new_split_transaction&#34;</span>: new_split,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;instructions&#34;</span>: (
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;A new unapproved split transaction has been created. &#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;Go to YNAB and manually match these two transactions together. &#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;Look for the match indicator in the YNAB UI.&#34;</span>
</span></span><span style="display:flex;"><span>        ),
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>This solution works because YNAB has a built-in &ldquo;matching&rdquo; feature where it can merge a manually-entered transaction with an imported one. By creating the split as unapproved, YNAB&rsquo;s UI will detect the duplicate and offer to match them. When you accept the match, the imported transaction becomes a proper split transaction.</p>
<p>Is this ideal? No - I&rsquo;d prefer a direct API endpoint. But it&rsquo;s a pragmatic solution that works within the constraints of the underlying platform while still providing value to the user.</p>
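<p>For the Costco example above, the <code>subtransactions</code> argument would look something like this (the category IDs are placeholders; YNAB amounts are milliunits, with outflows negative):</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python"># $150 groceries + $50 household + $30 gas = $230 total
subtransactions = [
    {"amount": -150_000, "category_id": "groceries-uuid", "memo": "Groceries"},
    {"amount": -50_000, "category_id": "household-uuid", "memo": "Household items"},
    {"amount": -30_000, "category_id": "gas-uuid", "memo": "Gas"},
]

# Per the tool's note above, the splits should sum to the
# original (imported) transaction's amount.
assert sum(item["amount"] for item in subtransactions) == -230_000
</code></pre></div>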
<h3 id="design-principle-3-work-within-api-constraints">Design Principle #3: Work Within API Constraints</h3>
<p>When an API doesn&rsquo;t support something directly, look for workflows that combine available operations to achieve the desired outcome.</p>
<h2 id="real-world-token-reduction-results">Real-World Token Reduction Results</h2>
<p>Here&rsquo;s a summary of the optimizations applied across the YNAB MCP, with real measured token counts:</p>
<table>
  <thead>
      <tr>
          <th>Tool/Workflow</th>
          <th>Naive Approach</th>
          <th>Optimized</th>
          <th>Reduction</th>
          <th>Technique Applied</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Accounts</strong></td>
          <td>9,960 tokens (18 fields)</td>
          <td>3,451 tokens (6 fields)</td>
          <td>65.4%</td>
          <td>Field filtering</td>
      </tr>
      <tr>
          <td><strong>Categories</strong></td>
          <td>12,445 tokens (all)</td>
          <td>8,620 tokens (visible)</td>
          <td>30.7%</td>
          <td>Default filtering + opt-in</td>
      </tr>
      <tr>
          <td><strong>Budget Overview</strong></td>
          <td>~30,405 tokens</td>
          <td>~18,879 tokens</td>
          <td>38%</td>
          <td>Combined filtering</td>
      </tr>
      <tr>
          <td><strong>Category Spending (6mo)</strong></td>
          <td>4,890 tokens (raw txns)</td>
          <td>262 tokens (summary)</td>
          <td>94.6%</td>
          <td>Pre-aggregation</td>
      </tr>
      <tr>
          <td><strong>Year of Transactions</strong></td>
          <td>746,800 tokens</td>
          <td>262 tokens</td>
          <td>99.96%</td>
          <td>Pre-aggregation</td>
      </tr>
  </tbody>
</table>
<h3 id="when-to-apply-each-technique">When to Apply Each Technique</h3>
<p><strong>Field Filtering</strong> (accounts, categories)</p>
<ul>
<li>Use when: API returns many fields, but only a subset is needed for common queries</li>
<li>Savings: Moderate (30-65%)</li>
<li>Complexity: Low - simple field selection</li>
<li>Example: Remove debt details from non-debt accounts, skip internal IDs like <code>transfer_payee_id</code>, drop reconciliation timestamps
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Instead of returning all 18 fields, return only what matters</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">return</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;id&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>id,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;name&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>name,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;balance&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>balance <span style="color:#89dceb;font-weight:bold">/</span> <span style="color:#fab387">1000</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;on_budget&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>on_budget
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div></li>
</ul>
<p><strong>Default Filtering with Parameters</strong> (hidden categories)</p>
<ul>
<li>Use when: Some data is rarely needed but occasionally useful</li>
<li>Savings: Moderate (30-40%)</li>
<li>Complexity: Low - add optional boolean parameter</li>
<li>Example: Hide deleted/archived items by default, expose via <code>include_deleted</code> flag</li>
</ul>
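<p>The opt-in pattern is tiny to implement. A generic sketch (hypothetical helper, not YNAB-specific):</p>
<div class="highlight"><pre tabindex="0"><code class="language-python" data-lang="python">def list_items(items, include_deleted=False):
    """Hide deleted/archived entries by default; expose them via a flag."""
    if include_deleted:
        return items
    return [item for item in items if not item.get("deleted")]
</code></pre></div>
<p>The common case stays cheap, and the rare case stays possible without a second tool.</p>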
<p><strong>Pre-aggregation</strong> (spending analysis)</p>
<ul>
<li>Use when: Model would need to compute summaries from raw data</li>
<li>Savings: High (90-99%)</li>
<li>Complexity: Medium - requires aggregation logic</li>
<li>Example: Return monthly totals instead of individual transactions</li>
</ul>
<p><strong>Creative Workarounds</strong> (split transactions)</p>
<ul>
<li>Use when: API doesn&rsquo;t support desired operation directly</li>
<li>Savings: Enables new functionality (not about tokens)</li>
<li>Complexity: High - requires understanding API constraints</li>
<li>Example: Multi-step workflows that achieve goals indirectly</li>
</ul>
<h3 id="how-to-measure-your-own-mcp">How to Measure Your Own MCP</h3>
<p>You might be wondering how I got these specific numbers. All of them came from real measurements. Here&rsquo;s my methodology, and how you can apply it to your own MCP:</p>
<h4 id="1-set-up-token-counting">1. Set Up Token Counting</h4>
<p>Install <a href="https://github.com/openai/tiktoken">tiktoken</a>, OpenAI&rsquo;s tokenizer library. Claude uses a different tokenizer, so these counts are approximations, but they&rsquo;re consistent enough for before-and-after comparisons:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install tiktoken
</span></span></code></pre></div><p>Create a helper function to count tokens:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#94e2d5">import</span> <span style="color:#fab387">tiktoken</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">def</span> <span style="color:#89b4fa">count_tokens</span>(text: <span style="color:#89dceb">str</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">int</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Count tokens using tiktoken&#39;s cl100k_base encoding.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    encoding <span style="color:#89dceb;font-weight:bold">=</span> tiktoken<span style="color:#89dceb;font-weight:bold">.</span>get_encoding(<span style="color:#a6e3a1">&#34;cl100k_base&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> <span style="color:#89dceb">len</span>(encoding<span style="color:#89dceb;font-weight:bold">.</span>encode(text))
</span></span></code></pre></div><h4 id="2-measure-api-responses">2. Measure API Responses</h4>
<p>Create a script that fetches data both ways and compares:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#94e2d5">import</span> <span style="color:#fab387">json</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Get raw API response</span>
</span></span><span style="display:flex;"><span>raw_response <span style="color:#89dceb;font-weight:bold">=</span> api<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>raw_json <span style="color:#89dceb;font-weight:bold">=</span> json<span style="color:#89dceb;font-weight:bold">.</span>dumps(raw_response, indent<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">2</span>)
</span></span><span style="display:flex;"><span>raw_tokens <span style="color:#89dceb;font-weight:bold">=</span> count_tokens(raw_json)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Get your filtered response</span>
</span></span><span style="display:flex;"><span>filtered_response <span style="color:#89dceb;font-weight:bold">=</span> your_mcp_tool<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>filtered_json <span style="color:#89dceb;font-weight:bold">=</span> json<span style="color:#89dceb;font-weight:bold">.</span>dumps(filtered_response, indent<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">2</span>)
</span></span><span style="display:flex;"><span>filtered_tokens <span style="color:#89dceb;font-weight:bold">=</span> count_tokens(filtered_json)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Compare</span>
</span></span><span style="display:flex;"><span>reduction <span style="color:#89dceb;font-weight:bold">=</span> ((raw_tokens <span style="color:#89dceb;font-weight:bold">-</span> filtered_tokens) <span style="color:#89dceb;font-weight:bold">/</span> raw_tokens) <span style="color:#89dceb;font-weight:bold">*</span> <span style="color:#fab387">100</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Raw: </span><span style="color:#a6e3a1">{</span>raw_tokens<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">,</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> tokens&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Filtered: </span><span style="color:#a6e3a1">{</span>filtered_tokens<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">,</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> tokens&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Reduction: </span><span style="color:#a6e3a1">{</span>reduction<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">.1f</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">%&#34;</span>)
</span></span></code></pre></div><h4 id="3-test-real-workflows">3. Test Real Workflows</h4>
<p>Don&rsquo;t just measure individual tools in isolation; measure the complete workflows users will actually perform:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Simulate a budget overview workflow</span>
</span></span><span style="display:flex;"><span>accounts <span style="color:#89dceb;font-weight:bold">=</span> your_mcp<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>categories <span style="color:#89dceb;font-weight:bold">=</span> your_mcp<span style="color:#89dceb;font-weight:bold">.</span>get_categories(budget_id, include_hidden<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">False</span>)
</span></span><span style="display:flex;"><span>summary <span style="color:#89dceb;font-weight:bold">=</span> your_mcp<span style="color:#89dceb;font-weight:bold">.</span>get_budget_summary(budget_id, current_month)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>total_tokens <span style="color:#89dceb;font-weight:bold">=</span> (
</span></span><span style="display:flex;"><span>    count_tokens(json<span style="color:#89dceb;font-weight:bold">.</span>dumps(accounts)) <span style="color:#89dceb;font-weight:bold">+</span>
</span></span><span style="display:flex;"><span>    count_tokens(json<span style="color:#89dceb;font-weight:bold">.</span>dumps(categories)) <span style="color:#89dceb;font-weight:bold">+</span>
</span></span><span style="display:flex;"><span>    count_tokens(json<span style="color:#89dceb;font-weight:bold">.</span>dumps(summary))
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Budget overview workflow: </span><span style="color:#a6e3a1">{</span>total_tokens<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">,</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> tokens&#34;</span>)
</span></span></code></pre></div><p>This revealed that my budget overview workflow uses ~19k tokens - well within Claude&rsquo;s <a href="https://docs.anthropic.com/en/docs/about-claude/models">200k context window</a> with room to spare.</p>
<h4 id="4-watch-for-runtime-warnings">4. Watch for Runtime Warnings</h4>
<p>Claude Code will warn you when tool responses exceed ~10k tokens. If you see these warnings frequently, it&rsquo;s a signal to investigate:</p>
<blockquote>
<p>⚠️ Large MCP response (~12.5k tokens), this can fill up context quickly.</p></blockquote>
<p>These warnings helped me identify which tools needed optimization.</p>
<h4 id="5-validate-correctness">5. Validate Correctness</h4>
<p>Token reduction means nothing if your tools return incorrect data. Always verify:</p>
<ul>
<li>Does the filtered data answer the user&rsquo;s questions?</li>
<li>Are calculations accurate? (spot-check aggregations against raw data)</li>
<li>Does pagination work correctly? (ensure you&rsquo;re not computing on partial datasets)</li>
</ul>
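<p>A spot-check for aggregation accuracy can be as simple as recomputing one category total from raw data and comparing. This is a hypothetical sketch — the transaction shape and helper names are illustrative, not the YNAB MCP&rsquo;s actual code:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"># Hypothetical spot-check: verify a pre-computed category summary
# against the raw transactions it was derived from.
raw_transactions = [
    {"category": "Groceries", "amount": -5230},
    {"category": "Groceries", "amount": -1875},
    {"category": "Dining", "amount": -3400},
]

def summarize_by_category(transactions):
    """Aggregate spending per category (the tool-side computation)."""
    totals = {}
    for txn in transactions:
        totals[txn["category"]] = totals.get(txn["category"], 0) + txn["amount"]
    return totals

summary = summarize_by_category(raw_transactions)

# The aggregate must equal the raw per-category sum.
for category, total in summary.items():
    raw_sum = sum(t["amount"] for t in raw_transactions if t["category"] == category)
    assert total == raw_sum
</code></pre></div>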
<p>The goal isn&rsquo;t to minimize tokens at all costs - it&rsquo;s to return exactly what the model needs, nothing more, nothing less.</p>
<h2 id="limitations-and-trade-offs">Limitations and Trade-offs</h2>
<p>This context-efficient approach works well for YNAB, but it&rsquo;s not without limitations. Before applying these patterns to your own MCP, consider these trade-offs:</p>
<p><strong>Pre-aggregation assumes query patterns.</strong> If users ask questions that need raw transaction details (like &ldquo;show me the memo for my largest grocery purchase&rdquo;), the aggregated data won&rsquo;t help. You&rsquo;ll need additional tools that return raw data for those cases.</p>
<p><strong>Filtering loses flexibility.</strong> By removing fields, you can&rsquo;t answer questions that need those fields without making additional API calls. The key is knowing your use cases. For budget analysis and categorization, these trade-offs are worth it. For transaction-level forensics, you might need different tools.</p>
<p><strong>Caching complexity.</strong> Pre-computed aggregations need invalidation strategies when data changes. If your underlying data updates frequently, you&rsquo;ll need to think carefully about cache freshness and when to recompute.</p>
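<p>One simple freshness strategy is a time-to-live cache: serve the pre-computed aggregation until it ages out, then recompute. The sketch below is hypothetical (<code>cached_summary</code> and its parameters are illustrative names, not part of the MCP):</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python">import time

# Hypothetical TTL cache for pre-computed aggregations.
_cache = {}
TTL_SECONDS = 300  # recompute after five minutes

def cached_summary(budget_id, month, compute):
    """Return a cached aggregation, recomputing once the entry goes stale."""
    key = (budget_id, month)
    entry = _cache.get(key)
    stale = entry is None or time.monotonic() - entry["at"] > TTL_SECONDS
    if stale:
        entry = {"at": time.monotonic(), "value": compute(budget_id, month)}
        _cache[key] = entry
    return entry["value"]
</code></pre></div>
<p>A shorter TTL favors freshness; a longer one favors fewer upstream API calls. Which side to favor depends on how often the underlying data actually changes.</p>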
<p><strong>Development overhead.</strong> Writing aggregation logic and filtering code is more work than simple pass-through wrappers. You&rsquo;re trading implementation time for runtime efficiency. For frequently-used tools, this is usually worth it.</p>
<p>The goal isn&rsquo;t to optimize every tool to the extreme. It&rsquo;s to identify the high-impact workflows—the ones users will perform repeatedly—and optimize those intelligently.</p>
<h2 id="key-takeaways">Key Takeaways</h2>
<ul>
<li><strong>Context is expensive</strong>: In real Claude Code sessions, MCP tool definitions can consume 24% of the 200k context window before any tools are called</li>
<li><strong>Measure everything</strong>: Use tiktoken to count tokens on API responses before and after optimization</li>
<li><strong>Filter proactively</strong>: Removing 12 unnecessary fields from 47 accounts reduced tokens by 65.4%</li>
<li><strong>Aggregate strategically</strong>: Pre-computing spending summaries reduced a 6-month query by 94.6%</li>
<li><strong>Design for the model</strong>: Ask &ldquo;What does the model need?&rdquo; not &ldquo;What does the API provide?&rdquo;</li>
<li><strong>Default to minimal data</strong>: Return only what&rsquo;s necessary by default, with optional parameters for edge cases</li>
<li><strong>Validate with real workflows</strong>: Test complete user flows, not just individual tools, to understand cumulative token impact</li>
</ul>
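<p>The &ldquo;filter proactively&rdquo; and &ldquo;default to minimal data&rdquo; takeaways combine naturally into one pattern: an allowlist filter with an opt-in escape hatch for edge cases. A hypothetical sketch (the field names are illustrative, not the YNAB MCP&rsquo;s exact schema):</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"># Hypothetical allowlist filter: minimal fields by default, full detail on request.
ACCOUNT_FIELDS = ("id", "name", "type", "balance")

def filter_account(account, include_all=False):
    """Drop every field the model does not need unless full detail is requested."""
    if include_all:
        return account
    return {k: v for k, v in account.items() if k in ACCOUNT_FIELDS}
</code></pre></div>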
<h2 id="conclusion">Conclusion</h2>
<p>Building a context-efficient MCP requires a mindset shift: design for what the model needs, not just what the API provides. The three principles I learned—filter at the source, compute in tools, and work within constraints creatively—apply to any MCP wrapping an external API.</p>
<p>Through careful design, the YNAB MCP achieves dramatic efficiency:</p>
<ul>
<li>Budget overview: 38% reduction (30k → 19k tokens)</li>
<li>Spending analysis: 94.6% reduction (4.9k → 262 tokens)</li>
<li>Category filtering: 30.7% reduction (12.4k → 8.6k tokens)</li>
</ul>
<p>These aren&rsquo;t theoretical—they&rsquo;re real measurements from actual usage. Context is expensive, and every token matters when you&rsquo;re building tools for multi-turn conversations.</p>
<p>If you&rsquo;re building an MCP, start by asking: &ldquo;What does the model actually need to answer the user&rsquo;s question?&rdquo; Not &ldquo;What does the API return?&rdquo; That mindset shift makes all the difference.</p>
<h2 id="further-reading">Further Reading</h2>
<p>If you want to dive deeper into MCP development and context optimization:</p>
<ul>
<li><a href="https://spec.modelcontextprotocol.io/">Model Context Protocol Specification</a> - Official MCP spec and documentation</li>
<li><a href="https://www.anthropic.com/engineering/code-execution-with-mcp">Code Execution with MCP</a> - Anthropic&rsquo;s engineering blog on building with MCP</li>
<li><a href="https://youtu.be/-uW5-TaVXu4?si=GBCXG3Q5QcfEdvnJ">Most devs don&rsquo;t understand how context windows work</a> - Deep dive into context window fundamentals and practical management strategies</li>
<li><a href="https://api.ynab.com/">YNAB API Documentation</a> - The API this MCP wraps</li>
<li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview">Anthropic&rsquo;s Prompt Engineering Guide</a> - Understanding context windows and token efficiency</li>
</ul>
<p>The full YNAB MCP implementation is available at <a href="https://github.com/dgalarza/ynab-mcp-dgalarza">github.com/dgalarza/ynab-mcp-dgalarza</a> if you want to dive deeper into the code. I&rsquo;d love to hear about the context optimization techniques you&rsquo;ve discovered in your own MCP projects.</p>
<hr>
<p>Building MCP servers or designing Claude Code workflows for your team? I help engineers get this right from the start. <a href="/claude-code/">Learn more</a>.</p>
]]></content:encoded></item></channel></rss>