<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Agentic-Development on Damian Galarza | Software Engineering &amp; AI Consulting</title><link>https://www.damiangalarza.com/tags/agentic-development/</link><description>Recent posts from Damian Galarza | Software Engineering &amp; AI Consulting</description><generator>Hugo</generator><language>en-us</language><managingEditor>Damian Galarza</managingEditor><atom:link href="https://www.damiangalarza.com/tags/agentic-development/feed.xml" rel="self" type="application/rss+xml"/><item><title>Four Dimensions of Agent-Ready Codebase Design</title><link>https://www.damiangalarza.com/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/</link><pubDate>Wed, 25 Mar 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/</guid><description>AI agents produce better output when the codebase is ready for them. Here are the four dimensions of codebase readiness that account for most of the gap.</description><content:encoded><![CDATA[<p>When an AI agent rewrites a file and the result doesn&rsquo;t match your conventions, the first move is usually to adjust the prompt. Try different instructions. Add more context to the message. Maybe switch models.</p>
<p>The model is rarely the bottleneck. The codebase is.</p>
<p>The same model, pointed at a codebase with strong tests, clear architecture, and good documentation, produces remarkably consistent output. Point it at a codebase with weak coverage, no architecture docs, and no linting, and you get drift. Not because the model is less capable, but because it has less to work with.</p>
<p>I built the <a href="/codebase-readiness/">Codebase Readiness Assessment</a> to make this measurable. It scores your repo across eight dimensions on a 0-100 scale. But you don&rsquo;t need to run the assessment to understand what separates high-scoring codebases from low-scoring ones. Four dimensions account for most of the gap.</p>
<h2 id="test-foundation">Test Foundation</h2>
<p>Test foundation carries the most weight in the assessment (25%) because it&rsquo;s the single biggest lever for agent output quality.</p>
<h3 id="what-a-low-score-looks-like">What a low score looks like</h3>
<p>An agent makes a change. There are no tests covering that area, so it moves on. The change compiles, maybe even runs, but it broke an assumption three modules away. Nobody finds out until a human reviews the PR, or worse, until production.</p>
<p>I&rsquo;ve seen this repeatedly: teams with 30-40% test coverage ask an agent to refactor a service object. The agent produces clean code that looks right. But there&rsquo;s no spec for the edge case where a nil association triggers a downstream error. The agent had no way to catch it because there&rsquo;s no test to fail.</p>
<p>The other failure mode is slow tests. If your suite takes 20 minutes, the agent can&rsquo;t iterate. It makes a change, waits, discovers the failure, tries again, waits again. In a fast suite, that feedback cycle takes seconds. In a slow one, the agent burns time and money waiting for results.</p>
<h3 id="what-a-high-score-looks-like">What a high score looks like</h3>
<p>Codebases that score well here share a few characteristics:</p>
<ul>
<li><strong>Coverage above 70% on critical paths.</strong> Not 100% everywhere, but thorough coverage on the code that matters: domain logic, service objects, API endpoints. The agent can make changes and get immediate confirmation that nothing broke.</li>
<li><strong>Suite runs in under 5 minutes.</strong> Fast enough that the agent can run tests after every meaningful change, not just at the end.</li>
<li><strong>Deterministic results.</strong> No flaky tests. When the suite says green, it means green. Agents can&rsquo;t distinguish between a flaky failure and a real one, so flaky tests teach agents to ignore failures.</li>
</ul>
<h3 id="dont-stop-at-unit-tests">Don&rsquo;t stop at unit tests</h3>
<p>Unit tests on service objects and models are the foundation, but they only verify isolated behavior. An agent that passes all unit tests can still break a user-facing workflow that spans multiple components.</p>
<p>End-to-end tests give agents confidence across entire flows. A system spec that signs a user in, submits a form, and checks the result tells the agent whether the <em>feature</em> works, not just whether a method returns the right value. This is especially valuable when agents make changes that touch controllers, views, and services in the same PR.</p>
<p>Here&rsquo;s a simplified system spec from one of my Rails projects. It covers the core user journey: signing in and submitting a video idea for validation.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># spec/system/idea_submission_spec.rb</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f9e2af">RSpec</span><span style="color:#89dceb;font-weight:bold">.</span>describe <span style="color:#a6e3a1">&#34;Idea submission&#34;</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  it <span style="color:#a6e3a1">&#34;allows a signed-in user to submit a video idea&#34;</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>    user <span style="color:#89dceb;font-weight:bold">=</span> create(<span style="color:#a6e3a1">:user</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    sign_in_as(user, <span style="color:#a6e3a1">path</span>: new_idea_path)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">select</span> user<span style="color:#89dceb;font-weight:bold">.</span>channels<span style="color:#89dceb;font-weight:bold">.</span>first<span style="color:#89dceb;font-weight:bold">.</span>name, <span style="color:#a6e3a1">from</span>: <span style="color:#a6e3a1">&#34;Channel&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Title&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;Building a Rails AI Agent from Scratch&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Description&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;Step-by-step tutorial on building an AI agent&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Category&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;AI Coding&#34;</span>
</span></span><span style="display:flex;"><span>    click_button <span style="color:#a6e3a1">&#34;Validate Idea&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    expect(page)<span style="color:#89dceb;font-weight:bold">.</span>to have_content(<span style="color:#a6e3a1">&#34;Building a Rails AI Agent from Scratch&#34;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">end</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>This test touches authentication, the form UI, the controller, the background job, and the results page. If an agent breaks any part of that chain, this spec catches it.</p>
<p>The tradeoff is speed. End-to-end tests are slower and more brittle than unit tests. You don&rsquo;t need full E2E coverage, but having system specs on your critical user journeys (signup, checkout, the core action your product is built around) gives agents a safety net that unit tests alone can&rsquo;t provide.</p>
<h3 id="the-smallest-change-that-moves-the-needle">The smallest change that moves the needle</h3>
<p>Add coverage to your critical paths first. Don&rsquo;t chase a coverage number. Instead, identify the three or four service objects or domain models where bugs would hurt the most, and write specs for those. Then add one or two system specs covering your most important user journeys end-to-end. If your suite is slow, add parallel test execution. In a Rails app, that might be as simple as adding the <code>parallel_tests</code> gem. A suite that goes from 15 minutes to 4 minutes fundamentally changes how an agent can work with your code. If you&rsquo;re running multiple agents in parallel, you&rsquo;ll also need <a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">database isolation per worktree</a> to prevent test data collisions.</p>
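<p>In practice, wiring that up is a couple of commands. This is a sketch using the <code>parallel_tests</code> gem&rsquo;s standard rake tasks; binstub paths and database setup may differ in your app:</p>

```shell
# Add the gem, create one test database per CPU core, then run specs in parallel
bundle add parallel_tests --group "development,test"
bin/rails parallel:create parallel:prepare
bin/rails parallel:spec
```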
<p>If you want to accelerate the process, tools like <a href="https://github.com/uditgoenka/autoresearch">autoresearch</a> apply this pattern as an autonomous loop: give the agent a measurable goal (like a coverage target), and it iterates, verifies, keeps what works, and discards what doesn&rsquo;t.</p>
<h2 id="documentation-as-code">Documentation as Code</h2>
<p>Documentation carries 15% of the assessment weight, but in practice it&rsquo;s the dimension where I see the biggest gap between teams that get good agent output and teams that don&rsquo;t.</p>
<h3 id="what-a-low-score-looks-like-1">What a low score looks like</h3>
<p>Without an agent-facing entry point (a <code>CLAUDE.md</code>, <code>AGENTS.md</code>, or equivalent), an agent has to reverse-engineer your conventions from the code itself. It reads your files, infers patterns, and guesses at intent. Sometimes it guesses right. Often it doesn&rsquo;t.</p>
<p>Here&rsquo;s a concrete example. A Rails app uses service objects for all business logic. Controllers call a service, the service does the work, and the result gets rendered. There&rsquo;s nothing enforcing this in the framework. It&rsquo;s a team convention. An agent that doesn&rsquo;t know about this convention puts the logic directly in the controller action. The code works. The tests pass. But it breaks the team&rsquo;s pattern, and now there&rsquo;s a 50-line controller action that should have been a service object.</p>
<p>The agent wasn&rsquo;t wrong. It had no way to know.</p>
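<p>The convention itself is easy to show. Here&rsquo;s a minimal, framework-free sketch (class and field names are illustrative, not from the codebase in question): the controller action stays thin and delegates to a service object that owns the business logic.</p>

```ruby
# Illustrative service object: one responsibility, one public entry point.
# The controller would call this instead of inlining the logic.
class SubmitIdea
  Result = Struct.new(:success, :idea, :error, keyword_init: true)

  def initialize(params)
    @params = params
  end

  def call
    title = @params[:title].to_s.strip
    return Result.new(success: false, error: "title required") if title.empty?

    # Persistence, job enqueueing, etc. would happen here in the real app
    Result.new(success: true, idea: { title: title, status: :pending })
  end
end

result = SubmitIdea.new(title: "Building a Rails AI Agent").call
puts result.success   # => true
```

<p>A single example like this in the codebase gives the agent a concrete shape to replicate for the next feature.</p>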
<h3 id="what-a-high-score-looks-like-1">What a high score looks like</h3>
<p>The key insight is that this entry point file should be a map, not a manual. OpenAI&rsquo;s Harness Engineering team <a href="https://openai.com/index/harness-engineering/">learned this the hard way</a>: they tried a single large instruction file and it failed because &ldquo;context is a scarce resource&rdquo; and &ldquo;too much guidance becomes non-guidance.&rdquo; When everything is marked important, agents pattern-match locally instead of navigating intentionally.</p>
<p>Their solution: keep the entry file short (roughly 100 lines) and treat it as a table of contents that points to deeper sources of truth in a structured <code>docs/</code> directory. The entry file gives agents quick commands and a documentation map. The detail lives in dedicated files the agent reads when it needs them. Whether you call it <code>CLAUDE.md</code>, <code>AGENTS.md</code>, or <code>CURSOR.md</code>, the pattern is the same.</p>
<p>Here&rsquo;s what this looks like in practice from one of my Rails projects:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Quick Commands
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>bin/dev                                # Start dev server
</span></span><span style="display:flex;"><span>bin/rails spec                         # All tests
</span></span><span style="display:flex;"><span>bin/ci                                 # Full CI: lint + security + tests
</span></span><span style="display:flex;"><span>bin/rubocop                            # Lint
</span></span><span style="display:flex;"><span>bin/brakeman                           # Security scan
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Documentation Map
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>| Topic | Document |
</span></span><span style="display:flex;"><span>|-------|----------|
</span></span><span style="display:flex;"><span>| Stack, patterns, domain model | docs/ARCHITECTURE.md |
</span></span><span style="display:flex;"><span>| Testing patterns and stack | docs/TESTING.md |
</span></span><span style="display:flex;"><span>| Credentials, env vars, API keys | docs/CONFIGURATION.md |
</span></span><span style="display:flex;"><span>| Engineering principles | docs/design-docs/core-beliefs.md |
</span></span><span style="display:flex;"><span>| Architecture decision records | docs/design-docs/ |
</span></span></code></pre></div><p>The agent gets commands and a map up front. When it needs to understand the domain model or testing conventions, it follows the pointer. This is progressive disclosure: the agent starts with what it needs immediately and loads deeper context on demand.</p>
<p>Here&rsquo;s a trimmed excerpt from the <code>ARCHITECTURE.md</code> behind that pointer:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Domain Model
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>CreatorSignal validates YouTube video ideas. The core flow:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">1.</span> User submits a video <span style="font-weight:bold">**Idea**</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">2.</span> A <span style="font-weight:bold">**Validation**</span> job is enqueued
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">3.</span> The <span style="font-weight:bold">**ResearchAgent**</span> runs tools against YouTube, Reddit, X, and HN
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">4.</span> Results are synthesized into a scored <span style="font-weight:bold">**Go / Refine / Kill**</span> verdict
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### Key Models
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>| Model | Responsibility |
</span></span><span style="display:flex;"><span>|-------|---------------|
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`User`</span> | Authentication, subscription plan |
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`Idea`</span> | A video idea submitted for validation |
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`Validation`</span> | One run of the research agent against an idea |
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### Project Structure
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>app/
</span></span><span style="display:flex;"><span>├── components/       # ViewComponent components
</span></span><span style="display:flex;"><span>├── controllers/
</span></span><span style="display:flex;"><span>├── jobs/             # ActiveJob jobs (async validation)
</span></span><span style="display:flex;"><span>├── models/
</span></span><span style="display:flex;"><span>├── services/         # Research agent, tool orchestration
</span></span><span style="display:flex;"><span>└── views/            # Hotwire (Turbo frames/streams)
</span></span></code></pre></div><p>An agent reading this knows what an <code>Idea</code> is, that validation is async through a job, and that orchestration logic lives in <code>app/services/</code>. Those are the conventions that prevent drift.</p>
<p>ADRs (Architecture Decision Records) add a layer that documentation alone can&rsquo;t. An agent that understands <em>why</em> a particular pattern was chosen can make better decisions when extending it. If your ADR says &ldquo;we chose event sourcing for the billing domain because of auditability requirements,&rdquo; the agent won&rsquo;t try to refactor billing into simple CRUD.</p>
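<p>An ADR doesn&rsquo;t have to be elaborate. Here&rsquo;s a sketch of the shape, using the hypothetical billing decision from above rather than a real record:</p>

```markdown
# ADR-007: Event sourcing for the billing domain

## Status
Accepted

## Context
Billing requires a complete audit trail of every state change for
compliance reviews.

## Decision
Model billing as an event-sourced domain rather than CRUD over
mutable rows.

## Consequences
Auditability comes for free; reads require projections. Do not
refactor billing back to simple CRUD without revisiting this record.
```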
<h3 id="the-smallest-change-that-moves-the-needle-1">The smallest change that moves the needle</h3>
<p>Create an <code>AGENTS.md</code> in your project root with two things: commands (build, test, lint) and a documentation map pointing to deeper files. <a href="https://agents.md/"><code>AGENTS.md</code></a> is an emerging standard supported by Codex, Cursor, Gemini CLI, GitHub Copilot, Windsurf, Devin, and <a href="https://agents.md/">many others</a>. If you&rsquo;re using Claude Code, symlink <code>CLAUDE.md</code> to it so both resolve to the same file. Then create an <code>ARCHITECTURE.md</code> covering your stack, domain model, and key conventions. This can take an hour and the effect on agent output is immediate. If you want to automate the scaffolding, the <a href="https://github.com/dgalarza/claude-code-workflows">agent-ready plugin</a> generates a starting point based on your existing codebase.</p>
<h2 id="architecture-clarity">Architecture Clarity</h2>
<p>Architecture clarity carries 15% of the assessment weight. It measures whether an agent can understand where code belongs and how components relate to each other.</p>
<h3 id="what-a-low-score-looks-like-2">What a low score looks like</h3>
<p>Agents replicate patterns they find in the codebase. If your codebase has clear boundaries (controllers handle HTTP, services handle business logic, models handle persistence), the agent follows those boundaries. If your codebase mixes concerns, the agent mixes concerns.</p>
<p>The most common failure I see: a controller that does everything. It validates input, calls the database, sends emails, enqueues jobs. An agent asked to add a new feature looks at the existing controller, sees that&rsquo;s where logic goes, and adds more logic to the controller. The agent is doing exactly what the codebase taught it to do.</p>
<p>The subtler version is dependency direction. In a well-layered app, dependencies point inward: controllers depend on services, services depend on models. When that direction is inconsistent (models importing from controllers, services reaching into HTTP request objects), agents produce code with the same tangled dependencies.</p>
<h3 id="what-a-high-score-looks-like-2">What a high score looks like</h3>
<ul>
<li><strong>Clear layering.</strong> Each layer has a single responsibility, and the codebase is consistent about which layer owns what.</li>
<li><strong>Domain namespacing.</strong> Related functionality is grouped by business domain, not just by technical layer. Instead of a flat <code>app/services/</code> with 40 files, you have <code>app/services/billing/</code>, <code>app/services/onboarding/</code>, <code>app/services/research/</code>. When an agent needs to add billing logic, the namespace tells it exactly where to look and what patterns to follow.</li>
<li><strong>Predictable file organization.</strong> A new developer (or agent) can guess where a piece of code lives based on what it does.</li>
<li><strong>Dependency direction is consistent.</strong> Inner layers don&rsquo;t reach outward. You don&rsquo;t see models importing controller concerns.</li>
</ul>
<p>Domain namespacing is especially powerful for agents because it constrains the search space. An agent working on a billing feature only needs to understand the billing namespace, not the entire codebase. It finds the existing patterns in that namespace and replicates them. Without namespacing, the agent has to scan the whole codebase to figure out where billing logic lives, and it might find three different patterns in three different places.</p>
<h3 id="the-smallest-change-that-moves-the-needle-2">The smallest change that moves the needle</h3>
<p>If you have fat controllers, extract one. Pick your most complex controller action, pull the business logic into a service object, and write a spec for it. The agent will start using that service object pattern for new features. One well-structured example teaches the agent more than any documentation, because it&rsquo;s a pattern it can directly replicate.</p>
<p>If your codebase has grown past a handful of services, start namespacing by domain. Group related services, jobs, and models under a shared namespace. This compounds quickly: once you have three or four service objects under <code>Billing::</code>, agents start producing new billing code in the same namespace by default. The codebase becomes self-reinforcing.</p>
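<p>The pattern is simple to sketch (the module and class names here are hypothetical): once a few services live under a shared namespace, the location and shape of the next one is obvious.</p>

```ruby
# Hypothetical Billing namespace: each service is a small, predictable unit
# with the same constructor/call shape.
module Billing
  class ApplyCredit
    def initialize(account, amount_cents)
      @account = account
      @amount_cents = amount_cents
    end

    def call
      @account[:balance_cents] -= @amount_cents
      @account
    end
  end
end

# An agent asked for refund logic finds Billing::ApplyCredit and adds
# Billing::IssueRefund beside it, following the existing shape.
account = { balance_cents: 10_000 }
Billing::ApplyCredit.new(account, 2_500).call
puts account[:balance_cents]  # => 7500
```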
<h2 id="feedback-loops">Feedback Loops</h2>
<p>Feedback loops carry 10% of the assessment weight, but their impact is multiplicative. Good feedback loops make everything else work better. Poor ones make everything else work worse.</p>
<h3 id="what-a-low-score-looks-like-3">What a low score looks like</h3>
<p>Agents learn from the signals they get back. When the only signal is &ldquo;tests passed,&rdquo; the agent has no way to know it introduced a style violation, broke a naming convention, or used a deprecated API. It moves on, confident the change is correct.</p>
<p>Two things make feedback loops weak: <strong>narrow signals</strong> and <strong>slow signals</strong>.</p>
<p>Narrow signals mean the agent only hears from one source. Tests tell the agent whether the code works. They don&rsquo;t tell it whether the code follows your conventions, whether it introduced a security vulnerability, or whether the UI actually renders correctly. Each missing signal is a category of problems the agent can&rsquo;t self-correct.</p>
<p>Slow signals are just as damaging. If the agent has to wait 20 minutes for a CI run to discover a linting error, it&rsquo;s already moved on. It&rsquo;s built three more features on top of code that doesn&rsquo;t pass lint. Now you&rsquo;re unwinding multiple changes instead of catching the first one. The closer the feedback is to the moment of the change, the cheaper it is to fix.</p>
<p>There&rsquo;s also a hierarchy to how you enforce conventions. Anything that can be checked deterministically by a linter should be a lint rule, not a line in your <code>CLAUDE.md</code>. A lint rule catches every violation, every time. A documentation rule depends on the agent reading it and choosing to follow it. If your convention is &ldquo;methods must be under 20 lines&rdquo; or &ldquo;always use <code>frozen_string_literal</code>,&rdquo; encode it in RuboCop, ESLint, or whatever linter your stack uses. Save documentation for the things that can&rsquo;t be mechanically enforced: architectural decisions, domain context, workflow conventions.</p>
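<p>For the two example conventions above, the encoding is a few lines of <code>.rubocop.yml</code>. Both cops ship with RuboCop; tune the numbers to your team:</p>

```yaml
# .rubocop.yml
Metrics/MethodLength:
  Max: 20

Style/FrozenStringLiteralComment:
  Enabled: true
  EnforcedStyle: always
```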
<h3 id="what-a-high-score-looks-like-3">What a high score looks like</h3>
<ul>
<li><strong>Pre-commit hooks for immediate feedback.</strong> The agent discovers formatting issues, type errors, or lint violations before it even commits.</li>
<li><strong>CI that runs in under 10 minutes.</strong> Fast enough that the agent can push, get feedback, and iterate without burning excessive context.</li>
<li><strong>Rich error messages.</strong> Linting output that says &ldquo;method too long (25 lines, max 20)&rdquo; is actionable. A generic &ldquo;style violation&rdquo; is not.</li>
</ul>
<p>Here&rsquo;s what a CI script looks like when it goes beyond just running tests. This is the <code>bin/ci</code> from the same Rails project:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># config/ci.rb - run with bin/ci</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f9e2af">CI</span><span style="color:#89dceb;font-weight:bold">.</span>run <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Setup&#34;</span>, <span style="color:#a6e3a1">&#34;bin/setup --skip-server&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Style: Ruby&#34;</span>, <span style="color:#a6e3a1">&#34;bin/rubocop&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Gem audit&#34;</span>, <span style="color:#a6e3a1">&#34;bin/bundler-audit&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Importmap vulnerability audit&#34;</span>, <span style="color:#a6e3a1">&#34;bin/importmap audit&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Brakeman code analysis&#34;</span>, <span style="color:#a6e3a1">&#34;bin/brakeman --quiet --no-pager --exit-on-warn --exit-on-error&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>Five steps, each giving the agent a different kind of feedback. RuboCop catches style violations. Bundler-audit catches vulnerable gems. Brakeman catches security issues in the code itself. An agent that runs <code>bin/ci</code> gets five signals instead of one.</p>
<h3 id="browser-access-as-a-feedback-loop">Browser access as a feedback loop</h3>
<p>For web applications, there&rsquo;s a feedback loop that most teams overlook: giving agents the ability to see what they built.</p>
<p>An agent that can only run tests is working blind on anything visual. It can verify that a controller returns 200, but it can&rsquo;t tell whether the page actually renders correctly, whether a modal opens, or whether a form submits without errors. Cursor&rsquo;s team <a href="https://cursor.com/blog/agent-computer-use">wrote about this</a>: once they gave agents browser access via cloud sandboxes, agents could &ldquo;iterate until they&rsquo;ve validated their output rather than handing off the first attempt.&rdquo; More than 30% of their merged PRs are now created by agents operating autonomously in cloud sandboxes.</p>
<p>You don&rsquo;t need a full cloud sandbox to get value from this. Claude Code has <a href="https://code.claude.com/docs/en/chrome">built-in Chrome support</a> via <code>claude --chrome</code>, and tools like Playwright MCP give agents browser control locally. The agent can navigate to a page, take a snapshot of the DOM, fill in a form, and verify the result. That&rsquo;s a feedback loop that catches an entire class of issues that unit tests and linters never will.</p>
<h3 id="the-smallest-change-that-moves-the-needle-3">The smallest change that moves the needle</h3>
<p>Add a linter to your CI pipeline. For a Ruby project, that&rsquo;s RuboCop. For JavaScript/TypeScript, ESLint. For Python, Ruff. One config file, one CI step. The agent immediately starts getting feedback on style and conventions that it wouldn&rsquo;t otherwise know about.</p>
<p>If you want faster feedback, add pre-commit hooks. The agent runs into the linter before it even pushes, which means it fixes issues in the same context window where it created them. That&rsquo;s cheaper, faster, and produces cleaner commits.</p>
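<p>A pre-commit hook can be as small as a few lines of shell. This is a hedged sketch for a Ruby project; swap in your own linter and binstub paths:</p>

```shell
#!/bin/sh
# .git/hooks/pre-commit -- lint only the staged Ruby files before each commit
changed=$(git diff --cached --name-only --diff-filter=ACM -- '*.rb')
[ -z "$changed" ] && exit 0
bin/rubocop --force-exclusion $changed
```

<p>Because the hook exits non-zero when the linter fails, the commit is blocked and the agent sees the violation immediately, in the same context window where it wrote the code.</p>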
<p>For web projects, consider adding browser access through Playwright MCP or a similar tool. The agent starts verifying its own UI changes instead of relying on you to catch visual issues in review.</p>
<h2 id="where-to-start">Where to Start</h2>
<p>If you&rsquo;re looking at your codebase and wondering where to start, here&rsquo;s how I think about prioritization:</p>
<ol>
<li><strong>Fix your test foundation first.</strong> Without reliable tests, every other improvement is hard to verify. An agent can&rsquo;t confidently refactor your architecture if there&rsquo;s no test suite to catch regressions.</li>
<li><strong>Add an AGENTS.md.</strong> This is 30 minutes of work that immediately changes agent behavior. It&rsquo;s the highest-ROI improvement you can make.</li>
<li><strong>Add a linter to CI.</strong> This closes the feedback gap with minimal effort. The agent starts learning your conventions from automated feedback instead of guessing from code patterns.</li>
</ol>
<p>These three changes don&rsquo;t require a major initiative. They&rsquo;re individual tasks that compound. A codebase with strong tests, clear documentation, and fast feedback loops creates a reinforcing cycle: agents produce better code, which maintains the patterns, which makes future agent output even better.</p>
<p>If you want to see where your codebase stands across all eight dimensions, run the <a href="/codebase-readiness/">Codebase Readiness Assessment</a>. It takes 60 seconds and gives you a score, a per-dimension breakdown, and a prioritized roadmap.</p>
<p>If your team wants hands-on help closing these gaps, that&rsquo;s what the <a href="/services/ai-enablement/">AI Workflow Enablement program</a> is built for. Or if you just want to talk through your results, <a href="/pages/meet/">book a free intro call</a>.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="/codebase-readiness/">Codebase Readiness Assessment</a> - Run the free assessment on your repo</li>
<li><a href="https://openai.com/index/harness-engineering/">Harness Engineering: Leveraging Codex in an Agent-First World</a> - OpenAI&rsquo;s deep dive on building a million-line codebase entirely with agents</li>
<li><a href="https://cursor.com/blog/agent-computer-use">Agent Computer Use</a> - How Cursor gives agents browser access to verify their own work</li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a> - How codebase structure impacts agent output quality</li>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills</a> - Architecture decisions that shape how agents interact with your codebase</li>
</ul>
]]></content:encoded></item><item><title>Building a Linear-Driven Agent Loop with Claude Code</title><link>https://www.damiangalarza.com/posts/2026-02-13-linear-agent-loop/</link><pubDate>Fri, 13 Feb 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-02-13-linear-agent-loop/</guid><description>How I built a bash-based agent loop that pulls work from Linear, implements features, runs code review, and opens pull requests autonomously.</description><content:encoded><![CDATA[<p>In December, the developer community on X was buzzing about Ralph Wiggum. If you missed it, the official Claude Code plugins included one called <a href="https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum">Ralph Wiggum</a>, whose README describes it as:</p>
<blockquote>
<p>Ralph is a development methodology based on continuous AI agent loops. As Geoffrey Huntley describes it: &ldquo;Ralph is a Bash loop&rdquo; - a simple while true that repeatedly feeds an AI agent a prompt file, allowing it to iteratively improve its work until completion.</p></blockquote>
<p>This was used in a variety of ways. Two common ones were:</p>
<ol>
<li>Unleash an agent to work on a single task on its own until it was done.</li>
<li>Unleash an agent to iterate through a backlog of work until it had completed all of it.</li>
</ol>
<p>Today we&rsquo;re going to explore the second one, using an agent loop to iterate through a project backlog.</p>
<h2 id="where-ralph-wiggum-falls-flat">Where Ralph Wiggum Falls Flat</h2>
<p>The Ralph Wiggum plugin provides a command you call inside Claude Code. The session continues until a set of requirements has been met, at which point the loop exits. For example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>/ralph-loop <span style="color:#a6e3a1">&#34;Build a REST API for todos. Requirements: CRUD operations, input validation, bin/rails test and bin/rails lint must pass. Output &lt;promise&gt;COMPLETE&lt;/promise&gt; when done.&#34;</span>
</span></span></code></pre></div><p>There is a drawback to this approach, though. Running the loop inside a Claude Code session means we&rsquo;re eating away at our context window. If you&rsquo;ve read my blog post on <a href="/posts/2025-12-08-understanding-claude-code-context-window">Understanding Claude Code&rsquo;s Context Window</a>, you know that this degrades results over time. It gets dramatically worse when you loop through multiple pieces of work: the agent&rsquo;s context window accumulates context rot as unrelated streams of work pile up in it.</p>
<p>There is a solution though.</p>
<h2 id="bash-loops">Bash Loops</h2>
<p>Instead of running a Ralph Wiggum loop inside the Claude Code instance, we can loop in bash. In this version, every iteration of the loop starts with a fresh context window, sidestepping context rot. This works via the <code>--dangerously-skip-permissions</code> flag, which lets Claude Code run non-interactively without prompting for tool approvals. An example loop looks something like:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#cba6f7">while</span> true; <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">SESSION</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$((</span>SESSION <span style="color:#89dceb;font-weight:bold">+</span> <span style="color:#fab387">1</span><span style="color:#cba6f7">))</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">TIMESTAMP</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>date +%Y%m%d_%H%M%S<span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">COMMIT</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>git rev-parse --short<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">6</span> HEAD 2&gt;/dev/null <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;no-git&#34;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">LOGFILE</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">LOG_DIR</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">/</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">AGENT_NAME</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">TIMESTAMP</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">COMMIT</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">.log&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;--- Session #</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">SESSION</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> starting at </span><span style="color:#cba6f7">$(</span>date<span style="color:#cba6f7">)</span><span style="color:#a6e3a1"> ---&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;    Log: </span><span style="color:#f5e0dc">$LOGFILE</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  claude --dangerously-skip-permissions <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    -p <span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">$(</span>cat <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROMPT_FILE</span><span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    --model <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$MODEL</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    &amp;&gt;<span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$LOGFILE</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;    Session #</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">SESSION</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> ended at </span><span style="color:#cba6f7">$(</span>date<span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#6c7086;font-style:italic"># Brief pause between sessions to avoid hammering if something is broken</span>
</span></span><span style="display:flex;"><span>  sleep <span style="color:#fab387">5</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">done</span>
</span></span></code></pre></div><p>The <code>$PROMPT_FILE</code> is where the real work gets defined. It&rsquo;s a markdown file that tells the agent exactly what to do during each session. Mine walks the agent through a full lifecycle: orient itself on the project, pick up the next issue from Linear, build the feature, run a code review with subagents, and open a pull request. It also includes guardrails like one issue per session, never break main, and what to do if blocked or stuck for more than 15 minutes.</p>
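<p>To make this concrete, here&rsquo;s an abbreviated sketch of what such a prompt file can look like. This is a hypothetical reconstruction based on the description above, not my actual file:</p>

```markdown
# Agent session instructions (hypothetical sketch)

1. Read PROGRESS.md to orient yourself on the project.
2. Using the Linear MCP, pick the highest-priority issue in "Todo"
   (fall back to the backlog) and move it to "In Progress".
3. Create a branch and implement the issue. The task is only done
   when the test suite and linters both pass.
4. Spawn a reviewer subagent, post its review as a comment on the
   Linear issue, and address the feedback.
5. Commit, open a pull request, move the issue to "Done", and append
   a summary to PROGRESS.md.

Guardrails: one issue per session. Never break main. If blocked or
stuck for more than 15 minutes, comment on the issue and exit.
```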
<p>Let&rsquo;s walk through how each of these pieces works in practice.</p>
<h2 id="how-it-all-fits-together">How It All Fits Together</h2>
<p>I decided to give this a try on my recent project CreatorSignal that I&rsquo;ve been building during my <a href="https://www.youtube.com/@damian.galarza/streams">live streams</a>. While I&rsquo;ve seen many people maintaining their backlogs in markdown files or custom Kanban board experiences within Claude Code, I prefer using <a href="https://linear.app/">Linear</a>. I didn&rsquo;t want to recreate a task management system just for the agent loop. With the <a href="https://linear.app/docs/mcp">Linear MCP</a> in hand, here&rsquo;s how I set it up.</p>
<h3 id="progressmd">PROGRESS.md</h3>
<p>One of the core pieces is the <code>PROGRESS.md</code> file. While individual tasks are tracked and maintained in Linear, this file serves as a kind of &ldquo;memory&rdquo;, giving each agent a holistic view of what has been accomplished so far. At the start of each loop, the agent reads <code>PROGRESS.md</code>; at the end, it appends a summary of what it accomplished.</p>
<p>Example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Progress
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## 2026-02-13
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### PRX-27: Billing portal (Stripe Customer Portal integration) — DONE
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span><span style="color:#cba6f7">-</span> Created <span style="color:#a6e3a1">`BillingPortalController`</span> with <span style="color:#a6e3a1">`show`</span> and <span style="color:#a6e3a1">`create`</span> actions
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Billing page displays current plan, price, next billing date
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Manage Subscription&#34; button creates Stripe BillingPortal::Session and redirects
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Free users see upgrade CTA; former subscribers can still access portal for invoices
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Cancellation pending state shown with reactivation option
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> 11 request specs + 6 system specs, all passing (266 total)
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> PR: https://github.com/dgalarza/CreatorSignal/pull/31
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Branch based on PRX-25 (chain: PRX-17 → PRX-23 → PRX-24 → PRX-25 → PRX-27)
</span></span></code></pre></div><h3 id="implementing-an-issue">Implementing an Issue</h3>
<p>Using the Linear MCP, the agent finds the next highest priority issue to work on. It starts by looking at the &ldquo;Todo&rdquo; column and picks the next one up. If there&rsquo;s nothing in Todo, it checks the backlog instead. From there it reads the issue&rsquo;s details to understand the work that needs to be done. For the loop to work well, issues need to be spec&rsquo;d out thoroughly. This gives the agent the highest chance of performing quality work without human supervision.</p>
<p>With an issue selected, the agent moves it to &ldquo;In Progress&rdquo;, creates a branch, and starts building. A task is not considered &ldquo;done&rdquo; unless the test suite and linters both pass. This is another critical piece for a successful agent loop. The agent must have solid ways of verifying its own work. Without automated checks, it&rsquo;s difficult for the agent to understand success, and quality drops.</p>
<p>When the agent believes its work is ready, it comments on the Linear issue with a summary of what it built and moves the issue to &ldquo;In Review&rdquo;.</p>
<h3 id="code-review">Code Review</h3>
<p>Similar to my workflow described in <a href="/posts/2025-11-25-how-i-use-claude-code">How I Use Claude Code</a>, the next step is to spawn subagents to perform code review. The agent uses the <code>Task</code> tool to spin up a reviewer that evaluates the diff against the issue requirements, checking for correctness, test quality, Rails conventions, security, and performance.</p>
<p>The review is posted as a comment on the Linear issue. This provides visibility into the full lifecycle of the work. I can see the main agent&rsquo;s implementation summary alongside the code review feedback. The agent then resolves any feedback it received and posts a final comment on the Linear issue summarizing its decisions.</p>
<h3 id="pull-request">Pull Request</h3>
<p>After the code review process is complete and feedback is addressed, the agent commits the work and opens a pull request. The Linear issue is moved to &ldquo;Done&rdquo;, and the agent writes its progress update to the PROGRESS.md file.</p>
<h3 id="clean-up">Clean Up</h3>
<p>With everything complete, the agent&rsquo;s last instructions are to check out the main branch and rebase against origin/main so that the next loop starts in a fresh state. The loop then exits cleanly. There&rsquo;s a built-in pause after each iteration before the next one starts.</p>
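<p>The cleanup amounts to a handful of git commands. The exact commands below are my assumption from that description, wrapped in a function for illustration:</p>

```bash
# End-of-session cleanup (assumed commands; the post describes the intent,
# not the exact script). Returns the working tree to an up-to-date main
# branch so the next iteration starts from a fresh state.
end_session_cleanup() {
  git checkout main &&
  git fetch origin &&
  git rebase origin/main
}
```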
<h3 id="visibility">Visibility</h3>
<p>This loop proved to work well. I connected Slack to my Linear project, so notifications came in as the agent worked through issues. Every status change, every completed task, and every round of review feedback addressed showed up as it happened, letting me follow progress in real time.</p>
<h2 id="improving-on-the-workflow">Improving on the Workflow</h2>
<p>While this initial pass at the loop worked well, there were things I wanted to improve. First, as pull requests were opened and merged, some went stale with merge conflicts because of the speed at which new features were landing. Second, I wanted to be able to leave feedback on a pull request as if I were working with a teammate, and have the agent address it as part of the loop.</p>
<p>I solved both by adding a new step to the loop.</p>
<p>Before picking up a new task, the agent runs <code>bin/pr_check</code>. This script looks through my open pull requests for any with the &ldquo;needs-revision&rdquo; label. If no PR needs feedback addressed, it checks for any that have gone stale with merge conflicts.</p>
<p>If such a PR is found, the loop addresses that one PR and leaves any others for subsequent iterations. So whenever I had feedback I wanted addressed, I would leave comments on the PR and add the &ldquo;needs-revision&rdquo; label. On the next iteration, the agent would pick it up and address the feedback.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># bin/pr_check</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Finds the first open PR that needs attention.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Returns JSON with PR details if one needs work, or empty output if all clean.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># A PR &#34;needs attention&#34; if:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   1. It has merge conflicts (mergeableStatus == CONFLICTING)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   2. It has the &#34;needs-revision&#34; label</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Usage:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   bin/pr_check           # returns JSON or empty</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   bin/pr_check --quiet   # exit code only (0 = needs attention, 1 = all clean)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Output format:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   {</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;number&#34;: 42,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;branch&#34;: &#34;damian/prx-7-exa-research-tools&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;title&#34;: &#34;PRX-7: Exa research tools&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;url&#34;: &#34;https://github.com/...&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;reason&#34;: &#34;has_feedback&#34;,    # or &#34;conflicting&#34; or &#34;conflicting,has_feedback&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;conflicting&#34;: true,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;has_feedback&#34;: true</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   }</span>
</span></span></code></pre></div><p>The loop itself is about a 100-line bash script. I&rsquo;ll be adding it to my Claude Code workflows this week and sharing it with my newsletter.</p>
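<p>The header comment above defines the contract. For illustration, the selection logic can be sketched as a small <code>jq</code> filter. This is a hypothetical reconstruction, not the actual script; it assumes input shaped like the output of <code>gh pr list --json number,title,url,headRefName,labels,mergeable</code>:</p>

```bash
# Hypothetical sketch of bin/pr_check's selection logic (requires jq).
# select_pr reads `gh pr list`-style JSON on stdin and prints the first
# PR that needs attention, or nothing if every PR is clean.
select_pr() {
  jq -c '
    [ .[]
      | .conflicting = (.mergeable == "CONFLICTING")
      | .has_feedback = any(.labels[]?; .name == "needs-revision")
      | select(.conflicting or .has_feedback)
      | .reason = ([if .conflicting then "conflicting" else empty end,
                    if .has_feedback then "has_feedback" else empty end]
                   | join(","))
    ] | first // empty'
}

# In the real script, input would come from something like:
#   gh pr list --json number,title,url,headRefName,labels,mergeable | select_pr
```

The <code>--quiet</code> mode the header describes then reduces to checking whether <code>select_pr</code> produced any output.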
<h2 id="what-makes-this-work">What Makes This Work</h2>
<p>After running this loop across several sessions, a few things stand out as critical to getting quality results:</p>
<ol>
<li><strong>Fresh context per iteration.</strong> Running the loop in bash instead of inside a Claude Code session means each task gets a clean context window. This is the single biggest difference from the Ralph Wiggum approach.</li>
<li><strong>Well-spec&rsquo;d issues.</strong> The agent is only as good as the instructions it receives. Vague issues produce vague results. Detailed acceptance criteria and clear scope make all the difference.</li>
<li><strong>Automated verification.</strong> Requiring passing tests and linters before a task is considered &ldquo;done&rdquo; gives the agent a concrete definition of success. Without this, quality drops fast.</li>
<li><strong>Linear as the source of truth.</strong> Using an existing project management tool instead of reinventing one means I can see the full lifecycle of every issue, from backlog to done, with comments and status updates along the way.</li>
</ol>
<p>The combination of these pieces turns what could be a chaotic autonomous loop into something that produces reviewable, mergeable work. It&rsquo;s not perfect, and I still review every pull request before merging, but the amount of ground it covers between review cycles is significant.</p>
<h2 id="additional-reading">Additional Reading</h2>
<ul>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills: Understanding the Difference</a> - The agent loop relies on the Linear MCP as its backbone. This post covers how MCPs and skills serve different roles in your workflow.</li>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a> - A deep dive into how the context window works and why fresh context per iteration is so important.</li>
<li><a href="https://youtu.be/Seu7nksZ_4k">How AI Agents Remember Things</a> - The PROGRESS.md pattern is essentially agent memory between loop iterations. This video covers how agents persist context across sessions.</li>
<li><a href="https://youtu.be/tO_Larrawfg">MCPs vs Skills: The Mental Model You&rsquo;re Missing</a> - The video companion to the blog post above, covering the architectural distinction between MCPs and skills.</li>
<li><a href="https://www.youtube.com/playlist?list=PLeevcUmnIRCy8XirmTSbHz71hs31idVC3">Building CreatorSignal</a> - The livestream series where I&rsquo;m building CreatorSignal, the project this agent loop runs against.</li>
</ul>
<p>If you haven&rsquo;t already, sign up for my newsletter for weekly emails on AI Engineering and agentic development workflows.</p>
<hr>
<p>If you&rsquo;re building agent loops or autonomous workflows and want help getting the architecture right, I work with teams on exactly this. <a href="/ai-agents/">Let&rsquo;s talk</a>.</p>
]]></content:encoded></item></channel></rss>