<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Developer-Productivity on Damian Galarza | Software Engineering &amp; AI Consulting</title><link>https://www.damiangalarza.com/tags/developer-productivity/</link><description>Recent posts from Damian Galarza | Software Engineering &amp; AI Consulting</description><generator>Hugo</generator><language>en-us</language><managingEditor>Damian Galarza</managingEditor><atom:link href="https://www.damiangalarza.com/tags/developer-productivity/feed.xml" rel="self" type="application/rss+xml"/><item><title>Claude Opus 4.7 + Claude Code: 7 Practical Tips for Maximizing Extended Context</title><link>https://www.damiangalarza.com/posts/2026-04-30-claude-opus-4-7-claude-code-tips-extended-context/</link><pubDate>Thu, 30 Apr 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-04-30-claude-opus-4-7-claude-code-tips-extended-context/</guid><description>Practical tips for getting the most from Claude Opus 4.7's 1M context window in Claude Code. Effort levels, proactive compaction, subagent delegation, and session management from daily production use.</description><content:encoded><![CDATA[<p>Claude Opus 4.7 shipped with a 1M token context window. That&rsquo;s five times what Sonnet 4.5 offers. But a bigger window doesn&rsquo;t mean context management stops mattering.</p>
<p>The lost-in-the-middle problem doesn&rsquo;t disappear at 1M tokens. Content in the center of the window still gets less attention than content at the beginning and end. Opus 4.7 uses a new tokenizer that improves model performance, but it also means files you read consume context in subtly different ratios than before. Anthropic&rsquo;s docs note the new tokenizer can use up to 35% more tokens per equivalent input compared to previous models. And adaptive thinking, now the only supported thinking mode in 4.7 (fixed budgets are removed), consumes context dynamically. The model thinks longer on harder problems and shorter on easy ones. That thinking counts against your window.</p>
<p>If you&rsquo;re coming from my earlier post on <a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a>, everything there still applies. The fundamentals haven&rsquo;t changed. What has changed is the ceiling, and the set of controls available to you.</p>
<p>Here are seven workflow adjustments I&rsquo;ve made since Opus 4.7 dropped. Each one addresses a specific constraint I hit in daily production use.</p>
<h2 id="1-front-load-context-in-your-first-turn">1. Front-Load Context in Your First Turn</h2>
<p>One of the big changes from Opus 4.6 to Opus 4.7 is that it no longer reads between the lines. Opus 4.6 would take a vague prompt and &ldquo;figure it out&rdquo;; Opus 4.7 follows instructions more literally, so you need to provide good context to get good results. The first message in the session anchors everything that follows.</p>
<p>Structure your first turn to include three things: what you want and why, which files or areas of the codebase are relevant, and what &ldquo;done&rdquo; looks like.</p>
<p>Here&rsquo;s an example. Instead of this:</p>
<pre tabindex="0"><code>Add rate limiting to the API
</code></pre><p>Try this:</p>
<pre tabindex="0"><code>Add rate limiting to the webhook ingestion endpoint in
packages/gateway/src/routes/webhooks.ts. We&#39;re getting
hammered by a misbehaving integration that sends duplicate
events. Use the existing Redis connection in src/lib/redis.ts.
Rate limit by client IP, 100 requests per minute. Add tests
in __tests__/webhooks.test.ts. Don&#39;t change the event
processing logic in src/lib/event-handler.ts.
</code></pre><p>The second version tells Opus 4.7 exactly what to touch, why, and what to leave alone. You define the &ldquo;what&rdquo; and the constraints. Let the model propose the &ldquo;how.&rdquo;</p>
<p>One thing to watch for: don&rsquo;t turn your first message into a specification document. If you find yourself writing more than a paragraph or two, you&rsquo;re probably trying to control implementation details that the model should decide. Name the files, the constraints, and the definition of done. Stop there.</p>
<h2 id="2-switch-effort-levels-mid-session">2. Switch Effort Levels Mid-Session</h2>
<p>Thinking tokens count against your context window. A single <code>xhigh</code> response on a complex problem can use significantly more tokens than the same question at <code>high</code>. Over the course of a session, this adds up fast.</p>
<p>Opus 4.7 introduced <code>xhigh</code> effort and replaced the old fixed thinking budgets with adaptive thinking. At <code>xhigh</code>, the model almost always engages deep reasoning on complex work and skips thinking on simpler tasks. That&rsquo;s useful for architecture decisions, complex debugging, and multi-file refactors. It&rsquo;s overkill for renaming a variable across twenty files.</p>
<p>Here&rsquo;s how I handle it. I start sessions at <code>xhigh</code> for the initial planning and implementation work. When I shift to mechanical tasks, I drop the effort level:</p>
<pre tabindex="0"><code>/effort high
</code></pre><p>Rename the files, run the migration, update the imports. Then when I need deep analysis again:</p>
<pre tabindex="0"><code>/effort xhigh
</code></pre><p>In practice: you spend the first part of a session at <code>xhigh</code> implementing a feature, then need to update some test fixtures and rename a few constants. Drop to <code>high</code> or even <code>medium</code> for that work. When you&rsquo;re ready to debug a failing integration test, go back to <code>xhigh</code>.</p>
<p>The gotcha here is context switching cost. Don&rsquo;t toggle effort every other message. Batch your mechanical tasks together and run them at a lower effort level in one block. Then switch back for the next piece of deep work.</p>
<h2 id="3-compact-at-60-not-when-you-see-a-warning">3. Compact at 60%, Not When You See a Warning</h2>
<p>Autocompact triggers when your context window is nearly full. By the time that happens with a 1M window, you&rsquo;ve been running with degraded output quality for a while. The lost-in-the-middle effect doesn&rsquo;t wait for you to run out of room. It starts affecting responses well before you hit the ceiling.</p>
<p>My rule of thumb: check <code>/context</code> periodically and compact when you hit around 60%. That sounds early; at that point you still have 400K tokens of headroom, twice the entire Sonnet 4.5 window. But that remaining headroom is exactly where output quality starts to slip, so you&rsquo;re not giving up much by compacting before you use it.</p>
<p>Here&rsquo;s what <code>/context</code> output looks like in a session approaching that threshold:</p>
<pre tabindex="0"><code>Context Usage
⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-opus-4-7 · 610k/1000k tokens (61%)
</code></pre><p>After a proactive compact:</p>
<pre tabindex="0"><code>Context Usage
⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-opus-4-7 · 85k/1000k tokens (8.5%)
</code></pre><p>That&rsquo;s a fresh start with all the important decisions preserved. Much better than letting autocompact fire at capacity and losing coherence.</p>
<p>The trade-off with early compaction is that you lose conversational nuance. Specific phrasings, detailed file contents, and turn-by-turn reasoning all get compressed into a summary. This is why Tip 4 exists.</p>
<h2 id="4-steer-your-compaction">4. Steer Your Compaction</h2>
<p>Running <code>/compact</code> without guidance lets the model decide what to keep and what to drop. This works reasonably well for short sessions, but in a long session with multiple decisions, the model often drops specifics that matter for the next phase of work.</p>
<p>Always pass steering instructions when you compact. Name the topics and the decisions you need preserved.</p>
<p>Here are three examples from real sessions:</p>
<pre tabindex="0"><code>/compact Preserve the auth refactor decisions: we chose
JWT with rotating refresh tokens over session cookies,
the token service is in src/lib/auth/tokens.ts, and
the migration adds a refresh_tokens table.
</code></pre><pre tabindex="0"><code>/compact Keep the schema changes we made to the proposals
table (added status enum, soft delete columns, and the
client_id foreign key). Preserve the repo pattern decision
from packages/shared/src/db/repos/proposals.ts.
</code></pre><pre tabindex="0"><code>/compact We&#39;re moving to phase 2 of the API implementation.
Preserve the route structure decisions (REST for CRUD,
webhooks for async events) and the middleware chain order.
Drop the debugging of the TypeScript config issues.
</code></pre><p>Keep your steering to two or three sentences. Name the topics, not every detail. The model will fill in the specifics from the conversation history. You are giving it a priority list, not writing the summary yourself.</p>
<h2 id="5-use-subagents-for-context-isolation">5. Use Subagents for Context Isolation</h2>
<p>I covered subagents in detail in my <a href="/posts/2025-12-08-understanding-claude-code-context-window/">context window post</a>, but Opus 4.7 shifts the default behavior. In my experience, Opus 4.7 spawns fewer subagents on its own compared to earlier models (Anthropic&rsquo;s release notes confirm this as a deliberate behavior change). It&rsquo;s more inclined to do work inline, which means exploration output that used to be isolated now accumulates in your main context.</p>
<p>That&rsquo;s fine for focused tasks. It becomes a problem when you need to explore a large area of the codebase or review a significant diff. The fix: explicitly request subagent delegation.</p>
<p>The key is scoping what comes back. Instead of:</p>
<pre tabindex="0"><code>Review the changes on this branch
</code></pre><p>Try:</p>
<pre tabindex="0"><code>Have a subagent review the changes on this branch against
main. Report back: any bugs, any missing test coverage,
and any patterns that don&#39;t match our existing conventions.
Don&#39;t include the full diff in the report.
</code></pre><p>Good candidates for subagent delegation:</p>
<ul>
<li><strong>Code reviews:</strong> The subagent reads every changed file, but your main context only gets the summary.</li>
<li><strong>Codebase exploration:</strong> &ldquo;Have a subagent map out how the notification system works across packages/gateway and packages/agents.&rdquo;</li>
<li><strong>Test analysis:</strong> &ldquo;Spawn a subagent to check which tests cover the payment flow and identify gaps.&rdquo;</li>
<li><strong>Pattern audits:</strong> &ldquo;Use a subagent to find all places we handle errors in route handlers and check for consistency.&rdquo;</li>
</ul>
<p>The gotcha with subagents is that they don&rsquo;t share your conversation history. If you made a decision earlier in the session that affects how the subagent should evaluate something, include that decision in the delegation prompt. The subagent starts fresh.</p>
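<p>For example, suppose you decided earlier in the session that route handlers should return result objects instead of throwing. A delegation prompt for the pattern audit above might look like this (the paths and the convention itself are illustrative):</p>
<pre tabindex="0"><code>Have a subagent audit error handling in
packages/gateway/src/routes. Context it needs: earlier in
this session we decided route handlers return a Result
object instead of throwing. Report back only the handlers
that still throw, with file and line references. Don&#39;t
include code snippets in the report.
</code></pre>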
<h2 id="6-use-rewind-to-recover-from-failed-approaches">6. Use Rewind to Recover from Failed Approaches</h2>
<p>Every failed approach leaves artifacts in your context: the wrong implementation, the correction, the explanation of why it was wrong. With Opus 4.7&rsquo;s literal instruction following, this creates a real problem. The model may anchor on parts of a failed attempt even after you have corrected course, because that failed code is still in the conversation history.</p>
<p>The <code>/rewind</code> command (or double-tap Escape) rolls back to a previous point in the conversation. This removes the failed approach from context entirely, as if it never happened.</p>
<p>Here&rsquo;s when to use rewind versus inline correction:</p>
<p><strong>Rewind</strong> when the approach is fundamentally wrong. You asked for a webhook handler and got a giant switch statement, but your codebase uses an event routing pattern. Correcting inline means the model has both patterns in context and may blend them.</p>
<p><strong>Correct inline</strong> when the details need adjustment. The approach is right but a method name is wrong, or it missed an edge case. The cost of the correction in context is low, and the model benefits from seeing the refinement.</p>
<p>A practical example: I asked Claude to implement a notification dispatch system. The first attempt built a synchronous pipeline. My codebase uses BullMQ for async job processing. Rather than explaining why synchronous was wrong and asking it to redo the work, which would leave both approaches in context, I rewound and rephrased:</p>
<pre tabindex="0"><code>Implement notification dispatch using our existing BullMQ
job infrastructure in packages/agents/src/lib/queue.ts.
Each notification type gets its own job processor.
Follow the pattern in the heartbeat-runner for job setup.
</code></pre><p>Clean context. Clear direction. No conflicting signals.</p>
<p>One warning: rewind is destructive. If the failed approach contained useful insights (it identified the right files to modify, or surfaced a constraint you hadn&rsquo;t considered), note those before rewinding. You can include them in your rephrased prompt.</p>
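<p>A rephrased prompt that carries those insights forward might look like this (a hypothetical follow-up to the BullMQ example above; the constraints are illustrative):</p>
<pre tabindex="0"><code>Implement notification dispatch using our BullMQ job
infrastructure in packages/agents/src/lib/queue.ts. Two
things the earlier attempt surfaced: retries are configured
per-queue, not per-job, and the email provider rate-limits
at 50/minute. Design the processors with both in mind.
</code></pre>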
<h2 id="7-know-when-to-clear-compact-or-continue">7. Know When to Clear, Compact, or Continue</h2>
<p>Quality degrades gradually in long sessions. You won&rsquo;t see a cliff. Responses get slightly less precise, slightly more generic, slightly more likely to miss constraints you established earlier. A 1M window means sessions can run much longer, which makes the decision of when to stop harder, not easier.</p>
<p>Here&rsquo;s the decision framework I use:</p>
<p><strong>Continue</strong> when you&rsquo;re mid-task, below 60% context usage, and working on a single coherent thread. The model has strong recall of recent decisions and the work is flowing.</p>
<p><strong>Compact</strong> when you&rsquo;ve finished a phase and are starting the next one. You need the architectural decisions but not the turn-by-turn implementation details. This is where Tip 4&rsquo;s steering instructions matter most.</p>
<p><strong>Clear</strong> when the next task is unrelated to what you&rsquo;ve been doing. Also clear when the model starts repeating itself, when you&rsquo;ve already compacted multiple times in the session, or when you&rsquo;ve persisted your plan externally (in a TODO file, a Linear issue, or a CLAUDE.md update).</p>
<p><strong>Start a new session</strong> when you need different MCP servers, when you&rsquo;re switching to a different branch, or when you&rsquo;re doing parallel worktree work. Each worktree should get its own session. I covered why in <a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">Extending Claude Code with Worktrees for True Database Isolation</a>.</p>
<p>The full session lifecycle follows a natural arc. Start with a strong first prompt (Tip 1) at <code>xhigh</code> effort (Tip 2). During the working phase, delegate exploration to subagents (Tip 5) and rewind failed approaches (Tip 6). When you hit around 60% context, compact proactively (Tip 3) with steering instructions (Tip 4). Then decide whether to continue, clear, or start fresh (Tip 7).</p>
<pre tabindex="0"><code>Session Start
  ├── Tip 1: Front-load context in first turn
  ├── Tip 2: xhigh for deep work, high/medium for mechanical
  │
  │   [Working...]
  │
  ├── Tip 5: Delegate exploration to subagents
  ├── Tip 6: Rewind failed approaches
  │
  │   [~60% context used]
  │
  ├── Tip 3: Proactive /compact
  ├── Tip 4: Steer the compaction
  │
  │   [Continue or...]
  │
  └── Tip 7: Clear / New session when needed
</code></pre><h2 id="the-mental-model">The Mental Model</h2>
<p>The 1M context window isn&rsquo;t five times more room. It&rsquo;s five times more rope.</p>
<p>With a 200K window, context pressure forced discipline. You had to be deliberate about what went into the window because you would run out. With 1M tokens, poor habits go unnoticed much longer before the consequences show up. That makes discipline harder, not easier.</p>
<p>The one principle behind all seven tips: active context management beats passive accumulation. Front-load your intent. Control your effort levels. Compact before you need to. Steer the compaction. Isolate expensive exploration. Remove dead ends. Know when to stop.</p>
<p>These aren&rsquo;t theoretical suggestions. They&rsquo;re the adjustments I&rsquo;ve made in my own workflow over the past week of daily Opus 4.7 usage. The model rewards precision and punishes ambiguity. Give it clear context, and it delivers.</p>
<blockquote>
<p>If this post was the explanation, the cheat sheet is the reference.
Two sides: token costs for common MCPs on one, the <code>/clear</code> /
<code>/compact</code> / subagent decision tree on the other.</p>
<p><a href="/context-window-cheat-sheet/">Get the Context Window Cheat Sheet →</a></p></blockquote>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a></li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a></li>
<li><a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">Extending Claude Code with Worktrees for True Database Isolation</a></li>
<li><a href="https://claude.com/blog/best-practices-for-using-claude-opus-4-7-with-claude-code">Anthropic: Best practices for using Claude Opus 4.7 with Claude Code</a></li>
</ul>
]]></content:encoded></item><item><title>Four Dimensions of Agent-Ready Codebase Design</title><link>https://www.damiangalarza.com/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/</link><pubDate>Wed, 25 Mar 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/</guid><description>AI agents produce better output when the codebase is ready for them. Here are the four dimensions of codebase readiness that account for most of the gap.</description><content:encoded><![CDATA[<p>When an AI agent rewrites a file and the result doesn&rsquo;t match your conventions, the first move is usually to adjust the prompt. Try different instructions. Add more context to the message. Maybe switch models.</p>
<p>The model is rarely the bottleneck. The codebase is.</p>
<p>The same model, pointed at a codebase with strong tests, clear architecture, and good documentation, produces remarkably consistent output. Point it at a codebase with weak coverage, no architecture docs, and no linting, and you get drift. Not because the model is less capable, but because it has less to work with.</p>
<p>I built the <a href="/codebase-readiness/">Codebase Readiness Assessment</a> to make this measurable. It scores your repo across eight dimensions on a 0-100 scale. But you don&rsquo;t need to run the assessment to understand what separates high-scoring codebases from low-scoring ones. Four dimensions account for most of the gap.</p>
<h2 id="test-foundation">Test Foundation</h2>
<p>Test foundation carries the most weight in the assessment (25%) because it&rsquo;s the single biggest lever for agent output quality.</p>
<h3 id="what-a-low-score-looks-like">What a low score looks like</h3>
<p>An agent makes a change. There are no tests covering that area, so it moves on. The change compiles, maybe even runs, but it broke an assumption three modules away. Nobody finds out until a human reviews the PR, or worse, until production.</p>
<p>I&rsquo;ve seen this repeatedly: teams with 30-40% test coverage ask an agent to refactor a service object. The agent produces clean code that looks right. But there&rsquo;s no spec for the edge case where a nil association triggers a downstream error. The agent had no way to catch it because there&rsquo;s no test to fail.</p>
<p>The other failure mode is slow tests. If your suite takes 20 minutes, the agent can&rsquo;t iterate. It makes a change, waits, discovers the failure, tries again, waits again. In a fast suite, that feedback cycle takes seconds. In a slow one, the agent burns time and money waiting for results.</p>
<h3 id="what-a-high-score-looks-like">What a high score looks like</h3>
<p>Codebases that score well here share a few characteristics:</p>
<ul>
<li><strong>Coverage above 70% on critical paths.</strong> Not 100% everywhere, but thorough coverage on the code that matters: domain logic, service objects, API endpoints. The agent can make changes and get immediate confirmation that nothing broke.</li>
<li><strong>Suite runs in under 5 minutes.</strong> Fast enough that the agent can run tests after every meaningful change, not just at the end.</li>
<li><strong>Deterministic results.</strong> No flaky tests. When the suite says green, it means green. Agents can&rsquo;t distinguish between a flaky failure and a real one, so flaky tests teach agents to ignore failures. (A small determinism sketch follows this list.)</li>
</ul>
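<p>Two of the most common flakiness sources, wall-clock time and unseeded randomness, are mechanically removable. A minimal RSpec sketch of the time half (the notifier class and user factory here are hypothetical):</p>
<pre tabindex="0"><code># Freeze the clock so &#34;now&#34; means the same thing on every run.
RSpec.describe TrialExpiryNotifier do
  include ActiveSupport::Testing::TimeHelpers

  it &#34;notifies users whose trial has expired&#34; do
    travel_to Time.zone.local(2026, 3, 1, 9, 0) do
      user = create(:user, trial_ends_at: 1.day.ago)

      expect { TrialExpiryNotifier.run }
        .to change { user.notifications.count }.by(1)
    end
  end
end
</code></pre>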
<h3 id="dont-stop-at-unit-tests">Don&rsquo;t stop at unit tests</h3>
<p>Unit tests on service objects and models are the foundation, but they only verify isolated behavior. An agent that passes all unit tests can still break a user-facing workflow that spans multiple components.</p>
<p>End-to-end tests give agents confidence across entire flows. A system spec that signs a user in, submits a form, and checks the result tells the agent whether the <em>feature</em> works, not just whether a method returns the right value. This is especially valuable when agents make changes that touch controllers, views, and services in the same PR.</p>
<p>Here&rsquo;s a simplified system spec from one of my Rails projects. It covers the core user journey: signing in and submitting a video idea for validation.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># spec/system/idea_submission_spec.rb</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f9e2af">RSpec</span><span style="color:#89dceb;font-weight:bold">.</span>describe <span style="color:#a6e3a1">&#34;Idea submission&#34;</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  it <span style="color:#a6e3a1">&#34;allows a signed-in user to submit a video idea&#34;</span> <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>    user <span style="color:#89dceb;font-weight:bold">=</span> create(<span style="color:#a6e3a1">:user</span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    sign_in_as(user, <span style="color:#a6e3a1">path</span>: new_idea_path)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">select</span> user<span style="color:#89dceb;font-weight:bold">.</span>channels<span style="color:#89dceb;font-weight:bold">.</span>first<span style="color:#89dceb;font-weight:bold">.</span>name, <span style="color:#a6e3a1">from</span>: <span style="color:#a6e3a1">&#34;Channel&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Title&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;Building a Rails AI Agent from Scratch&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Description&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;Step-by-step tutorial on building an AI agent&#34;</span>
</span></span><span style="display:flex;"><span>    fill_in <span style="color:#a6e3a1">&#34;Category&#34;</span>, <span style="color:#a6e3a1">with</span>: <span style="color:#a6e3a1">&#34;AI Coding&#34;</span>
</span></span><span style="display:flex;"><span>    click_button <span style="color:#a6e3a1">&#34;Validate Idea&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    expect(page)<span style="color:#89dceb;font-weight:bold">.</span>to have_content(<span style="color:#a6e3a1">&#34;Building a Rails AI Agent from Scratch&#34;</span>)
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">end</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>This test touches authentication, the form UI, the controller, the background job, and the results page. If an agent breaks any part of that chain, this spec catches it.</p>
<p>The tradeoff is speed. End-to-end tests are slower and more brittle than unit tests. You don&rsquo;t need full E2E coverage, but having system specs on your critical user journeys (signup, checkout, the core action your product is built around) gives agents a safety net that unit tests alone can&rsquo;t provide.</p>
<h3 id="the-smallest-change-that-moves-the-needle">The smallest change that moves the needle</h3>
<p>Add coverage to your critical paths first. Don&rsquo;t chase a coverage number. Instead, identify the three or four service objects or domain models where bugs would hurt the most, and write specs for those. Then add one or two system specs covering your most important user journeys end-to-end. If your suite is slow, add parallel test execution. In a Rails app, that might be as simple as adding the <code>parallel_tests</code> gem. A suite that goes from 15 minutes to 4 minutes fundamentally changes how an agent can work with your code. If you&rsquo;re running multiple agents in parallel, you&rsquo;ll also need <a href="/posts/2026-03-10-extending-claude-code-worktrees-for-true-database-isolation/">database isolation per worktree</a> to prevent test data collisions.</p>
<p>If you want to accelerate the process, tools like <a href="https://github.com/uditgoenka/autoresearch">autoresearch</a> apply this pattern as an autonomous loop: give the agent a measurable goal (like a coverage target), and it iterates, verifies, keeps what works, and discards what doesn&rsquo;t.</p>
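<p>The <code>parallel_tests</code> setup mentioned above is small. A sketch of the usual steps (the per-process database naming comes from the gem&rsquo;s <code>TEST_ENV_NUMBER</code> convention; check its README for the database config details):</p>
<pre tabindex="0"><code># Gemfile
group :development, :test do
  gem &#39;parallel_tests&#39;
end

# One-time setup, then run the suite across all cores:
#   rake parallel:create    # one test database per process
#   rake parallel:prepare   # load the schema into each
#   rake parallel:spec      # run RSpec in parallel
</code></pre>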
<h2 id="documentation-as-code">Documentation as Code</h2>
<p>Documentation carries 15% of the assessment weight, but in practice it&rsquo;s the dimension where I see the biggest gap between teams that get good agent output and teams that don&rsquo;t.</p>
<h3 id="what-a-low-score-looks-like-1">What a low score looks like</h3>
<p>Without an agent-facing entry point (a <code>CLAUDE.md</code>, <code>AGENTS.md</code>, or equivalent), an agent has to reverse-engineer your conventions from the code itself. It reads your files, infers patterns, and guesses at intent. Sometimes it guesses right. Often it doesn&rsquo;t.</p>
<p>Here&rsquo;s a concrete example. A Rails app uses service objects for all business logic. Controllers call a service, the service does the work, and the result gets rendered. There&rsquo;s nothing enforcing this in the framework. It&rsquo;s a team convention. An agent that doesn&rsquo;t know about this convention puts the logic directly in the controller action. The code works. The tests pass. But it breaks the team&rsquo;s pattern, and now there&rsquo;s a 50-line controller action that should have been a service object.</p>
<p>The agent wasn&rsquo;t wrong. It had no way to know.</p>
<h3 id="what-a-high-score-looks-like-1">What a high score looks like</h3>
<p>The key insight is that this entry point file should be a map, not a manual. OpenAI&rsquo;s Harness Engineering team <a href="https://openai.com/index/harness-engineering/">learned this the hard way</a>: they tried a single large instruction file and it failed because &ldquo;context is a scarce resource&rdquo; and &ldquo;too much guidance becomes non-guidance.&rdquo; When everything is marked important, agents pattern-match locally instead of navigating intentionally.</p>
<p>Their solution: keep the entry file short (roughly 100 lines) and treat it as a table of contents that points to deeper sources of truth in a structured <code>docs/</code> directory. The entry file gives agents quick commands and a documentation map. The detail lives in dedicated files the agent reads when it needs them. Whether you call it <code>CLAUDE.md</code>, <code>AGENTS.md</code>, or <code>CURSOR.md</code>, the pattern is the same.</p>
<p>Here&rsquo;s what this looks like in practice from one of my Rails projects:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Quick Commands
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>bin/dev                                # Start dev server
</span></span><span style="display:flex;"><span>bin/rails spec                         # All tests
</span></span><span style="display:flex;"><span>bin/ci                                 # Full CI: lint + security + tests
</span></span><span style="display:flex;"><span>bin/rubocop                            # Lint
</span></span><span style="display:flex;"><span>bin/brakeman                           # Security scan
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Documentation Map
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>| Topic | Document |
</span></span><span style="display:flex;"><span>|-------|----------|
</span></span><span style="display:flex;"><span>| Stack, patterns, domain model | docs/ARCHITECTURE.md |
</span></span><span style="display:flex;"><span>| Testing patterns and stack | docs/TESTING.md |
</span></span><span style="display:flex;"><span>| Credentials, env vars, API keys | docs/CONFIGURATION.md |
</span></span><span style="display:flex;"><span>| Engineering principles | docs/design-docs/core-beliefs.md |
</span></span><span style="display:flex;"><span>| Architecture decision records | docs/design-docs/ |
</span></span></code></pre></div><p>The agent gets commands and a map up front. When it needs to understand the domain model or testing conventions, it follows the pointer. This is progressive disclosure: the agent starts with what it needs immediately and loads deeper context on demand.</p>
<p>Here&rsquo;s a trimmed excerpt from the <code>ARCHITECTURE.md</code> behind that pointer:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## Domain Model
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>CreatorSignal validates YouTube video ideas. The core flow:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">1.</span> User submits a video <span style="font-weight:bold">**Idea**</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">2.</span> A <span style="font-weight:bold">**Validation**</span> job is enqueued
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">3.</span> The <span style="font-weight:bold">**ResearchAgent**</span> runs tools against YouTube, Reddit, X, and HN
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">4.</span> Results are synthesized into a scored <span style="font-weight:bold">**Go / Refine / Kill**</span> verdict
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### Key Models
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>| Model | Responsibility |
</span></span><span style="display:flex;"><span>|-------|---------------|
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`User`</span> | Authentication, subscription plan |
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`Idea`</span> | A video idea submitted for validation |
</span></span><span style="display:flex;"><span>| <span style="color:#a6e3a1">`Validation`</span> | One run of the research agent against an idea |
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### Project Structure
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>app/
</span></span><span style="display:flex;"><span>├── components/       # ViewComponent components
</span></span><span style="display:flex;"><span>├── controllers/
</span></span><span style="display:flex;"><span>├── jobs/             # ActiveJob jobs (async validation)
</span></span><span style="display:flex;"><span>├── models/
</span></span><span style="display:flex;"><span>├── services/         # Research agent, tool orchestration
</span></span><span style="display:flex;"><span>└── views/            # Hotwire (Turbo frames/streams)
</span></span></code></pre></div><p>An agent reading this knows what an <code>Idea</code> is, that validation is async through a job, and that orchestration logic lives in <code>app/services/</code>. Those are the conventions that prevent drift.</p>
<p>ADRs (Architecture Decision Records) add a layer that documentation alone can&rsquo;t. An agent that understands <em>why</em> a particular pattern was chosen can make better decisions when extending it. If your ADR says &ldquo;we chose event sourcing for the billing domain because of auditability requirements,&rdquo; the agent won&rsquo;t try to refactor billing into simple CRUD.</p>
<h3 id="the-smallest-change-that-moves-the-needle-1">The smallest change that moves the needle</h3>
<p>Create an <code>AGENTS.md</code> in your project root with two things: commands (build, test, lint) and a documentation map pointing to deeper files. <a href="https://agents.md/"><code>AGENTS.md</code></a> is an emerging standard supported by Codex, Cursor, Gemini CLI, GitHub Copilot, Windsurf, Devin, and <a href="https://agents.md/">many others</a>. If you&rsquo;re using Claude Code, symlink <code>CLAUDE.md</code> to it so both resolve to the same file. Then create an <code>ARCHITECTURE.md</code> covering your stack, domain model, and key conventions. This can take an hour and the effect on agent output is immediate. If you want to automate the scaffolding, the <a href="https://github.com/dgalarza/claude-code-workflows">agent-ready plugin</a> generates a starting point based on your existing codebase.</p>
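<p>The symlink is a one-liner from the project root, so both filenames resolve to the same content:</p>
<pre tabindex="0"><code>ln -s AGENTS.md CLAUDE.md
</code></pre>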
<h2 id="architecture-clarity">Architecture Clarity</h2>
<p>Architecture clarity carries 15% of the assessment weight. It measures whether an agent can understand where code belongs and how components relate to each other.</p>
<h3 id="what-a-low-score-looks-like-2">What a low score looks like</h3>
<p>Agents replicate patterns they find in the codebase. If your codebase has clear boundaries (controllers handle HTTP, services handle business logic, models handle persistence), the agent follows those boundaries. If your codebase mixes concerns, the agent mixes concerns.</p>
<p>The most common failure I see: a controller that does everything. It validates input, calls the database, sends emails, enqueues jobs. An agent asked to add a new feature looks at the existing controller, sees that&rsquo;s where logic goes, and adds more logic to the controller. The agent is doing exactly what the codebase taught it to do.</p>
<p>The subtler version is dependency direction. In a well-layered app, dependencies point inward: controllers depend on services, services depend on models. When that direction is inconsistent (models importing from controllers, services reaching into HTTP request objects), agents produce code with the same tangled dependencies.</p>
<h3 id="what-a-high-score-looks-like-2">What a high score looks like</h3>
<ul>
<li><strong>Clear layering.</strong> Each layer has a single responsibility, and the codebase is consistent about which layer owns what.</li>
<li><strong>Domain namespacing.</strong> Related functionality is grouped by business domain, not just by technical layer. Instead of a flat <code>app/services/</code> with 40 files, you have <code>app/services/billing/</code>, <code>app/services/onboarding/</code>, <code>app/services/research/</code>. When an agent needs to add billing logic, the namespace tells it exactly where to look and what patterns to follow.</li>
<li><strong>Predictable file organization.</strong> A new developer (or agent) can guess where a piece of code lives based on what it does.</li>
<li><strong>Dependency direction is consistent.</strong> Inner layers don&rsquo;t reach outward. You don&rsquo;t see models importing controller concerns.</li>
</ul>
<p>Domain namespacing is especially powerful for agents because it constrains the search space. An agent working on a billing feature only needs to understand the billing namespace, not the entire codebase. It finds the existing patterns in that namespace and replicates them. Without namespacing, the agent has to scan the whole codebase to figure out where billing logic lives, and it might find three different patterns in three different places.</p>
<h3 id="the-smallest-change-that-moves-the-needle-2">The smallest change that moves the needle</h3>
<p>If you have fat controllers, extract one. Pick your most complex controller action, pull the business logic into a service object, and write a spec for it. The agent will start using that service object pattern for new features. One well-structured example teaches the agent more than any documentation, because it&rsquo;s a pattern it can directly replicate.</p>
<p>If your codebase has grown past a handful of services, start namespacing by domain. Group related services, jobs, and models under a shared namespace. This compounds quickly: once you have three or four service objects under <code>Billing::</code>, agents start producing new billing code in the same namespace by default. The codebase becomes self-reinforcing.</p>
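<p>Here&rsquo;s the shape that pattern takes, as a minimal sketch (the <code>Billing</code> domain, class name, and model methods are hypothetical):</p>
<pre tabindex="0"><code># app/services/billing/apply_credit.rb
module Billing
  class ApplyCredit
    def initialize(account:, amount_cents:)
      @account = account
      @amount_cents = amount_cents
    end

    # Single public entry point: controllers call
    # Billing::ApplyCredit.new(...).call and render the result.
    def call
      credit = @account.credits.create!(amount_cents: @amount_cents)
      @account.recalculate_balance!
      credit
    end
  end
end
</code></pre>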
<h2 id="feedback-loops">Feedback Loops</h2>
<p>Feedback loops carry 10% of the assessment weight, but their impact is multiplicative. Good feedback loops make everything else work better. Poor ones make everything else work worse.</p>
<h3 id="what-a-low-score-looks-like-3">What a low score looks like</h3>
<p>Agents learn from the signals they get back. When the only signal is &ldquo;tests passed,&rdquo; the agent has no way to know it introduced a style violation, broke a naming convention, or used a deprecated API. It moves on, confident the change is correct.</p>
<p>Two things make feedback loops weak: <strong>narrow signals</strong> and <strong>slow signals</strong>.</p>
<p>Narrow signals mean the agent only hears from one source. Tests tell the agent whether the code works. They don&rsquo;t tell it whether the code follows your conventions, whether it introduced a security vulnerability, or whether the UI actually renders correctly. Each missing signal is a category of problems the agent can&rsquo;t self-correct.</p>
<p>Slow signals are just as damaging. If the agent has to wait 20 minutes for a CI run to discover a linting error, it&rsquo;s already moved on. It&rsquo;s built three more features on top of code that doesn&rsquo;t pass lint. Now you&rsquo;re unwinding multiple changes instead of catching the first one. The closer the feedback is to the moment of the change, the cheaper it is to fix.</p>
<p>There&rsquo;s also a hierarchy to how you enforce conventions. Anything that can be checked deterministically by a linter should be a lint rule, not a line in your <code>CLAUDE.md</code>. A lint rule catches every violation, every time. A documentation rule depends on the agent reading it and choosing to follow it. If your convention is &ldquo;methods must be under 20 lines&rdquo; or &ldquo;always use <code>frozen_string_literal</code>,&rdquo; encode it in RuboCop, ESLint, or whatever linter your stack uses. Save documentation for the things that can&rsquo;t be mechanically enforced: architectural decisions, domain context, workflow conventions.</p>
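<p>Both of those example conventions map directly onto existing RuboCop cops. A minimal <code>.rubocop.yml</code>:</p>
<pre tabindex="0"><code># .rubocop.yml
Metrics/MethodLength:
  Max: 20

Style/FrozenStringLiteralComment:
  Enabled: true
  EnforcedStyle: always
</code></pre>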
<h3 id="what-a-high-score-looks-like-3">What a high score looks like</h3>
<ul>
<li><strong>Pre-commit hooks for immediate feedback.</strong> The agent discovers formatting issues, type errors, or lint violations before it even commits.</li>
<li><strong>CI that runs in under 10 minutes.</strong> Fast enough that the agent can push, get feedback, and iterate without burning excessive context.</li>
<li><strong>Rich error messages.</strong> Linting output that says &ldquo;method too long (25 lines, max 20)&rdquo; is actionable. A generic &ldquo;style violation&rdquo; is not.</li>
</ul>
<p>Here&rsquo;s what a CI script looks like when it goes beyond just running tests. This is the <code>bin/ci</code> from the same Rails project:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-ruby" data-lang="ruby"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># config/ci.rb - run with bin/ci</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#f9e2af">CI</span><span style="color:#89dceb;font-weight:bold">.</span>run <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Setup&#34;</span>, <span style="color:#a6e3a1">&#34;bin/setup --skip-server&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Style: Ruby&#34;</span>, <span style="color:#a6e3a1">&#34;bin/rubocop&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Gem audit&#34;</span>, <span style="color:#a6e3a1">&#34;bin/bundler-audit&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Importmap vulnerability audit&#34;</span>, <span style="color:#a6e3a1">&#34;bin/importmap audit&#34;</span>
</span></span><span style="display:flex;"><span>  step <span style="color:#a6e3a1">&#34;Security: Brakeman code analysis&#34;</span>, <span style="color:#a6e3a1">&#34;bin/brakeman --quiet --no-pager --exit-on-warn --exit-on-error&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">end</span>
</span></span></code></pre></div><p>Five steps, each giving the agent a different kind of feedback. RuboCop catches style violations. Bundler-audit catches vulnerable gems. Brakeman catches security issues in the code itself. An agent that runs <code>bin/ci</code> gets five signals instead of one.</p>
<h3 id="browser-access-as-a-feedback-loop">Browser access as a feedback loop</h3>
<p>For web applications, there&rsquo;s a feedback loop that most teams overlook: giving agents the ability to see what they built.</p>
<p>An agent that can only run tests is working blind on anything visual. It can verify that a controller returns 200, but it can&rsquo;t tell whether the page actually renders correctly, whether a modal opens, or whether a form submits without errors. Cursor&rsquo;s team <a href="https://cursor.com/blog/agent-computer-use">wrote about this</a>: once they gave agents browser access via cloud sandboxes, agents could &ldquo;iterate until they&rsquo;ve validated their output rather than handing off the first attempt.&rdquo; More than 30% of their merged PRs are now created by agents operating autonomously in cloud sandboxes.</p>
<p>You don&rsquo;t need a full cloud sandbox to get value from this. Claude Code has <a href="https://code.claude.com/docs/en/chrome">built-in Chrome support</a> via <code>claude --chrome</code>, and tools like Playwright MCP give agents browser control locally. The agent can navigate to a page, take a snapshot of the DOM, fill in a form, and verify the result. That&rsquo;s a feedback loop that catches an entire class of issues that unit tests and linters never will.</p>
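<p>Registering the Playwright MCP server with Claude Code is one command (syntax and package name as of this writing; verify against the current docs):</p>
<pre tabindex="0"><code>claude mcp add playwright -- npx @playwright/mcp@latest
</code></pre>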
<h3 id="the-smallest-change-that-moves-the-needle-3">The smallest change that moves the needle</h3>
<p>Add a linter to your CI pipeline. For a Ruby project, that&rsquo;s RuboCop. For JavaScript/TypeScript, ESLint. For Python, Ruff. One config file, one CI step. The agent immediately starts getting feedback on style and conventions that it wouldn&rsquo;t otherwise know about.</p>
<p>If you want faster feedback, add pre-commit hooks. The agent runs into the linter before it even pushes, which means it fixes issues in the same context window where it created them. That&rsquo;s cheaper, faster, and produces cleaner commits.</p>
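<p>A pre-commit hook can be a five-line shell script. A minimal sketch for a Ruby project (save as <code>.git/hooks/pre-commit</code> and make it executable; tools like Lefthook make this shareable across a team):</p>
<pre tabindex="0"><code>#!/bin/sh
# Lint only the staged Ruby files so feedback stays fast.
files=$(git diff --cached --name-only --diff-filter=ACM | grep &#39;\.rb$&#39;)
[ -z &#34;$files&#34; ] &amp;&amp; exit 0
bin/rubocop --force-exclusion $files
</code></pre>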
<p>For web projects, consider adding browser access through Playwright MCP or a similar tool. The agent starts verifying its own UI changes instead of relying on you to catch visual issues in review.</p>
<h2 id="where-to-start">Where to Start</h2>
<p>If you&rsquo;re looking at your codebase and wondering where to start, here&rsquo;s how I think about prioritization:</p>
<ol>
<li><strong>Fix your test foundation first.</strong> Without reliable tests, every other improvement is hard to verify. An agent can&rsquo;t confidently refactor your architecture if there&rsquo;s no test suite to catch regressions.</li>
<li><strong>Add an AGENTS.md.</strong> This is 30 minutes of work that immediately changes agent behavior. It&rsquo;s the highest-ROI improvement you can make.</li>
<li><strong>Add a linter to CI.</strong> This closes the feedback gap with minimal effort. The agent starts learning your conventions from automated feedback instead of guessing from code patterns.</li>
</ol>
<p>These three changes don&rsquo;t require a major initiative. They&rsquo;re individual tasks that compound. A codebase with strong tests, clear documentation, and fast feedback loops creates a reinforcing cycle: agents produce better code, which maintains the patterns, which makes future agent output even better.</p>
<p>If you want to see where your codebase stands across all eight dimensions, run the <a href="/codebase-readiness/">Codebase Readiness Assessment</a>. It takes 60 seconds and gives you a score, a per-dimension breakdown, and a prioritized roadmap.</p>
<p>If your team wants hands-on help closing these gaps, that&rsquo;s what a <a href="/services/#retainer">Production AI Retainer</a> is built for. Or if you just want to talk through your results, <a href="/pages/meet/">book a free intro call</a>.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="/codebase-readiness/">Codebase Readiness Assessment</a> - Run the free assessment on your repo</li>
<li><a href="https://openai.com/index/harness-engineering/">Harness Engineering: Leveraging Codex in an Agent-First World</a> - OpenAI&rsquo;s deep dive on building a million-line codebase entirely with agents</li>
<li><a href="https://cursor.com/blog/agent-computer-use">Agent Computer Use</a> - How Cursor gives agents browser access to verify their own work</li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a> - How codebase structure impacts agent output quality</li>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills</a> - Architecture decisions that shape how agents interact with your codebase</li>
</ul>
]]></content:encoded></item><item><title>How AI Agents Remember Things</title><link>https://www.damiangalarza.com/posts/2026-02-17-how-ai-agents-remember-things/</link><pubDate>Tue, 17 Feb 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-02-17-how-ai-agents-remember-things/</guid><description>AI agents are stateless by default. Here's how memory systems actually work, covering the storage patterns, lifecycle triggers, and architecture behind agents that remember you.</description><content:encoded><![CDATA[<p>Out of the box, AI agents have no memory. Every conversation starts with a blank slate.</p>
<p>Most people assume you need vector databases, complex retrieval pipelines, or specialized memory infrastructure to fix this. But it turns out the storage is the easy part. The hard part is knowing when to write and when to load. Get that right, and the rest is just files.</p>
<div class="not-prose my-8 callout-accent">
  <p class="text-[var(--color-text-secondary)]">
    <span class="font-medium text-[var(--color-accent)]">Prefer video?</span> Watch <a href="https://youtu.be/Seu7nksZ_4k?si=Xx8wnlL6j8nsLYq5" class="text-[var(--color-accent)] underline hover:opacity-80">How AI Agents Remember Things on YouTube →</a>
  </p>
</div>
<p>I&rsquo;ll use OpenClaw as a case study here. Its memory model is one of the clearest real-world implementations I&rsquo;ve seen. But the patterns apply to any agent you build.</p>
<h2 id="why-agents-have-no-memory-by-default">Why Agents Have No Memory By Default</h2>
<p>AI models are inherently stateless. There&rsquo;s no memory between calls. What looks like a conversation is just an increasingly long context window being passed on each turn. Every message, every response, every tool call gets appended to the transcript and sent with the next request.</p>
<p>This works fine for a one-off question. It breaks down the moment you want an agent that knows you.</p>
<p>Memory systems handle this by splitting the problem in two: the session, and longer-term memory.</p>
<h3 id="sessions">Sessions</h3>
<p>A session is the history of a single conversation with an LLM. While the conversation is active, that history gets passed along with each call, and the model can see everything said so far. But LLMs have finite context windows, and as you approach that limit, something has to give.</p>
<p>That something is compaction. Compaction takes the session&rsquo;s conversation history and condenses it down to the most important information so the conversation can continue. There are three common strategies for triggering it (a count-based sketch follows the list):</p>
<ol>
<li><strong>Count-based</strong>: compact once the conversation exceeds a certain token size or turn count</li>
<li><strong>Time-based</strong>: triggered when the user stops interacting for a period of time, handled in the background</li>
<li><strong>Event-based</strong>: an agent detects that a task or topic has concluded and triggers compaction. The most intelligent approach, but also the hardest to implement accurately</li>
</ol>
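<p>Count-based triggering is the easiest to reason about. A minimal Ruby sketch (the session API and <code>summarize</code> call are hypothetical stand-ins for your transcript store and a separate LLM call):</p>
<pre tabindex="0"><code># Compact once the transcript approaches the window limit.
COMPACT_THRESHOLD = 600_000 # tokens, ~60% of a 1M window
KEEP_RECENT = 10            # turns carried over verbatim

def maybe_compact!(session)
  return if session.token_count &lt; COMPACT_THRESHOLD

  # Condense everything but the newest turns into a single
  # summary message, produced by a separate LLM call.
  older   = session.messages[0...-KEEP_RECENT]
  summary = summarize(older)
  session.replace_messages!([summary] + session.messages.last(KEEP_RECENT))
end
</code></pre>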
<p>The shared problem with all three: you can&rsquo;t simply carry entire old conversations forward into a new session. Context windows don&rsquo;t allow it. That&rsquo;s where long-term memory comes in.</p>
<p>Think of it as a desk and a filing cabinet. The session is the messy desk, with notes scattered around and documents open. Memory is the filing cabinet where things are categorized and stored for later. When the session ends, whatever isn&rsquo;t filed is gone.</p>
<h2 id="the-memory-taxonomy">The Memory Taxonomy</h2>
<p>Google published a whitepaper in November 2025 titled <a href="https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory">&ldquo;Context Engineering: Sessions &amp; Memory&rdquo;</a> that provides a useful framework for thinking about this. It breaks agent memory into three types.</p>
<p><strong>Episodic memory</strong> covers events and interactions. &ldquo;What happened in our last conversation?&rdquo; If you spent a session debugging a webhook integration, episodic memory is what lets the agent recall that context in your next conversation.</p>
<p><strong>Semantic memory</strong> is facts and preferences. &ldquo;What do I know about this user?&rdquo; Tech stack, coding style, project conventions. These are stable facts that don&rsquo;t change much from session to session.</p>
<p><strong>Procedural memory</strong> is workflows and learned routines. &ldquo;How do I accomplish this task?&rdquo; The agent&rsquo;s understanding of your deployment process, your testing patterns, your PR review checklist.</p>
<p>All three work together to form what we&rsquo;d call an agent&rsquo;s memory. The challenge isn&rsquo;t categorizing them. It&rsquo;s extracting them from conversation and keeping them accurate over time.</p>
<h2 id="extraction-and-consolidation">Extraction and Consolidation</h2>
<p>For a memory system to be effective, it needs to extract the right things from a conversation. Not every detail is worth keeping. Targeted filtering is necessary, the same way human memory doesn&rsquo;t retain every word of a conversation, only the key facts and decisions.</p>
<p>Beyond that, the system needs to consolidate. Consider a user who tells an agent &ldquo;I prefer dark mode&rdquo; in one session, then later says &ldquo;I like dark mode,&rdquo; and in another session mentions &ldquo;I switched to dark mode.&rdquo; Without consolidation, all three entries sit in memory saying essentially the same thing. A good memory system collapses those into a single entry: &ldquo;User prefers dark mode.&rdquo;</p>
<p>It also needs to handle updates. Something true today might not be true tomorrow. If you switch from dark mode to light mode, the memory system needs to overwrite the old entry, not append a contradictory one. Without this, memory becomes noisy and unreliable over time.</p>
<p>Both extraction and consolidation are typically handled by a separate LLM instance that takes a conversation and processes it, deciding what to keep, what to merge, and what to update.</p>
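<p>That processing pass is mostly a prompt. A hedged sketch of the instructions you might give that second LLM:</p>
<pre tabindex="0"><code>You maintain an agent&#39;s long-term memory. Given the existing
memory entries and the transcript below, return an updated
entry list. Extract only durable facts, preferences, and
decisions. Merge entries that say the same thing. If the
transcript contradicts an existing entry, replace it instead
of appending. Discard small talk and one-off details.
</code></pre>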
<h2 id="memory-storage">Memory Storage</h2>
<p>Storage itself is relatively straightforward. For local agents, markdown files work well. They&rsquo;re readable, debuggable, and require no infrastructure. For agents that need semantic search across a large history, a vector database is the right tool. The choice depends on the use case.</p>
<p>What matters more than the storage format is the shape of what you store: semantic memory for stable facts, episodic memory for events and recent context, and procedural memory for workflows.</p>
<h2 id="openclaws-memory-model">OpenClaw&rsquo;s Memory Model</h2>
<p>Let me walk through how one system actually implements this.</p>
<p>OpenClaw&rsquo;s memory system has three core components, and all of them are just markdown files.</p>
<p><strong>MEMORY.md</strong> is the semantic memory store. Stable facts, user preferences, identity information. It has a recommended 200-line cap and is organized into structured sections. The key design decision: this file is loaded into every single prompt, not retrieved on demand. The agent starts every conversation already knowing who you are.</p>
<p><strong>Daily logs</strong> are OpenClaw&rsquo;s first implementation of episodic memory. They live at <code>~/.openclaw/workspace/memory/YYYY-MM-DD.md</code> and contain recent context organized by day. They&rsquo;re append-only; new entries get added, nothing is removed. Today&rsquo;s and yesterday&rsquo;s logs are loaded at the start of each session.</p>
<p><strong>Session snapshots</strong> are the second implementation of episodic memory. When you start a new session with <code>/new</code> or <code>/reset</code>, a hook captures the last 15 meaningful messages from your conversation, filtering out tool calls, system messages, and slash commands. It&rsquo;s not a summary; it&rsquo;s the raw conversation text, saved as a markdown file with a descriptive name like <code>~/.openclaw/workspace/memory/2026-02-08-api-design.md</code>.</p>
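<p>To make the semantic store concrete, here&rsquo;s what a <code>MEMORY.md</code> might contain (a hypothetical excerpt; the section names are illustrative, not mandated):</p>
<pre tabindex="0"><code># MEMORY.md

## Identity
- Damian, software engineer. Primary stacks: Rails, TypeScript.

## Preferences
- Business logic lives in service objects, not controllers
- Tests in RSpec; run with bin/rails spec

## Current Projects
- CreatorSignal: YouTube idea validation (Rails + Hotwire)
</code></pre>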
<p>So at its core, OpenClaw&rsquo;s memory is markdown files. But the files are only half the story. Without something that reads and writes them at the right time, they&rsquo;re just sitting there doing nothing.</p>
<p>The files are the filing cabinet. What comes next are the four mechanisms that move things from the desk to the cabinet at the right moments.</p>
<h2 id="how-it-all-comes-together">How It All Comes Together</h2>
<p><strong>Mechanism 1: Bootstrap loading at session start.</strong></p>
<p>For every new conversation, MEMORY.md is automatically injected into the prompt. The agent always has it. On top of that, the agent&rsquo;s instructions tell it to read today&rsquo;s and yesterday&rsquo;s daily logs for recent context. MEMORY.md is injected by the system; the daily logs are loaded by the agent itself, following its own instructions.</p>
<p>This is the simplest pattern and the most important one. The agent doesn&rsquo;t have to search for context. It&rsquo;s just there.</p>
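<p>If you want to replicate the bootstrap step outside OpenClaw, it&rsquo;s a few lines of shell. A minimal sketch, with hypothetical file locations:</p>
<pre tabindex="0"><code># Prepend semantic memory and the two most recent daily logs to the
# session&#39;s starting context. Paths are illustrative.
MEM_DIR=&#34;$HOME/.agent/memory&#34;
TODAY=&#34;$MEM_DIR/$(date +%F).md&#34;
YESTERDAY=&#34;$MEM_DIR/$(date -v-1d +%F 2&gt;/dev/null || date -d yesterday +%F).md&#34;

{
  cat &#34;$MEM_DIR/MEMORY.md&#34;
  [ -f &#34;$TODAY&#34; ] &amp;&amp; cat &#34;$TODAY&#34;
  [ -f &#34;$YESTERDAY&#34; ] &amp;&amp; cat &#34;$YESTERDAY&#34;
} &gt; bootstrap-context.md
</code></pre>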
<p><strong>Mechanism 2: Pre-compaction flush.</strong></p>
<p>OpenClaw takes a threshold-based approach to compaction: when a session nears the context window limit, it injects a silent agentic turn, invisible to the user, with the following instructions:</p>
<blockquote>
<p>&ldquo;Pre-compaction memory flush. Store durable memories now (use memory/YYYY-MM-DD.md; create memory/ if needed). If nothing to store, reply with NO_REPLY.&rdquo;</p></blockquote>
<p>When the agent sees this, it writes anything worth keeping to the daily log, then replies with <code>NO_REPLY</code> so it never surfaces in the conversation.</p>
<p>This turns a destructive operation into a checkpoint. Losing context becomes a save point rather than a loss. It&rsquo;s the write-ahead log pattern: save before you lose, load when you start. The same pattern databases have used for decades, applied to agent memory.</p>
<p><strong>Mechanism 3: Session snapshot on <code>/new</code>.</strong></p>
<p>When you explicitly start a new session, a hook grabs the last chunk of your conversation, filters to meaningful messages only, and saves it with a descriptive filename. It only fires on explicit <code>/new</code> or <code>/reset</code>; closing the browser doesn&rsquo;t trigger it. It&rsquo;s an intentional save point, not an automatic backup.</p>
<p><strong>Mechanism 4: User says &ldquo;remember this.&rdquo;</strong></p>
<p>The simplest mechanism. If you ask the agent to remember something, it determines whether it belongs in MEMORY.md as semantic memory or the daily log as episodic memory, and writes accordingly. No special hook needed, just file-writing capabilities and instructions for how to categorize.</p>
<h2 id="why-this-matters-beyond-openclaw">Why This Matters Beyond OpenClaw</h2>
<p>Claude Code recently shipped a native memory feature. It also uses markdown files. The pattern is becoming standard.</p>
<p>The agents that feel most useful, the ones that stick as part of your workflow, are the ones that remember you. An agent that asks for your tech stack every session doesn&rsquo;t feel like a colleague. An agent that already knows your conventions and what you worked on yesterday does.</p>
<p>The building blocks are the same regardless of what you&rsquo;re building on: file-first storage, lifecycle triggers tied to meaningful session events, and extraction and consolidation to keep memory clean over time.</p>
<h2 id="wrapping-up">Wrapping Up</h2>
<p>OpenClaw&rsquo;s entire memory system comes down to markdown files and knowing when to write to them. Semantic memory in MEMORY.md. Episodic memory in daily logs and session snapshots. And four mechanisms that fire at the right moments in a conversation&rsquo;s lifecycle.</p>
<p>You don&rsquo;t need a complex setup to give an agent memory. You need a clear answer to three questions: what&rsquo;s worth remembering, where does it go, and when does it get written.</p>
<hr>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://youtu.be/Seu7nksZ_4k?si=Xx8wnlL6j8nsLYq5">How AI Agents Remember Things</a>: The video companion to this post</li>
<li><a href="https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory">Context Engineering: Sessions and Memory</a>: Google&rsquo;s whitepaper on agent memory taxonomy</li>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a>: How context windows work and how to manage them</li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a>: Practical patterns for AI-assisted development</li>
</ul>
<hr>
<p>Want to go deeper on agent memory and context architecture? I work with engineers and teams on designing agent systems that actually hold up. <a href="/ai-agents/">Learn more</a>.</p>
]]></content:encoded></item><item><title>Building a Linear-Driven Agent Loop with Claude Code</title><link>https://www.damiangalarza.com/posts/2026-02-13-linear-agent-loop/</link><pubDate>Fri, 13 Feb 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-02-13-linear-agent-loop/</guid><description>How I built a bash-based agent loop that pulls work from Linear, implements features, runs code review, and opens pull requests autonomously.</description><content:encoded><![CDATA[<p>In December, the developer community on X was buzzing about Ralph Wiggum. If you missed it, Anthropic&rsquo;s Claude Code plugins repository had a plugin called <a href="https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum">Ralph Wiggum</a>, whose README describes it as:</p>
<blockquote>
<p>Ralph is a development methodology based on continuous AI agent loops. As Geoffrey Huntley describes it: &ldquo;Ralph is a Bash loop&rdquo; - a simple while true that repeatedly feeds an AI agent a prompt file, allowing it to iteratively improve its work until completion.</p></blockquote>
<p>This was used in a variety of ways. Two common ones were:</p>
<ol>
<li>Unleash an agent to work on a single task on its own until it was done.</li>
<li>Unleash an agent to iterate through a backlog of work until it had completed all of it.</li>
</ol>
<p>Today we&rsquo;re going to explore the second one, using an agent loop to iterate through a project backlog.</p>
<h2 id="where-ralph-wiggum-falls-flat">Where Ralph Wiggum Falls Flat</h2>
<p>The Ralph Wiggum plugin provides a command you call inside Claude Code. The session continues until a set of requirements have been met, at which point the loop exits. For example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>/ralph-loop <span style="color:#a6e3a1">&#34;Build a REST API for todos. Requirements: CRUD operations, input validation, bin/rails test and bin/rails lint must pass. Output &lt;promise&gt;COMPLETE&lt;/promise&gt; when done.&#34;</span>
</span></span></code></pre></div><p>There is a drawback to this approach though. Running the loop inside a Claude Code session means every iteration eats away at the same context window. If you&rsquo;ve read my blog post on <a href="/posts/2025-12-08-understanding-claude-code-context-window">Understanding Claude Code&rsquo;s Context Window</a> then you know this degrades results as the session goes on. It gets considerably worse when you loop through multiple pieces of work: the window accumulates residue from unrelated tasks, and context rot sets in.</p>
<p>There is a solution though.</p>
<h2 id="bash-loops">Bash Loops</h2>
<p>Instead of running a Ralph Wiggum loop inside of the Claude Code instance, we can loop in bash, launching a fresh Claude Code run on each iteration. Every iteration then starts with a clean context window, avoiding context rot. The <code>-p</code> flag runs Claude Code non-interactively, and <code>--dangerously-skip-permissions</code> lets it execute tools without pausing for approval.</p>
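<p>An example loop needs a bit of setup first. It references a few variables defined near the top of the script; something like this, with illustrative values:</p>
<pre tabindex="0"><code># Setup assumed by the loop below (values are illustrative)
AGENT_NAME=&#34;creatorsignal-agent&#34;
MODEL=&#34;opus&#34;
PROMPT_FILE=&#34;agent/PROMPT.md&#34;
LOG_DIR=&#34;agent/logs&#34;
SESSION=0

mkdir -p &#34;$LOG_DIR&#34;
</code></pre><p>With that in place, the loop itself looks something like:</p>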
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#cba6f7">while</span> true; <span style="color:#cba6f7">do</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">SESSION</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$((</span>SESSION <span style="color:#89dceb;font-weight:bold">+</span> <span style="color:#fab387">1</span><span style="color:#cba6f7">))</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">TIMESTAMP</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>date +%Y%m%d_%H%M%S<span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">COMMIT</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#cba6f7">$(</span>git rev-parse --short<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">6</span> HEAD 2&gt;/dev/null <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;no-git&#34;</span><span style="color:#cba6f7">)</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f5e0dc">LOGFILE</span><span style="color:#89dceb;font-weight:bold">=</span><span style="color:#a6e3a1">&#34;</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">LOG_DIR</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">/</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">AGENT_NAME</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">TIMESTAMP</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">_</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">COMMIT</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">.log&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;--- Session #</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">SESSION</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> starting at </span><span style="color:#cba6f7">$(</span>date<span style="color:#cba6f7">)</span><span style="color:#a6e3a1"> ---&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;    Log: </span><span style="color:#f5e0dc">$LOGFILE</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  claude --dangerously-skip-permissions <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    -p <span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">$(</span>cat <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$PROMPT_FILE</span><span style="color:#a6e3a1">&#34;</span><span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    --model <span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$MODEL</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89b4fa">\
</span></span></span><span style="display:flex;"><span><span style="color:#89b4fa"></span>    &amp;&gt;<span style="color:#a6e3a1">&#34;</span><span style="color:#f5e0dc">$LOGFILE</span><span style="color:#a6e3a1">&#34;</span> <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb">true</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;    Session #</span><span style="color:#a6e3a1">${</span><span style="color:#f5e0dc">SESSION</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> ended at </span><span style="color:#cba6f7">$(</span>date<span style="color:#cba6f7">)</span><span style="color:#a6e3a1">&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#89dceb">echo</span> <span style="color:#a6e3a1">&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#6c7086;font-style:italic"># Brief pause between sessions to avoid hammering if something is broken</span>
</span></span><span style="display:flex;"><span>  sleep <span style="color:#fab387">5</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">done</span>
</span></span></code></pre></div><p>The <code>$PROMPT_FILE</code> is where the real work gets defined. It&rsquo;s a markdown file that tells the agent exactly what to do during each session. Mine walks the agent through a full lifecycle: orient itself on the project, pick up the next issue from Linear, build the feature, run a code review with subagents, and open a pull request. It also includes guardrails like one issue per session, never break main, and what to do if blocked or stuck for more than 15 minutes.</p>
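<p>My prompt file is specific to my project, but a skeleton of that lifecycle might look like this (illustrative, not my exact file):</p>
<pre tabindex="0"><code># Agent Session Instructions

1. Read PROGRESS.md to orient yourself on the project.
2. Pick up the next issue from Linear (Todo first, then Backlog).
3. Move it to &#34;In Progress&#34;, create a branch, and build the feature.
4. The test suite and linters must pass before the work is done.
5. Run a code review with subagents and address the feedback.
6. Open a pull request and update PROGRESS.md.

Guardrails: one issue per session. Never break main. If blocked or
stuck for more than 15 minutes, note it on the issue and stop.
</code></pre>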
<p>Let&rsquo;s walk through how each of these pieces works in practice.</p>
<h2 id="how-it-all-fits-together">How It All Fits Together</h2>
<p>I decided to give this a try on my recent project CreatorSignal that I&rsquo;ve been building during my <a href="https://www.youtube.com/@damian.galarza/streams">live streams</a>. While I&rsquo;ve seen many people maintaining their backlogs in markdown files or custom Kanban board experiences within Claude Code, I prefer using <a href="https://linear.app/">Linear</a>. I didn&rsquo;t want to recreate a task management system just for the agent loop. With the <a href="https://linear.app/docs/mcp">Linear MCP</a> in hand, here&rsquo;s how I set it up.</p>
<h3 id="progressmd">PROGRESS.md</h3>
<p>One of the core pieces is the <code>PROGRESS.md</code> file. While the individual tasks are tracked and maintained in Linear, this file is meant to serve as a sort of &ldquo;memory&rdquo; for the agents to understand what has been accomplished from a more holistic level. At the start of each loop, the <code>PROGRESS.md</code> file is read in. At the end of a loop, the agent writes to it what it has accomplished.</p>
<p>Example:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Progress
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## 2026-02-13
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">### PRX-27: Billing portal (Stripe Customer Portal integration) — DONE
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span><span style="color:#cba6f7">-</span> Created <span style="color:#a6e3a1">`BillingPortalController`</span> with <span style="color:#a6e3a1">`show`</span> and <span style="color:#a6e3a1">`create`</span> actions
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Billing page displays current plan, price, next billing date
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Manage Subscription&#34; button creates Stripe BillingPortal::Session and redirects
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Free users see upgrade CTA; former subscribers can still access portal for invoices
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Cancellation pending state shown with reactivation option
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> 11 request specs + 6 system specs, all passing (266 total)
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> PR: https://github.com/dgalarza/CreatorSignal/pull/31
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> Branch based on PRX-25 (chain: PRX-17 → PRX-23 → PRX-24 → PRX-25 → PRX-27)
</span></span></code></pre></div><h3 id="implementing-an-issue">Implementing an Issue</h3>
<p>Using the Linear MCP, the agent finds the next highest priority issue to work on. It starts by looking at the &ldquo;Todo&rdquo; column and picks the next one up. If there&rsquo;s nothing in Todo, it checks the backlog instead. From there it reads the issue&rsquo;s details to understand the work that needs to be done. For the loop to work well, issues need to be spec&rsquo;d out thoroughly. This gives the agent the highest chance of performing quality work without human supervision.</p>
<p>With an issue selected, the agent moves it to &ldquo;In Progress&rdquo;, creates a branch, and starts building. A task is not considered &ldquo;done&rdquo; unless the test suite and linters both pass. This is another critical piece for a successful agent loop. The agent must have solid ways of verifying its own work. Without automated checks, it&rsquo;s difficult for the agent to understand success, and quality drops.</p>
<p>When the agent believes its work is ready, it comments on the Linear issue with a summary of what it built and moves the issue to &ldquo;In Review&rdquo;.</p>
<h3 id="code-review">Code Review</h3>
<p>Similar to my workflow described in <a href="/posts/2025-11-25-how-i-use-claude-code">How I Use Claude Code</a>, the next step is to spawn subagents to perform code review. The agent uses the <code>Task</code> tool to spin up a reviewer that evaluates the diff against the issue requirements, checking for correctness, test quality, Rails conventions, security, and performance.</p>
<p>The review is posted as a comment on the Linear issue. This provides visibility into the full lifecycle of the work. I can see the main agent&rsquo;s implementation summary alongside the code review feedback. The agent then resolves any feedback it received and posts a final comment on the Linear issue summarizing its decisions.</p>
<h3 id="pull-request">Pull Request</h3>
<p>After the code review process is complete and feedback is addressed, the agent commits the work and opens a pull request. The Linear issue is moved to &ldquo;Done&rdquo;, and the agent writes its progress update to the PROGRESS.md file.</p>
<h3 id="clean-up">Clean Up</h3>
<p>With everything complete, the agent&rsquo;s last instructions are to check out the main branch and rebase against origin/main so that the next loop starts in a fresh state. The loop then exits cleanly. There&rsquo;s a built-in pause after each iteration before the next one starts.</p>
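<p>In git terms, that clean-up step amounts to something like:</p>
<pre tabindex="0"><code># Reset the working tree so the next iteration starts clean
git checkout main
git fetch origin
git rebase origin/main
</code></pre>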
<h3 id="visibility">Visibility</h3>
<p>This loop proved to work well. I connected Slack to my Linear project so I could see notifications coming in as the agent worked through issues. Each time an issue had its status updated, each time an agent completed its work, and each time an agent received and addressed review feedback, I could see the progress in real time.</p>
<h2 id="improving-on-the-workflow">Improving on the Workflow</h2>
<p>While this initial pass at a loop was working well, I had some things I wanted to improve. First, as pull requests were getting opened and merged, some would end up becoming stale with merge conflicts given the speed at which new features were landing. Second, I wanted to be able to leave feedback on a pull request as if I was working with a team member and have it get addressed by the agent as part of the loop.</p>
<p>I solved this by adding a new step to the loop as follows.</p>
<p>Before picking up a new task, the agent runs <code>bin/pr_check</code>. This script looks through my open pull requests for any with the &ldquo;needs-revision&rdquo; label. If none carry the label, it checks for any that have gone stale with merge conflicts.</p>
<p>If such a PR is found, the loop addresses that one PR and leaves any others for subsequent iterations. So whenever I had a PR with feedback I wanted addressed, I would leave comments on it and add the &ldquo;needs-revision&rdquo; label. On the next loop iteration, the agent picks it up and addresses the feedback.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># bin/pr_check</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Finds the first open PR that needs attention.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Returns JSON with PR details if one needs work, or empty output if all clean.</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># A PR &#34;needs attention&#34; if:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   1. It has merge conflicts (mergeableStatus == CONFLICTING)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   2. It has the &#34;needs-revision&#34; label</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Usage:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   bin/pr_check           # returns JSON or empty</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   bin/pr_check --quiet   # exit code only (0 = needs attention, 1 = all clean)</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Output format:</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   {</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;number&#34;: 42,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;branch&#34;: &#34;damian/prx-7-exa-research-tools&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;title&#34;: &#34;PRX-7: Exa research tools&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;url&#34;: &#34;https://github.com/...&#34;,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;reason&#34;: &#34;has_feedback&#34;,    # or &#34;conflicting&#34; or &#34;conflicting,has_feedback&#34;</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;conflicting&#34;: true,</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#     &#34;has_feedback&#34;: true</span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">#   }</span>
</span></span></code></pre></div><p>The loop itself is about 100 lines of bash. I&rsquo;ll be adding it to my Claude Code workflows this week and sharing it with my newsletter subscribers.</p>
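<p>The script&rsquo;s <code>--quiet</code> mode also makes it easy to check from the shell whether anything needs attention:</p>
<pre tabindex="0"><code># Exit code 0 means a PR needs work; 1 means all clean
if bin/pr_check --quiet; then
  echo &#34;A PR needs attention&#34;
fi
</code></pre>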
<h2 id="what-makes-this-work">What Makes This Work</h2>
<p>After running this loop across several sessions, a few things stand out as critical to getting quality results:</p>
<ol>
<li><strong>Fresh context per iteration.</strong> Running the loop in bash instead of inside a Claude Code session means each task gets a clean context window. This is the single biggest difference from the Ralph Wiggum approach.</li>
<li><strong>Well-spec&rsquo;d issues.</strong> The agent is only as good as the instructions it receives. Vague issues produce vague results. Detailed acceptance criteria and clear scope make all the difference.</li>
<li><strong>Automated verification.</strong> Requiring passing tests and linters before a task is considered &ldquo;done&rdquo; gives the agent a concrete definition of success. Without this, quality drops fast.</li>
<li><strong>Linear as the source of truth.</strong> Using an existing project management tool instead of reinventing one means I can see the full lifecycle of every issue, from backlog to done, with comments and status updates along the way.</li>
</ol>
<p>The combination of these pieces turns what could be a chaotic autonomous loop into something that produces reviewable, mergeable work. It&rsquo;s not perfect, and I still review every pull request before merging, but the amount of ground it covers between review cycles is significant.</p>
<h2 id="additional-reading">Additional Reading</h2>
<ul>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills: Understanding the Difference</a> - The agent loop relies on the Linear MCP as its backbone. This post covers how MCPs and skills serve different roles in your workflow.</li>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a> - A deep dive into how the context window works and why fresh context per iteration is so important.</li>
<li><a href="https://youtu.be/Seu7nksZ_4k">How AI Agents Remember Things</a> - The PROGRESS.md pattern is essentially agent memory between loop iterations. This video covers how agents persist context across sessions.</li>
<li><a href="https://youtu.be/tO_Larrawfg">MCPs vs Skills: The Mental Model You&rsquo;re Missing</a> - The video companion to the blog post above, covering the architectural distinction between MCPs and skills.</li>
<li><a href="https://www.youtube.com/playlist?list=PLeevcUmnIRCy8XirmTSbHz71hs31idVC3">Building CreatorSignal</a> - The livestream series where I&rsquo;m building CreatorSignal, the project this agent loop runs against.</li>
</ul>
<p>If you haven&rsquo;t already, sign up for my newsletter for weekly emails on AI Engineering and agentic development workflows.</p>
<hr>
<p>If you&rsquo;re building agent loops or autonomous workflows and want help getting the architecture right, I work with teams on exactly this. <a href="/ai-agents/">Let&rsquo;s talk</a>.</p>
]]></content:encoded></item><item><title>MCPs vs Agent Skills: Understanding the Difference</title><link>https://www.damiangalarza.com/posts/2026-02-05-mcps-vs-agent-skills/</link><pubDate>Thu, 05 Feb 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-02-05-mcps-vs-agent-skills/</guid><description>MCPs give Claude capabilities. Skills teach Claude workflows. Here's the mental model I use to decide which one I need.</description><content:encoded><![CDATA[<p>&ldquo;Should I build a skill or an MCP for this?&rdquo;</p>
<p>I&rsquo;ve been asked this question a lot since Anthropic announced Agent Skills back in October 2025. And honestly, the confusion makes sense. Both extend Claude Code&rsquo;s capabilities. Both can connect to external services. Skills can even run scripts, which sounds a lot like what MCPs do.</p>
<p>But once you understand the mental model, the distinction becomes obvious. Let&rsquo;s break it down.</p>
<h2 id="what-mcps-actually-do">What MCPs Actually Do</h2>
<p>Model Context Protocol (MCP) is an open standard for connecting AI applications to external systems. It&rsquo;s the plumbing that connects Claude to the outside world by exposing tools that can read data, execute actions, and interact with external services.</p>
<p>For example, you can add the Linear MCP and give Claude the ability to read and create issues, or add the Sentry MCP so it can query errors. These are capabilities Claude didn&rsquo;t have before. MCPs extend what Claude can do.</p>
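<p>Adding a server is a one-time setup step via the CLI. For example (the Linear transport and URL here are illustrative, so check Linear&rsquo;s current docs):</p>
<pre tabindex="0"><code># Register the Linear MCP server with Claude Code
claude mcp add --transport sse linear-server https://mcp.linear.app/sse

# Confirm what&#39;s configured
claude mcp list
</code></pre>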
<p>There&rsquo;s something you need to consider when adding MCPs though: every MCP you add to Claude Code takes up space in your context window just by being available. Not just when it&rsquo;s used, but constantly. If you&rsquo;ve read my post on <a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a>, you know this matters a lot.</p>
<h3 id="the-anatomy-of-an-mcp-tool">The Anatomy of an MCP Tool</h3>
<p>Every MCP tool exposes information to the LLM so it knows when and how to use it. Here&rsquo;s what Claude sees when the Linear MCP is configured:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>│ get_issue <span style="color:#89dceb;font-weight:bold">(</span>linear-server<span style="color:#89dceb;font-weight:bold">)</span> <span style="color:#89dceb;font-weight:bold">[</span>read-only<span style="color:#89dceb;font-weight:bold">]</span>                                        │
</span></span><span style="display:flex;"><span>│ Tool name: get_issue                                                         │
</span></span><span style="display:flex;"><span>│ Full name: mcp__linear-server__get_issue                                     │
</span></span><span style="display:flex;"><span>│                                                                              │
</span></span><span style="display:flex;"><span>│ Description:                                                                 │
</span></span><span style="display:flex;"><span>│ Retrieve detailed information about an issue by ID, including attachments   │
</span></span><span style="display:flex;"><span>│ and git branch name                                                          │
</span></span><span style="display:flex;"><span>│                                                                              │
</span></span><span style="display:flex;"><span>│ Parameters:                                                                  │
</span></span><span style="display:flex;"><span>│   • id <span style="color:#89dceb;font-weight:bold">(</span>required<span style="color:#89dceb;font-weight:bold">)</span>: string - The issue ID                                     │
</span></span><span style="display:flex;"><span>│   • includeRelations: boolean - Whether to include blocking, related,        │
</span></span><span style="display:flex;"><span>│     and duplicate relations in the response                                  │
</span></span></code></pre></div><p>The description tells the LLM when and why to use the tool. Some descriptions are verbose, which means they consume more tokens on every single message. The parameter schema is typically JSON that defines the tool&rsquo;s inputs. And the tool name is what the LLM calls to invoke it.</p>
<p>Here&rsquo;s why this matters: in <a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a>, I showed how a Claude Code session can have 24% or more of the context window consumed by MCP tool definitions before you&rsquo;ve even started a conversation. Add a few feature-rich MCP servers and you&rsquo;ve got precious little space left for actual work.</p>
<p>This used to create a hard practical limit: with too many MCPs, the model would get confused and become more likely to pick the wrong tool. Anthropic addressed this in January 2026 with <a href="https://x.com/trq212/status/2011523109871108570">MCP Tool Search</a>, which dynamically loads MCP tools on-demand when they would consume more than 10% of context. This helps, but the underlying tension remains: MCP tool definitions compete for context space, which is why skills use a different approach entirely.</p>
<h3 id="the-key-characteristics">The Key Characteristics</h3>
<p>MCPs are:</p>
<ul>
<li><strong>Single-purpose tools</strong> - Each tool does one specific thing</li>
<li><strong>Autonomous</strong> - Claude can call them directly without any instruction from you</li>
<li><strong>Always loaded</strong> - Tool descriptions are in context on every message (or dynamically loaded via MCP Tool Search)</li>
<li><strong>Bidirectional</strong> - Can read from and write to external systems</li>
</ul>
<p>When you ask Claude &ldquo;What&rsquo;s the status of issue TRA-123?&rdquo;, it can autonomously decide to call the Linear MCP to fetch that information. No skill needed, no special invocation. The capability is just there.</p>
<h2 id="what-agent-skills-actually-do">What Agent Skills Actually Do</h2>
<p>Since the original announcement, Anthropic has released Agent Skills as an open standard, and other tools like GitHub Copilot and Cursor now support them as well.</p>
<p>At first glance, skills look simple. They&rsquo;re essentially a folder with some markdown files and optionally some scripts:</p>
<pre tabindex="0"><code>my-skill/
├── SKILL.md           # Main instructions (required)
├── reference.md       # Detailed docs (loaded as needed)
├── examples.md        # Usage examples (loaded as needed)
└── scripts/
    └── helper.py      # Executable scripts (run, not loaded)
</code></pre><p>Skills typically live in <code>.claude/skills/</code> within your project or <code>~/.claude/skills/</code> for global availability.</p>
<p>Skills can execute code. But that&rsquo;s not what makes them special. What makes them special is orchestration. They compose multiple capabilities into a defined workflow.</p>
<p>A tool lets Claude query your database. A skill teaches Claude your company&rsquo;s specific data model, your naming conventions, your rollback procedures. MCPs are verbs. Skills are playbooks.</p>
<h3 id="the-four-flavors-of-skills">The Four Flavors of Skills</h3>
<p>In my experience, skills tend to fall into four categories:</p>
<p><strong>Specialized workflows</strong> are multi-step procedures for specific domains. Things like a TDD workflow, a PR review process, or a deployment checklist. These are the skills I use most often.</p>
<p><strong>Tool integrations</strong> are instructions for working with specific file formats or APIs. Maybe you need Claude to know how to process DOCX files, manipulate PDFs, or query BigQuery a specific way.</p>
<p><strong>Domain expertise</strong> captures company-specific knowledge. Your data model, your naming conventions, your rollback procedures. The stuff that lives in tribal knowledge.</p>
<p><strong>Knowledge retrieval</strong> bundles reference documentation that Claude can access on demand. API specs, style guides, architectural decision records. Rather than stuffing everything into CLAUDE.md, you package it into a skill that loads only when relevant.</p>
<h3 id="why-skills-exist-progressive-disclosure">Why Skills Exist: Progressive Disclosure</h3>
<p>The key design principle behind skills is progressive disclosure. Unlike MCPs where tool definitions are always present, skills only load their full content when invoked.</p>
<p>The most basic skill is a folder with a SKILL.md file. This file contains YAML frontmatter with metadata (name and description) followed by the actual instructions. For any given skill, only the metadata is persistently available. The description tells the LLM when to invoke the skill, so you need to capture the right semantics for the agent to pick it up appropriately.</p>
<p>Once the skill is invoked, the LLM loads the rest of the SKILL.md file into context and follows its instructions. You can also break skills into separate resource files for different scenarios or workflows. This lets you keep context lean by loading only what&rsquo;s needed for the current task.</p>
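<p>Here&rsquo;s what that looks like in file form. A minimal SKILL.md sketch, with a hypothetical name and description:</p>
<pre tabindex="0"><code>---
name: tdd-workflow
description: Use when implementing a feature or bug fix that should
  follow test-driven development. Write the failing test first.
---

# TDD Workflow

1. Write a failing test that captures the requirement.
2. Run the suite and confirm the new test fails.
3. Implement the minimal code to make it pass.
4. Refactor while keeping the suite green.
</code></pre>
<p>Only the frontmatter stays in context persistently; the numbered steps load when the description matches the task at hand.</p>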
<h3 id="what-this-looks-like-in-practice">What This Looks Like in Practice</h3>
<p>In <a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a>, I described my <code>linear-implement</code> skill that takes a Linear issue and implements a solution following TDD. Here&rsquo;s how the pieces fit together:</p>
<pre tabindex="0"><code>┌──────────────────────────────────────────────────────────────┐
│                        SKILL                                 │
│                 (orchestration layer)                        │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Bundled: scripts/ │ references/ │ assets/               │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│    ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐       │
│    │  MCP    │  │  Bash   │  │  File   │  │  Web    │       │
│    │ (Linear)│  │ (tests) │  │ (write) │  │ (fetch) │       │
│    └─────────┘  └─────────┘  └─────────┘  └─────────┘       │
└──────────────────────────────────────────────────────────────┘
</code></pre><p>The skill sits on top and orchestrates everything. It calls the Linear MCP to fetch issue details. It runs bash commands to execute tests. It writes code files following TDD. It creates PRs via the GitHub CLI.</p>
<p>Without a skill, Claude can do all these things individually. But you have to orchestrate each step manually. Every session, you re-explain the workflow. With a skill, one command triggers the entire workflow. Consistent process every time. Your expertise encoded into Claude&rsquo;s behavior.</p>
<h2 id="the-claudemd-vs-skills-question">The CLAUDE.md vs Skills Question</h2>
<p>A common point of confusion: when should something go in CLAUDE.md versus a skill?</p>
<p>Here&rsquo;s how I think about it:</p>
<p><strong>CLAUDE.md</strong> is for declarative knowledge. What and why. Background context that Claude should just know. &ldquo;This is Rails 7 with RSpec.&rdquo; &ldquo;We use JSON:API format.&rdquo; &ldquo;Run tests with <code>bin/rspec</code>.&rdquo;</p>
<p><strong>Skills</strong> are for procedural knowledge. How. Multi-step workflows with defined steps. &ldquo;When implementing a feature, follow this TDD workflow&hellip;&rdquo; &ldquo;To deploy, run these 5 steps&hellip;&rdquo;</p>
<p>The analogy that works for me: CLAUDE.md is like an employee handbook (background context). Skills are like training modules (specific procedures).</p>
<p>If you&rsquo;re copy-pasting the same multi-step instructions into chat repeatedly, that&rsquo;s a skill waiting to be created. If it&rsquo;s background context Claude should just know, it belongs in CLAUDE.md.</p>
<p>There&rsquo;s a practical difference too. CLAUDE.md is always loaded in context, so it should stay lean. Skills use progressive disclosure, so they can be extensive without penalty when not in use.</p>
<h2 id="putting-it-together">Putting It Together</h2>
<p>Now that we&rsquo;ve covered what each one does separately, let me show you what it looks like when they work together.</p>
<pre tabindex="0"><code>┌─────────────────────────────────────────────────────────────────┐
│  Prompt: &#34;Help me implement Linear TRA-123&#34;                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  SKILL activates (context match)                                │
│  → Loads bundled resources, defines workflow                    │
└─────────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    ▼                   ▼
              ┌──────────┐       ┌──────────┐
              │   MCP    │       │  Native  │
              │ (Linear) │       │  Tools   │
              │          │       │          │
              │ Fetches  │       │ Bash,    │
              │ issue    │       │ File ops │
              │ details  │       │ for TDD  │
              └──────────┘       └──────────┘
                    │                   │
                    └─────────┬─────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  Result: Feature implemented following TDD workflow             │
└─────────────────────────────────────────────────────────────────┘
</code></pre><p>The MCP gives access to Linear (the capability). The skill orchestrates the workflow (the recipe). Each has its role.</p>
<h2 id="the-two-questions">The Two Questions</h2>
<p>You might be thinking there&rsquo;s overlap here. Since skills can have scripts, can&rsquo;t they also connect to external services? Yes, they can. Skills can include scripts that hit APIs, run curl commands, whatever you need. The difference is that these scripts only run in the context of the skill itself. If you need something more general purpose that Claude can call from any context, you want an MCP instead.</p>
<p>When Claude needs to check Linear issues, it can do that anytime, in any context, without any special setup. That&rsquo;s an MCP&rsquo;s job. But when you say &ldquo;implement this feature,&rdquo; you want a specific sequence of steps followed in a specific order. That&rsquo;s a skill&rsquo;s job.</p>
<p>When I need to decide which to use, I ask myself two questions:</p>
<h3 id="question-1-should-claude-be-able-to-call-this-capability-anytime-across-any-context">Question 1: Should Claude be able to call this capability anytime, across any context?</h3>
<p>If yes, you need an MCP.</p>
<p>If only during a specific workflow, a skill with scripts is fine.</p>
<p>Checking Linear issues? That&rsquo;s something Claude might need to do in many different contexts. MCP makes sense. Deploying to staging? That&rsquo;s a specific workflow with defined steps. Skill makes sense.</p>
<h3 id="question-2-is-this-a-repeatable-workflow-with-defined-steps">Question 2: Is this a repeatable workflow with defined steps?</h3>
<p>If yes, build a skill (with or without MCPs).</p>
<p>If no, you might not need either. Just ask Claude directly.</p>
<p>If you find yourself explaining the same multi-step process to Claude repeatedly, that&rsquo;s your signal. That&rsquo;s when you build a skill.</p>
<h3 id="common-patterns">Common Patterns</h3>
<p>Here&rsquo;s how this plays out in practice:</p>
<p><strong>MCP alone</strong>: &ldquo;Check my Linear issues.&rdquo; Claude decides to call it autonomously.</p>
<p><strong>Skill using MCP</strong>: &ldquo;Implement TRA-123.&rdquo; The skill orchestrates the workflow, calling the Linear MCP as one step among many.</p>
<p><strong>Skill with scripts</strong>: &ldquo;Deploy to staging.&rdquo; The workflow runs deploy scripts that hit external services.</p>
<p><strong>Skill without external calls</strong>: &ldquo;Follow our TDD process.&rdquo; Pure internal workflow, no external systems needed.</p>
<h2 id="the-mental-model">The Mental Model</h2>
<p>That&rsquo;s the mental model. MCPs give Claude capabilities. Skills give Claude orchestration. Or to put it another way:</p>
<p><strong>MCPs</strong> = The tools in the toolbox<br>
<strong>Skills</strong> = The recipes that coordinate those tools</p>
<p>MCPs are the plumbing connecting Claude to the outside world. Skills are the playbook of procedural knowledge.</p>
<p>MCPs answer &ldquo;what can Claude access?&rdquo; Skills answer &ldquo;how should Claude approach this task?&rdquo;</p>
<h2 id="getting-started">Getting Started</h2>
<p>If you&rsquo;re just getting started, here&rsquo;s my recommendation:</p>
<p><strong>Start with MCPs.</strong> Find one that connects to a tool you already use. Linear, Sentry, your database, whatever. Install it and start calling it. Get a feel for how Claude uses capabilities autonomously.</p>
<p><strong>Watch for patterns.</strong> When you notice you&rsquo;re asking Claude the same multi-step sequence over and over, that&rsquo;s your signal. That&rsquo;s when you build a skill.</p>
<p><strong>Keep it simple.</strong> Your first skill doesn&rsquo;t need to be complex. Start with a workflow you repeat weekly, document the steps, and let Claude follow them consistently.</p>
<p>If you want to build your own skill from scratch, check out my video <a href="https://youtu.be/7fNOpyke2kw?si=ZotNFdJ8NyKvc4J-">Claude Code Tutorial: Build your first skill in 10 minutes</a> where I walk through creating a TDD workflow skill step by step.</p>
<p>For more examples, sign up for my newsletter. You&rsquo;ll get access to my claude-code-workflows repo on GitHub, which includes several skills I use daily, including the linear-implement workflow that ties everything together.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://youtu.be/tO_Larrawfg?si=o1qMFo5DiC8xz8oA">MCPs vs Skills</a></li>
<li><a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">Agent Skills - Anthropic Engineering Blog</a></li>
<li><a href="https://agentskills.io/specification">Agent Skills Specification</a></li>
<li><a href="/posts/2025-12-08-understanding-claude-code-context-window/">Understanding Claude Code&rsquo;s Context Window</a></li>
<li><a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a></li>
<li><a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a></li>
</ul>
<hr>
<p>Working through how to structure your Claude Code setup with the right mix of MCPs and skills? I help engineers and teams design workflows that stick. <a href="/claude-code/">Learn more</a>.</p>
]]></content:encoded></item><item><title>Understanding Claude Code's Context Window</title><link>https://www.damiangalarza.com/posts/2025-12-08-understanding-claude-code-context-window/</link><pubDate>Mon, 08 Dec 2025 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2025-12-08-understanding-claude-code-context-window/</guid><description>How Claude Code's context window works: what consumes tokens (MCP servers, tools, messages), why it matters, and how to manage context effectively.</description><content:encoded><![CDATA[<p>I&rsquo;ve been using Claude Code for some time now, and as I discussed in <a href="/posts/2025-11-25-how-i-use-claude-code/">How I Use Claude Code: My Complete Development Workflow</a>, using AI coding tools effectively is a skill in itself. One of the most important parts of getting value from your AI coding assistant is managing context.</p>
<p>In this post we&rsquo;ll look at how you can make the most of your available context window in Claude Code, as well as some common pitfalls to avoid.</p>
<h2 id="understanding-the-context-window">Understanding the Context Window</h2>
<p>Before we can optimize our developer workflow, we need to understand what the context window is and how it gets filled. The context window is how much content a large language model can hold onto at one time. Each model has a predefined limit on the size of its context window. For example, Claude Sonnet 4.5&rsquo;s context window is about 200,000 tokens.</p>
<h3 id="what-is-a-token">What is a Token?</h3>
<p>When you send text to an LLM, it doesn&rsquo;t process words one at a time. Instead, text is broken into <strong>tokens</strong>—the fundamental units that language models read and generate. A token typically represents 3-4 characters, or roughly 0.75 words in English.</p>
<p>For example, the phrase <code>&quot;Hello world&quot;</code> becomes 2-3 tokens, while a compound word like <code>authentication_middleware</code> might be split into 5-7 tokens despite being a single identifier. Code tends to be more token-dense than prose because of special characters, naming conventions, and syntax. This is why reading source files consumes context faster than you might expect.</p>
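<p>You don&rsquo;t have to guess at counts. Anthropic exposes a token-counting endpoint; here&rsquo;s a sketch with <code>curl</code> (the model ID is illustrative, and <code>ANTHROPIC_API_KEY</code> is assumed to be set):</p>
<pre tabindex="0"><code># Count the tokens a message would consume, without running the model
curl -s https://api.anthropic.com/v1/messages/count_tokens \
  -H &#34;x-api-key: $ANTHROPIC_API_KEY&#34; \
  -H &#34;anthropic-version: 2023-06-01&#34; \
  -H &#34;content-type: application/json&#34; \
  -d &#39;{
    &#34;model&#34;: &#34;claude-sonnet-4-5&#34;,
    &#34;messages&#34;: [{&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Hello world&#34;}]
  }&#39;
# =&gt; {&#34;input_tokens&#34;: ...}
</code></pre>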
<h3 id="why-token-efficiency-matters">Why Token Efficiency Matters</h3>
<p>Context windows have limited space, and filling them with code happens fast. But running out of room isn&rsquo;t the only concern. LLMs suffer from a &ldquo;lost in the middle&rdquo; problem. Content at the start and end of the context window gets prioritized, while information in the middle tends to get overlooked. This mirrors how human memory works (we remember beginnings and endings better than middles).</p>
<p><img src="/images/posts/lost-in-the-middle.png" alt="Diagram showing how LLMs prioritize content at the beginning and end of context windows while missing information in the middle"></p>
<p>Additionally, our code isn&rsquo;t the only thing consuming context window space. Our context window is going to be filled by:</p>
<p><strong>MCP Servers</strong></p>
<p>Every MCP server you add is going to take some amount of space in your context window just by being available and present. Every MCP tool definition comes with:</p>
<ol>
<li><strong>Tool name</strong> (e.g., mcp__ynab__get_transactions)</li>
<li><strong>Description</strong> - an explanation as to what the tool does so that the LLM can understand when it might be needed.</li>
<li><strong>Parameter Schema</strong> - JSON schema definition of all the parameters, types, descriptions and constraints.</li>
<li><strong>Usage notes</strong> - additional instructions and potentially examples to guide the LLM during its tool choice.</li>
</ol>
<p>Let&rsquo;s take a look at an example from the YNAB MCP I built and discussed in <a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a>.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;name&#34;</span>: <span style="color:#a6e3a1">&#34;mcp__ynab__get_transactions&#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Get transactions from YNAB budget.\n\n    Retrieves transactions with optional filtering by date
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">   range, account, or category.\n    Returns transaction details including date, amount, payee, category, and
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  memo.\n\n    Use this tool when you need to:\n    - View recent transactions\n    - Find transactions in a
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  specific date range\n    - Filter transactions by account or category\n    - Check transaction details for
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  reconciliation\n\n    Args:\n        budget_id: Budget ID or &#39;last-used&#39; for default budget\n        since_date:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  Optional start date (YYYY-MM-DD format)\n        until_date: Optional end date (YYYY-MM-DD format)\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  account_id: Optional account ID to filter by specific account\n        category_id: Optional category ID to filter
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">   by category\n        type: Optional transaction type (&#39;uncategorized&#39;, &#39;unapproved&#39;)\n\n    Returns:\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  JSON array of transactions with:\n        - id: Transaction ID\n        - date: Transaction date\n        -
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  amount: Amount in milliunits (divide by 1000 for dollars)\n        - memo: Transaction memo\n        - cleared:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  Cleared status\n        - approved: Approval status\n        - payee_id: Payee ID\n        - payee_name: Payee
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  name\n        - category_id: Category ID\n        - category_name: Category name\n        - account_id: Account
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">  ID\n        - account_name: Account name\n\n    Example usage:\n        Get all transactions from November 2024:\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">          since_date=&#39;2024-11-01&#39;, until_date=&#39;2024-11-30&#39;\n\n        Get recent uncategorized transactions:\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    type=&#39;uncategorized&#39;\n\n    Note: Amounts are returned in milliunits. Divide by 1000 to get dollar amounts.\n
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">   &#34;</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;parameters&#34;</span>: {
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;object&#34;</span>,
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;properties&#34;</span>: {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;budget_id&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Budget ID or &#39;last-used&#39; for default budget&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;since_date&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional start date (YYYY-MM-DD format)&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;format&#34;</span>: <span style="color:#a6e3a1">&#34;date&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;until_date&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional end date (YYYY-MM-DD format)&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;format&#34;</span>: <span style="color:#a6e3a1">&#34;date&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;account_id&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional account ID to filter by&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;category_id&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional category ID to filter by&#34;</span>
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;type&#34;</span>: {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;string&#34;</span>,
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;enum&#34;</span>: [<span style="color:#a6e3a1">&#34;uncategorized&#34;</span>, <span style="color:#a6e3a1">&#34;unapproved&#34;</span>],
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">&#34;description&#34;</span>: <span style="color:#a6e3a1">&#34;Optional transaction type filter&#34;</span>
</span></span><span style="display:flex;"><span>        }
</span></span><span style="display:flex;"><span>      },
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;required&#34;</span>: [<span style="color:#a6e3a1">&#34;budget_id&#34;</span>],
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">&#34;title&#34;</span>: <span style="color:#a6e3a1">&#34;GetTransactionsArguments&#34;</span>
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>  }
</span></span></code></pre></div><p><strong>Token Breakdown</strong></p>
<table>
  <thead>
      <tr>
          <th>Component</th>
          <th>Tokens</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Tool name</td>
          <td>8</td>
      </tr>
      <tr>
          <td>Description (entire string)</td>
          <td>430</td>
      </tr>
      <tr>
          <td>Parameters schema</td>
          <td>225</td>
      </tr>
      <tr>
          <td>TOTAL</td>
          <td>~663 tokens</td>
      </tr>
  </tbody>
</table>
<p>This one tool definition takes up about 663 tokens. Not terrible on its own, but my YNAB MCP has about 15 tools, so at roughly this size the server costs around 10k tokens, about 5% of a 200k window, before you&rsquo;ve typed a single prompt. Every MCP server you add consumes more of your context window through tool definitions alone, so be careful not to overload your coding assistant with them.</p>
<p>The community has been exploring new ways to make MCP servers more context efficient. One approach Anthropic has written about is allowing code execution within MCP servers. You can learn more in <a href="https://www.anthropic.com/engineering/code-execution-with-mcp">Code execution with MCP: Building more efficient agents</a>, but the short version is that instead of exposing lots of different tools, a server exposes a single tool (or a small handful) that executes code in a sandboxed environment to achieve the same results. Anthropic also recently announced a beta feature for <a href="https://www.anthropic.com/engineering/advanced-tool-use">advanced tool use</a> in Claude. One of the standout updates is moving away from a static tool list to lazily loading tool definitions via a tool search tool.</p>
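<p>To make the contrast concrete, here&rsquo;s a rough sketch of the shape a code-execution style server might take instead of fifteen separate tools. This is an illustration only; the tool name and the sandboxed client it describes are hypothetical, not Anthropic&rsquo;s actual schema:</p>
<pre tabindex="0"><code>{
  &#34;name&#34;: &#34;ynab_execute&#34;,
  &#34;description&#34;: &#34;Run a short script against a sandboxed YNAB client and return its output. The client exposes budgets(), accounts(budget_id), and transactions(budget_id, since_date=None, until_date=None).&#34;,
  &#34;parameters&#34;: {
    &#34;type&#34;: &#34;object&#34;,
    &#34;properties&#34;: {
      &#34;code&#34;: {
        &#34;type&#34;: &#34;string&#34;,
        &#34;description&#34;: &#34;Script to run in the sandbox; stdout is returned to the model&#34;
      }
    },
    &#34;required&#34;: [&#34;code&#34;]
  }
}
</code></pre><p>One definition like this costs a few hundred tokens where the full fifteen-tool list costs around 10k.</p>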
<p>Both of these are still in their early stages, so for now we need to stay careful about how many MCP servers we add to our coding agents and how much of the context window they consume. With that out of the way, let&rsquo;s look at a real-world context window in a development environment and how we can make the most of it.</p>
<h2 id="a-view-into-your-context-window">A View Into Your Context Window</h2>
<p>Claude Code provides a command we can run within a session called <code>/context</code>. It reports the current state of your context window, including how much space everything is taking up. Let&rsquo;s take a look at the output of <code>/context</code> within <a href="https://www.tracewell.ai">Tracewell AI</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>Context Usage
</span></span><span style="display:flex;"><span>⛁ ⛀ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-sonnet-4-5-20250929 · 101k/200k tokens <span style="color:#89dceb;font-weight:bold">(</span>51%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁
</span></span><span style="display:flex;"><span>⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛀ ⛶   ⛁ System prompt: 3.1k tokens <span style="color:#89dceb;font-weight:bold">(</span>1.6%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ System tools: 19.8k tokens <span style="color:#89dceb;font-weight:bold">(</span>9.9%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ MCP tools: 26.5k tokens <span style="color:#89dceb;font-weight:bold">(</span>13.3%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Custom agents: 2.8k tokens <span style="color:#89dceb;font-weight:bold">(</span>1.4%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Memory files: 4.0k tokens <span style="color:#89dceb;font-weight:bold">(</span>2.0%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛝ ⛝ ⛝   ⛁ Messages: <span style="color:#fab387">8</span> tokens <span style="color:#89dceb;font-weight:bold">(</span>0.0%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛶ Free space: 99k <span style="color:#89dceb;font-weight:bold">(</span>49.4%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛝ Autocompact buffer: 45.0k tokens <span style="color:#89dceb;font-weight:bold">(</span>22.5%<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>MCP tools · /mcp
</span></span><span style="display:flex;"><span>└ mcp__memory__create_entities <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">686</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__create_relations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">689</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__add_observations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">668</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__delete_entities <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">612</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__delete_observations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">666</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__delete_relations <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">690</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__read_graph <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">568</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__search_nodes <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">607</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__memory__open_nodes <span style="color:#89dceb;font-weight:bold">(</span>memory<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">609</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__whoami <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">602</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_organizations <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">735</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_teams <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.0k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_projects <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">999</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_releases <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.2k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_issue_details <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.4k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_trace_details <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.3k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_event_attachment <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.3k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__update_issue <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.5k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__search_events <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.5k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__find_dsns <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.0k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__analyze_issue_with_seer <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.3k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__search_docs <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.8k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__get_doc <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">768</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__search_issues <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: 1.5k tokens
</span></span><span style="display:flex;"><span>└ mcp__sentry__use_sentry <span style="color:#89dceb;font-weight:bold">(</span>sentry<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">968</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__context7__resolve-library-id <span style="color:#89dceb;font-weight:bold">(</span>context7<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">887</span> tokens
</span></span><span style="display:flex;"><span>└ mcp__context7__get-library-docs <span style="color:#89dceb;font-weight:bold">(</span>context7<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">957</span> tokens
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Custom agents · /agents
</span></span><span style="display:flex;"><span>└ rails-backend-expert <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">444</span> tokens
</span></span><span style="display:flex;"><span>└ cybersecurity-expert <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">287</span> tokens
</span></span><span style="display:flex;"><span>└ prompt-engineer <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">609</span> tokens
</span></span><span style="display:flex;"><span>└ tailwind-viewcomponent-expert <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">417</span> tokens
</span></span><span style="display:flex;"><span>└ product-strategy-advisor <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">608</span> tokens
</span></span><span style="display:flex;"><span>└ regulatory-510k-consultant <span style="color:#89dceb;font-weight:bold">(</span>Project<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">459</span> tokens
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Memory files · /memory
</span></span><span style="display:flex;"><span>└ User <span style="color:#89dceb;font-weight:bold">(</span>/home/dgalarza/.claude/CLAUDE.md<span style="color:#89dceb;font-weight:bold">)</span>: <span style="color:#fab387">10</span> tokens
</span></span><span style="display:flex;"><span>└ Project <span style="color:#89dceb;font-weight:bold">(</span>/home/dgalarza/Code/tracewell.ai/CLAUDE.md<span style="color:#89dceb;font-weight:bold">)</span>: 4.0k tokens
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>SlashCommand Tool · <span style="color:#fab387">16</span> commands
</span></span><span style="display:flex;"><span>└ Total: 2.7k tokens
</span></span></code></pre></div><p>This gives us a detailed view of the context window: what percentage is currently available and a breakdown of what is taking up space. My MCP tools alone take up 26.5k tokens, about 13.3% of the Claude Sonnet 4.5 context window. Beyond that, the custom agents defined for the project take about 2.8k tokens, my project&rsquo;s CLAUDE.md is 4k tokens, and 22.5% of the window is reserved for autocompacting.</p>
<h3 id="what-is-autocompacting">What is autocompacting?</h3>
<p>To understand autocompacting, we first need to understand how a typical conversation flows through the Anthropic API. Every call to the Claude API has no recollection of previous parts of a conversation. Instead, as the consumer of the API, we need to maintain that conversation history ourselves and provide it with each request. Take a look at the following diagram:</p>
<p><img src="/images/posts/claude-conversation.png" alt="Diagram illustrating how Claude Code maintains conversation history across API requests, showing message flow and context accumulation"></p>
<p>The first request kicks off the conversation with &ldquo;Add error handling to the auth module&rdquo;, and we get a response back from the LLM with the result of what it did. When the user continues the conversation in request 2 with &ldquo;Now add tests for those changes&rdquo;, we don&rsquo;t just send the new message: we send the full conversation history, including the first message and the LLM&rsquo;s response. This is a simplified example that omits tool calling; any tool call requests and their results would live in this history as well. As the conversation grows, more and more of the context window is taken up, and as you approach the limit, space must be freed. One way to do this is compaction.</p>
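<p>Here&rsquo;s what that flow looks like in code. This is a minimal sketch using the Anthropic Python SDK, not what Claude Code does internally; note how the second request resends the entire history:</p>
<pre tabindex="0"><code>import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Request 1: start the conversation
messages = [{&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Add error handling to the auth module&#34;}]
response = client.messages.create(
    model=&#34;claude-sonnet-4-5-20250929&#34;,
    max_tokens=1024,
    messages=messages,
)

# The API is stateless, so the caller appends the assistant&#39;s reply
# plus the new user message, then resends everything.
messages.append({&#34;role&#34;: &#34;assistant&#34;, &#34;content&#34;: response.content})
messages.append({&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Now add tests for those changes&#34;})

# Request 2: the full history travels over the wire again
response = client.messages.create(
    model=&#34;claude-sonnet-4-5-20250929&#34;,
    max_tokens=1024,
    messages=messages,
)
</code></pre>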
<p>Compacting the context window is a context engineering technique that compresses a long-running conversation by summarizing it to free up space. The summarization is typically handled by an LLM, and the generated summary becomes the basis of the remainder of the conversation. Compaction can work well, but it&rsquo;s not an exact science: you&rsquo;re relying on the LLM to pick the right things to include in the summary. If you&rsquo;ve ever had a long session with Claude Code and felt like things were going off the rails, you may have experienced this. In a very long conversation you can end up with multiple autocompact passes, where the LLM is now summarizing a summarization along with the rest of the conversation.</p>
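<p>Conceptually, compaction looks something like the sketch below. This is the idea only; Claude Code&rsquo;s actual compaction prompt and mechanics are internal to the product:</p>
<pre tabindex="0"><code>def compact(client, messages):
    &#34;&#34;&#34;Summarize a long history and seed a fresh, smaller one with it.&#34;&#34;&#34;
    summary = client.messages.create(
        model=&#34;claude-sonnet-4-5-20250929&#34;,
        max_tokens=2048,
        messages=messages + [{
            &#34;role&#34;: &#34;user&#34;,
            &#34;content&#34;: &#34;Summarize this conversation: key decisions, open TODOs, and current state.&#34;,
        }],
    )
    # Assumes a plain-text response (no tool use) for simplicity.
    # The summary becomes the basis of the remainder of the conversation.
    return [{
        &#34;role&#34;: &#34;user&#34;,
        &#34;content&#34;: &#34;Summary of the conversation so far:\n\n&#34; + summary.content[0].text,
    }]
</code></pre>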
<p><strong>Warning signs of problematic autocompact:</strong></p>
<ul>
<li>Claude forgets decisions you made earlier in the conversation</li>
<li>Claude repeats work it already completed</li>
<li>Claude asks questions you already answered</li>
<li>Solutions start contradicting earlier approaches</li>
</ul>
<p>When you notice these symptoms, it&rsquo;s usually time for a <code>/clear</code> and a fresh start rather than continuing to fight against a degraded context.</p>
<h2 id="managing-your-context-window">Managing Your Context Window</h2>
<p>Now that we understand what the context window is and how conversation history accumulates within it, let&rsquo;s explore different ways to manage it and make the most of it.</p>
<h3 id="delegating-to-subagents">Delegating to Subagents</h3>
<p>Claude Code has the ability to spin off &ldquo;subagents&rdquo; while it works. Each subagent gets its own context window, separate from the main conversation. This gives us two advantages. First, the subagent&rsquo;s context window isn&rsquo;t cluttered with our previous conversation history. Second, and this is the flip side, our main conversation isn&rsquo;t cluttered with all the details of whatever the subagent was instructed to work on; it only reports back its results. You can see this in action with Claude Opus 4.5 whenever you plan something: it typically delegates tasks out to subagents to help build the plan.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>⏺ <span style="color:#fab387">3</span> Explore agents finished <span style="color:#89dceb;font-weight:bold">(</span>ctrl+o to expand<span style="color:#89dceb;font-weight:bold">)</span>
</span></span><span style="display:flex;"><span>   ├─ Explore Tracewell Agent workflow · <span style="color:#fab387">23</span> tool uses · 104.5k tokens
</span></span><span style="display:flex;"><span>   │  ⎿ Done
</span></span><span style="display:flex;"><span>   ├─ Explore Tracewell DHF extractions · <span style="color:#fab387">28</span> tool uses · 108.0k tokens
</span></span><span style="display:flex;"><span>   │  ⎿ Done
</span></span><span style="display:flex;"><span>   └─ Explore <span style="color:#89dceb">eval</span> framework · <span style="color:#fab387">24</span> tool uses · 101.8k tokens
</span></span><span style="display:flex;"><span>      ⎿ Done
</span></span></code></pre></div><p>You can also instruct Claude to invoke a subagent explicitly. Some examples:</p>
<ul>
<li>&ldquo;Have a subagent do a code review of this branch against main&rdquo;</li>
<li>&ldquo;Use a subagent to explore how authentication works in this codebase&rdquo;</li>
<li>&ldquo;Spawn a subagent to research different caching strategies for this use case&rdquo;</li>
</ul>
<p>When the subagent completes, you&rsquo;ll see a summary like this in your main conversation:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>⏺ Task agent finished · <span style="color:#fab387">15</span> tool uses · 52.3k tokens
</span></span><span style="display:flex;"><span>  ⎿ The code review found <span style="color:#fab387">3</span> issues: <span style="color:#89dceb;font-weight:bold">[</span>summary of findings...<span style="color:#89dceb;font-weight:bold">]</span>
</span></span></code></pre></div><p>Notice that the subagent used 52k tokens of its own context, but your main conversation only receives the summary. This is the key benefit: the detailed work happens in isolation.</p>
<p>I&rsquo;ve found subagents work best for self-contained tasks that require reading lots of files. Code reviews are a natural fit. The subagent can dig through diffs without polluting your main context. The same goes for codebase exploration when you&rsquo;re trying to understand how an unfamiliar feature works across multiple modules. Research tasks also work well here; you can have a subagent investigate implementation options and report back before you commit to an approach.</p>
<h3 id="using-custom-agents">Using Custom Agents</h3>
<p>Custom agents take subagents to another level. They let us define an agent with a persona and an area of expertise that uses the same machinery as subagents, including its own separate context window. We can also define which tools it has access to. This is useful when you know an agent doesn&rsquo;t need specific tools, so those tool definitions don&rsquo;t have to take up space in its context window.</p>
<p>An agent is a Markdown file that lives in either <code>~/.claude/agents</code> or <code>.claude/agents</code>. You give it a name, a description, a model, and the tools it&rsquo;s allowed to use, all via YAML frontmatter. After the frontmatter you define the agent itself.</p>
<p>Let&rsquo;s take a look at a practical example.</p>
<p>In Tracewell I have defined a few agents, which you can see in the earlier <code>/context</code> output. The <code>rails-backend-expert</code> doesn&rsquo;t need the Linear MCP, so I can withhold access to it and all of its tools. This is handled by setting an allowlist of the tools the agent may use:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#fab387">---</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">name</span>: rails-backend-expert
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">description</span>: Use this agent when working on Ruby on Rails backend code, including models, controllers, services, jobs, database migrations, API endpoints, background processing, or any server-side Ruby logic.
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">tools</span>: Bash, Glob, Grep, Read, Edit, Write, NotebookEdit, WebFetch, TodoWrite, WebSearch, BashOutput, KillShell, AskUserQuestion, Skill, SlashCommand, mcp__memory__create_entities, mcp__memory__create_relations, mcp__memory__add_observations, mcp__memory__delete_entities, mcp__memory__delete_observations, mcp__memory__delete_relations, mcp__memory__read_graph, mcp__memory__search_nodes, mcp__memory__open_nodes, mcp__context7__resolve-library-id, mcp__context7__get-library-docs
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">model</span>: sonnet
</span></span><span style="display:flex;"><span><span style="color:#fab387">---</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>You are a Ruby on Rails backend expert. Your role is to help with...
</span></span></code></pre></div><p>The agent&rsquo;s full persona and instructions follow after the frontmatter. To get started, use the <code>/agents</code> command, which walks you through creating your first agent. The wizard asks whether you want Claude to generate the agent or to configure it manually; I suggest letting Claude generate it. You provide a high-level prompt, it produces the full agent description for you, and along the way it asks which tools the agent should have access to.</p>
<h3 id="claude-skills">Claude Skills</h3>
<p>In October 2025 Anthropic announced <a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">Agent Skills</a>, a way of encapsulating domain expertise or workflows for agents to follow. Skills are organized in folders around a core <code>SKILL.md</code> file, whose frontmatter holds required metadata such as the name and description. The body of the <code>SKILL.md</code> contains the instruction set of the skill itself.</p>
<p>When Claude Code starts, it loads only the name and description of every available skill into its context via the system prompt. This is progressive disclosure: Claude can decide when to use a skill without loading the entire thing into the context window. Skills can even replace some MCP servers, because a skill can ship scripts that Claude runs on demand. Instead of permanently exposing tools via an MCP server, you provide a skill whose scripts only enter the agent&rsquo;s context window when they&rsquo;re actually useful.</p>
<p>A great example of this is the <a href="https://github.com/lackeyjb/playwright-skill">Playwright Skill for Claude Code</a> by Bryan Lackey. Previously, if you wanted to add Playwright to Claude Code for driving your web application, you&rsquo;d install the <a href="https://github.com/microsoft/playwright-mcp">playwright-mcp</a>. It adds 22 tools consuming about 14.3k tokens, 7.2% of a 200k context window, just by being available.</p>
<p>The skill, by contrast, only adds about 200 tokens at startup for its name and description. The full SKILL.md (around 4-5k tokens) only loads when you actually invoke the skill. If you use Playwright in maybe one out of every five sessions, you&rsquo;re saving roughly 10k tokens in the sessions where you don&rsquo;t need it.</p>
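<p>A minimal <code>SKILL.md</code> looks something like this. The skill name, description, and script path are hypothetical, purely for illustration:</p>
<pre tabindex="0"><code>---
name: browser-check
description: Drive a browser with Playwright to verify pages render and flows work. Use when asked to test or screenshot the running web app.
---

# Browser Check

1. Make sure the dev server is running.
2. Run scripts/run-playwright.js against the target URL.
3. Report console errors and attach screenshots to your summary.
</code></pre>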
<h3 id="using-clear">Using clear</h3>
<p>Another tool at your disposal is the <code>/clear</code> command. It empties the context window, giving you a fresh start. I highly recommend using it often, especially when you&rsquo;ve completed a distinct task and are moving on to a new one where the previous conversation history is no longer needed.</p>
<h3 id="compacting-the-conversation-manually">Compacting the conversation manually</h3>
<p>Along with autocompact, you can manually choose when to compact a conversation by running <code>/compact</code>. It takes an optional argument: instructions for how Claude should perform the compaction, which lets you make sure specific information survives the summary. I recommend this when you&rsquo;ve made significant progress and are moving on to related work. Perhaps Claude broke the work into multiple phases and you just completed phase 1. You could:</p>
<ol>
<li>Use <code>/clear</code> to reset the context window. However, if you didn&rsquo;t persist the plan or TODO list somewhere, you&rsquo;ll start from scratch.</li>
<li>Continue until autocompact kicks in and you let the LLM do the heavy lifting of summarizing / compacting the conversation.</li>
</ol>
<p>Instead, I&rsquo;d recommend running <code>/compact</code> and instructing Claude to summarize the progress so far, then starting the next phase with a &ldquo;fresh&rdquo; context window. I say &ldquo;fresh&rdquo; since we aren&rsquo;t fully clearing the context window, just compressing the previous conversation.</p>
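<p>For example, after wrapping up phase 1 you might run something like:</p>
<pre tabindex="0"><code>/compact Preserve the phase 2 task list, the decisions we made about the data model, and the paths of the files we changed. Drop the exploration and debugging detail.
</code></pre>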
<h3 id="being-strategic-about-file-reads">Being Strategic About File Reads</h3>
<p>It&rsquo;s easy to overlook how quickly file reads consume context. Every time Claude reads a file, that content gets added to the conversation history. Large files, broad grep results, or reading several files in sequence can eat through your available context faster than you&rsquo;d expect.</p>
<p>A few things I&rsquo;ve learned to do:</p>
<ul>
<li>When I know roughly where something is, I&rsquo;ll point Claude to specific line ranges rather than having it read entire files. For example: &ldquo;Look at the <code>authenticate</code> method in <code>app/services/auth_service.rb</code> around lines 45-80&rdquo; instead of just &ldquo;check the auth service&rdquo;</li>
<li>I try to use targeted grep patterns before asking Claude to read files. Narrowing down candidates first means fewer files loaded into context. For example, instead of &ldquo;find where we handle webhook failures&rdquo;, I might say &ldquo;grep for <code>webhook.*fail</code> or <code>handle.*webhook</code> in app/services/ and show me the matches before reading any files.&rdquo; This way Claude identifies the 2-3 relevant files first rather than speculatively reading 10 service files looking for the right one.</li>
<li>For orientation questions like &ldquo;what does this module do?&rdquo;, asking Claude to summarize rather than read the whole thing can save significant tokens</li>
</ul>
<p>This becomes especially important in larger codebases where a single exploration session can involve dozens of file reads.</p>
<h3 id="optimizing-your-claudemd">Optimizing Your CLAUDE.md</h3>
<p>Your project&rsquo;s <code>CLAUDE.md</code> file loads into every conversation, so it&rsquo;s worth keeping it lean. Looking back at my <code>/context</code> output, my project&rsquo;s CLAUDE.md takes up 4k tokens, which is 2% of my context window before I&rsquo;ve even started working.</p>
<p>A few things to keep in mind:</p>
<ul>
<li>Bullet points tend to be more token-efficient than prose</li>
<li>Put the most critical instructions at the beginning since Claude pays more attention to the start and end of content (that &ldquo;lost in the middle&rdquo; problem again)</li>
<li>Consider whether instructions belong at the project level or could live in your user-level <code>~/.claude/CLAUDE.md</code> instead</li>
<li>Periodically audit for outdated instructions that no longer apply</li>
</ul>
<p>It&rsquo;s a balancing act. You want enough context for Claude to understand your project&rsquo;s conventions, but not so much that you&rsquo;re burning tokens on rarely-relevant details.</p>
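<p>As a rough illustration of the bullet-point style (a hypothetical project, not my actual file):</p>
<pre tabindex="0"><code># Project conventions

- Rails 7.1, Ruby 3.3, RSpec for tests
- Service objects return a Result object; don&#39;t raise for control flow
- Run bin/rails db:test:prepare before running the suite
- Prefer ViewComponents over partials for reusable UI
</code></pre>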
<h2 id="best-practices-for-context-window-management">Best Practices for Context Window Management</h2>
<ol>
<li><strong>Monitor regularly</strong> - Run <code>/context</code> at the start of each session to understand your baseline usage</li>
<li><strong>Audit your MCP servers</strong> - Remove any MCP servers you haven&rsquo;t used recently; each one consumes tokens just by existing (see the commands after this list)</li>
<li><strong>Prefer skills over MCP servers</strong> - When building new functionality, consider skills first for better context efficiency through progressive disclosure</li>
<li><strong>Clear between tasks</strong> - Use <code>/clear</code> liberally when switching between unrelated work</li>
<li><strong>Strategic compacting</strong> - Use <code>/compact</code> with custom instructions when transitioning between related phases of work</li>
<li><strong>Delegate complex work</strong> - Use subagents for self-contained tasks to keep their context isolated from your main conversation</li>
</ol>
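<p>For the audit step, Claude Code&rsquo;s CLI can list and remove servers directly; the server name below is whatever you registered yours under:</p>
<pre tabindex="0"><code># See which MCP servers are configured
claude mcp list

# Inspect one server&#39;s configuration
claude mcp get sentry

# Remove a server you no longer use
claude mcp remove sentry
</code></pre>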
<blockquote>
<p><strong>Running a team on Claude Code?</strong> Context window management gets
harder with 5+ engineers making different choices about MCP servers,
CLAUDE.md conventions, and workflow patterns. A
<a href="/services/#retainer">Production AI Retainer</a>
standardizes this across your team.</p></blockquote>
<h2 id="conclusion">Conclusion</h2>
<p>Context management isn&rsquo;t just about avoiding limits; it&rsquo;s about keeping your conversations focused and effective. A cluttered context window leads to degraded responses, just like a cluttered desk makes it harder to find what you need.</p>
<p>The key takeaways: monitor your usage with <code>/context</code>, delegate to subagents for isolated work, and use <code>/clear</code> liberally between tasks. When possible, prefer skills over MCP servers for better context efficiency through progressive disclosure.</p>
<p>Start by running <code>/context</code> in your next Claude Code session to see where your tokens are going. You might be surprised by what you find.</p>
<p><strong>Update:</strong> With Claude Opus 4.7&rsquo;s 1M token context window, the fundamentals here still apply but the ceiling and controls have changed. See <a href="/posts/2026-04-30-claude-opus-4-7-claude-code-tips-extended-context/">7 Practical Tips for Maximizing Extended Context</a> for workflow adjustments specific to the larger window.</p>
<blockquote>
<p>If this post was the explanation, the cheat sheet is the reference.
Two sides: token costs for common MCPs on one, the <code>/clear</code> /
<code>/compact</code> / subagent decision tree on the other.</p>
<p><a href="/context-window-cheat-sheet/">Get the Context Window Cheat Sheet →</a></p></blockquote>
]]></content:encoded></item><item><title>How I Use Claude Code: My Complete Development Workflow</title><link>https://www.damiangalarza.com/posts/2025-11-25-how-i-use-claude-code/</link><pubDate>Tue, 25 Nov 2025 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2025-11-25-how-i-use-claude-code/</guid><description>After 8 months with Claude Code, here's my complete workflow. Learn how I combine Linear, MCP servers, and Obsidian for AI-assisted development that works.</description><content:encoded><![CDATA[<p>For the past 8 months I&rsquo;ve been using Claude Code as my daily driver. It&rsquo;s become a core part of my development workflow. Before this I tried Cursor for a few months and enjoyed it. However, I&rsquo;ve been a terminal/Vim user for quite a while, so moving to an IDE was a significant change. I found myself drawn to Claude Code&rsquo;s agentic workflow rather than autocomplete or chat panels.</p>
<p>During this time my workflow has evolved significantly. This is partly from learning how to get the most out of it, but also from the Anthropic team&rsquo;s continuous improvements to the product. New features are coming out at a rapid pace.</p>
<p>There&rsquo;s an ongoing debate in our industry. Some developers swear by AI assistants while others remain skeptical. You hear stories about companies claiming developers are no longer needed, alongside dismissals that AI-generated code is always garbage. I find myself somewhere in the middle.</p>
<p>I believe using these tools is a skill in and of itself. When people tell me &ldquo;it takes longer to get the LLM to do it right&rdquo; or &ldquo;I can do it faster myself,&rdquo; I understand where they&rsquo;re coming from. When I first started programming, I was slow too. But I got faster with practice. The same applies to working with agentic development tools.</p>
<p>This post walks through my typical Claude Code workflow. While I focus on Claude Code specifically, these concepts apply to many agentic coding tools.</p>
<h2 id="context-is-king">Context is King</h2>
<p>One of the biggest complaints I hear from developers goes something like this: &ldquo;I tried using an AI assistant but it just wouldn&rsquo;t get it right. I spent so much time trying to get it to do what I wanted and eventually gave up.&rdquo;</p>
<p>I&rsquo;ve written about <a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">how MCP tools consume the context window</a> before. But context matters in other ways too.</p>
<p>When I dig deeper into these frustrations, I typically ask how they prompted the LLM. The answer is usually a fairly vague prompt. In a smaller codebase this might work fine, but in an established codebase it often falls short. We need to give the LLM a well-structured problem.</p>
<p>One thing I&rsquo;ve learned is that developers who have experience managing or delegating tasks tend to adapt quickly. They already understand how to break down problems into small pieces for someone else to work on. This is why I spend time breaking down problems into bite-sized chunks—a common practice in agile development.</p>
<p>For example, while building <a href="https://www.tracewell.ai">Tracewell AI</a> I work with Linear for issue tracking. Even though I&rsquo;m typically working alone, being disciplined about creating issues pays off. I often use Claude via the desktop app or terminal to scope out work, break down problems, and create Linear issues. This upfront work makes implementation much smoother.</p>
<h2 id="tools-i-use">Tools I Use</h2>
<p>My Claude Code setup relies on a few key tools that work together to provide rich context. Each addresses a different aspect of development—project management, error tracking, version control, and memory—creating a network of information that Claude can draw from when planning and implementing features.</p>
<h3 id="linear-mcp-server">Linear MCP Server</h3>
<p>The <a href="https://linear.app/docs/mcp">Linear MCP Server</a> is a backbone of my workflow. It gives Claude direct access to project issues, enabling both the creation of backlog items and the delegation of implementation tasks.</p>
<h3 id="sentry-mcp-server">Sentry MCP Server</h3>
<p>I use Sentry for error tracking, so the <a href="https://docs.sentry.io/product/sentry-mcp/">Sentry MCP</a> is a natural addition. It allows me to point Claude at an exception for triaging or fixing. While I have Sentry connected to Linear for automatic issue creation, the MCP integration adds another layer of context when investigating errors. If you want to see this workflow in action, check out my video on <a href="https://youtu.be/GfDczm2xJ1M">Debugging Production Issues with AI</a>.</p>
<h3 id="github-cli">GitHub CLI</h3>
<p>This one is critical. If you&rsquo;ve read the <a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code Best Practices</a> you&rsquo;ve likely seen the recommendation to install the <a href="https://cli.github.com/">GitHub CLI</a> (<code>gh</code>). If you haven&rsquo;t read that guide, I highly recommend starting there.</p>
<p>Claude Code excels at using the GitHub CLI for tasks like the following (example commands after the list):</p>
<ol>
<li>Opening pull requests</li>
<li>Investigating GitHub issues</li>
<li>Debugging GitHub Action runs</li>
<li>Reviewing PR feedback</li>
<li>Performing code review on others&rsquo; pull requests</li>
</ol>
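<p>Under the hood these map to ordinary <code>gh</code> invocations. A few representative commands (the issue number is a placeholder):</p>
<pre tabindex="0"><code># Open a PR, filling title and body from commits
gh pr create --fill

# Read an issue with its discussion
gh issue view 123 --comments

# Pull only the failing logs from a workflow run
gh run view --log-failed
</code></pre>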
<h3 id="memory-mcp-server">Memory MCP Server</h3>
<p>The <a href="https://github.com/modelcontextprotocol/servers/tree/main/src/memory">Memory MCP Server</a> provides Claude with persistent memory across conversations. In my workflow, I use it to store implementation plans so Claude can track progress and maintain context throughout a feature&rsquo;s development. When Claude creates a plan for a Linear issue, it saves it to the memory graph. This becomes especially useful when work spans multiple sessions.</p>
<p>With these tools in place, let&rsquo;s look at another core part of my workflow.</p>
<h2 id="obsidian-notes">Obsidian Notes</h2>
<p>I&rsquo;ve been using Obsidian for notes for over a year, but it never occurred to me to connect it to Claude Code until I heard the Every podcast episode with Noah Brier: <a href="https://every.to/podcast/how-to-use-claude-code-as-a-thinking-partner">How to Use Claude Code as a Second Brain</a>. This significantly changed how I provide context to my development work.</p>
<p>Why is this connection so important? When I&rsquo;m working on a project, I&rsquo;m taking notes. At a project kick-off I&rsquo;m capturing potential solutions, key pieces of code, and product knowledge. These notes go into my vault under paths like <code>01-Projects/DHF Extraction/2025-11-01-Pairing Session.md</code>. Meeting transcripts end up in the same project folder.</p>
<p>When it&rsquo;s time to implement, I put Claude in plan mode and instruct it to &ldquo;Review my notes in 01-Projects/DHF Extraction and help me implement X.&rdquo; Claude can now gather all the context I&rsquo;ve assembled to inform its implementation plan.</p>
<blockquote>
<p>If you want to learn more about how I process meeting notes in Obsidian, check out my <a href="https://github.com/dgalarza/claude-code-workflows/tree/main/.claude/skills/process-meeting-transcript">Process Meeting Transcript Skill</a> on GitHub.</p></blockquote>
<p>To make the most of this, use the <code>/add-dir</code> command to add your Obsidian vault path. This allows Claude Code to reference files in your vault without permission issues.</p>
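<p>For example, substituting your own vault path:</p>
<pre tabindex="0"><code>/add-dir ~/Documents/Obsidian/MyVault
</code></pre>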
<h2 id="putting-it-all-together">Putting It All Together</h2>
<p>With MCP servers handling project management, error tracking, and memory, plus Obsidian providing my accumulated notes and research, I have all the pieces needed for a comprehensive workflow. All of these tools come together in a <a href="https://www.claude.com/blog/skills">Claude Agent Skill</a> that takes a Linear issue by ID and implements a solution. Let me break down this skill.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Overview
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>This skill provides a comprehensive workflow for implementing Linear issues with professional software engineering practices. It automates the entire development lifecycle from issue analysis through PR creation, ensuring quality through test-driven development, parallel code reviews, and systematic validation.
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold">## When to Use This Skill
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>Use this skill when:
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User provides a Linear issue ID (format: <span style="color:#a6e3a1">`TRA-9`</span>, <span style="color:#a6e3a1">`DEV-123`</span>, etc.)
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User requests implementation of a Linear issue
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User wants a structured TDD approach with code review
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> User needs automated workflow from issue to PR
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>Examples:
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Implement TRA-142&#34;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Help me build the feature in DEV-89&#34;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">-</span> &#34;Work on Linear issue ABC-456&#34;
</span></span></code></pre></div><p>This sets the stage for what the skill does and when to invoke it. Now let&rsquo;s look at the core workflow.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-markdown" data-lang="markdown"><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"># Core Workflow
</span></span></span><span style="display:flex;"><span><span style="color:#fab387;font-weight:bold"></span>
</span></span><span style="display:flex;"><span>The skill follows a 14-step process:
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">1.</span> <span style="font-weight:bold">**Fetch Linear Issue**</span> - Retrieve complete issue details via Linear MCP
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">2.</span> <span style="font-weight:bold">**Gather Additional Context**</span> - Search Obsidian, Sentry, and GitHub for related information
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">3.</span> <span style="font-weight:bold">**Move to In Progress**</span> - Update issue status to indicate active work
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">4.</span> <span style="font-weight:bold">**Create Feature Branch**</span> - Use Linear&#39;s suggested git branch naming
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">5.</span> <span style="font-weight:bold">**Analyze &amp; Plan**</span> - Break down requirements and create implementation plan
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">6.</span> <span style="font-weight:bold">**Save to Memory**</span> - Store plan in memory graph for tracking
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">7.</span> <span style="font-weight:bold">**Review Plan**</span> - Present plan for user confirmation
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">8.</span> <span style="font-weight:bold">**TDD Implementation**</span> - Invoke <span style="color:#a6e3a1">`tdd-workflow`</span> skill for test-driven development
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">9.</span> <span style="font-weight:bold">**Parallel Code Reviews**</span> - Invoke <span style="color:#a6e3a1">`parallel-code-review`</span> skill for comprehensive analysis
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">10.</span> <span style="font-weight:bold">**Address Feedback**</span> - Invoke <span style="color:#a6e3a1">`code-review-implementer`</span> skill to systematically fix issues
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">11.</span> <span style="font-weight:bold">**Validation**</span> - Ensure all tests and linters pass
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">12.</span> <span style="font-weight:bold">**Logical Commits**</span> - Create meaningful commit history
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">13.</span> <span style="font-weight:bold">**Create PR**</span> - Generate comprehensive pull request with Linear linking
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">14.</span> <span style="font-weight:bold">**Final Verification**</span> - Confirm CI/CD pipeline and Linear integration
</span></span></code></pre></div><p>There&rsquo;s a lot happening here, but the goal is straightforward: build as much context as possible before implementation begins. The workflow pulls in details from the Linear issue, related Obsidian notes, Sentry exceptions if relevant, and any linked GitHub discussions. For example, a Linear issue might have been extracted from a previous pull request discussion as a follow-up task—pulling that context in gives Claude a much better starting point.</p>
<p>One thing worth highlighting: step 7 (Review Plan) is a key part of this workflow. After gathering context and creating a plan, Claude presents it and <em>waits for my approval</em> before writing any code. This human-in-the-loop checkpoint prevents runaway implementations and gives me a chance to course-correct before significant work begins.</p>
<p>You&rsquo;ll notice a few other skills referenced in the workflow. These are also available in the <a href="https://github.com/dgalarza/claude-code-workflows/tree/main/.claude/skills">claude-code-workflows</a> repo:</p>
<p><strong>tdd-workflow skill</strong>
A skill that outlines a test-driven development workflow following an outside-in testing approach.</p>
<p><strong>parallel-code-review</strong>
This workflow spins off two Claude sub-agents to perform code review in parallel. One focuses on Rails and object-oriented best practices while the other performs security analysis.</p>
<p><strong>code-review-implementer</strong>
A skill that ranks code review feedback by priority and systematically addresses it. High priority feedback is always addressed. Medium and low priority items are presented for my decision before implementation.</p>
<h2 id="getting-started">Getting Started</h2>
<p>If you want to try this workflow yourself, here&rsquo;s how to get started:</p>
<ol>
<li>
<p><strong>Install the MCP servers</strong> - Set up Linear, Sentry (if you use it), and Memory MCP servers in your Claude Code configuration. Example commands follow this list.</p>
</li>
<li>
<p><strong>Copy the skills</strong> - Clone or copy the skills from my <a href="https://github.com/dgalarza/claude-code-workflows">claude-code-workflows</a> repo into your project&rsquo;s <code>.claude/skills/</code> directory. You&rsquo;ll need <code>linear-implement</code> and its dependencies (<code>tdd-workflow</code>, <code>parallel-code-review</code>, <code>code-review-implementer</code>).</p>
</li>
<li>
<p><strong>Customize for your stack</strong> - My skills are tailored to Rails projects with specific conventions (POODR principles, Result pattern, RSpec). If you&rsquo;re using Django, Node, Go, or another stack, you&rsquo;ll want to adapt the code review criteria and testing workflows to match your conventions.</p>
</li>
<li>
<p><strong>Connect your notes</strong> - Use <code>/add-dir</code> to add your Obsidian vault (or wherever you keep project notes) so Claude can reference them.</p>
</li>
<li>
<p><strong>Try it out</strong> - Once everything is set up, just type &ldquo;Implement TRA-142&rdquo; (substituting your issue ID) and the workflow kicks off automatically.</p>
</li>
</ol>
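<p>For step 1, registration looks something like this with the <code>claude mcp</code> CLI. Treat the endpoint URL and package name as starting points and check each provider&rsquo;s docs for the current values:</p>
<pre tabindex="0"><code># Linear&#39;s hosted MCP server over SSE
claude mcp add --transport sse linear https://mcp.linear.app/sse

# The reference Memory server runs locally via npx
claude mcp add memory -- npx -y @modelcontextprotocol/server-memory
</code></pre>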
<p>Claude Code auto-discovers skills in the <code>.claude/skills/</code> directory, so there&rsquo;s no additional configuration needed beyond placing the files.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Getting value from agentic development tools requires building the right habits. By investing time upfront in breaking down problems, maintaining good notes, and connecting your tools together, you can create workflows that dramatically improve your productivity.</p>
<p>The key insight is that context matters. The more relevant information you can surface for the LLM, the better its output will be. This is true whether you&rsquo;re using Claude Code, Cursor, or any other AI-assisted development tool.</p>
<p>If you&rsquo;re interested in the full skill, you can find it in my <a href="https://github.com/dgalarza/claude-code-workflows/tree/main/.claude/skills/linear-implement">claude-code-workflows</a> repo on GitHub.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://youtu.be/Q7YR5-KtgJU">Getting Started with Claude Code</a> - Video walkthrough for those new to Claude Code</li>
<li><a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code: Best Practices for agentic coding</a></li>
<li><a href="https://every.to/podcast/how-to-use-claude-code-as-a-thinking-partner">How to Use Claude Code as a Second Brain</a></li>
<li><a href="https://www.youtube.com/watch?v=qizQkByZ4WM&amp;t=1246s">Forward Deployed, Episode 2: Claude Code Skills and the Progressive Disclosure Problem</a></li>
<li><a href="https://www.youtube.com/watch?v=-uW5-TaVXu4">Most devs don&rsquo;t understand how context windows work</a></li>
</ul>
<hr>
<p>If you&rsquo;re looking to build a workflow like this for your team, I offer coaching and workshops on Claude Code. <a href="/claude-code/">See how I can help</a>.</p>
]]></content:encoded></item></channel></rss>