<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Python on Damian Galarza | Software Engineering &amp; AI Consulting</title><link>https://www.damiangalarza.com/tags/python/</link><description>Recent posts from Damian Galarza | Software Engineering &amp; AI Consulting</description><generator>Hugo</generator><language>en-us</language><managingEditor>Damian Galarza</managingEditor><atom:link href="https://www.damiangalarza.com/tags/python/feed.xml" rel="self" type="application/rss+xml"/><item><title>Build Efficient MCP Servers: Three Design Principles</title><link>https://www.damiangalarza.com/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/</link><pubDate>Thu, 06 Nov 2025 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/</guid><description>Three design principles for context-efficient MCP servers: filter at source, pre-aggregate data, work creatively. Real reductions: 746k→262 tokens.</description><content:encoded><![CDATA[<p>Recently I had an idea. What would it be like to interact with my YNAB budget via Claude Code using natural language? I wanted to be able to ask questions like &ldquo;How much did I spend on groceries last month?&rdquo; or &ldquo;What categories am I overspending in?&rdquo; and get accurate answers without digging through the app.</p>
<p>I found some existing YNAB MCPs, but most were inactive with limited features. This seemed like a good opportunity to learn MCP design from scratch. What followed was a deep dive into context efficiency that changed how I think about building AI tools.</p>
<h2 id="understanding-model-context-protocols-mcps">Understanding Model Context Protocols (MCPs)</h2>
<p>The Model Context Protocol (MCP) is a standardized way to extend language models with external capabilities. Unlike traditional APIs where you write code to call endpoints, MCP servers allow AI models to discover and use tools autonomously. The protocol defines how models can:</p>
<ul>
<li><strong>Call tools</strong> - Execute functions that interact with external systems</li>
<li><strong>Read resources</strong> - Access files, databases, or other data sources</li>
<li><strong>Receive prompts</strong> - Get specialized instructions for specific tasks</li>
</ul>
<p>When you build an MCP server, you&rsquo;re essentially creating a set of capabilities that any MCP-compatible AI assistant (like Claude) can use. The model sees your tool descriptions, understands what they do, and calls them as needed to fulfill user requests.</p>
<h2 id="understanding-context-windows">Understanding Context Windows</h2>
<p>Before we dive into how to make our MCPs more efficient, it&rsquo;s important to understand what we&rsquo;re trying to optimize. When working with LLMs, the context window is the amount of content that the model can &ldquo;pay attention&rdquo; to at one time. Each model has a limit to the size of its context window. For example, Claude Sonnet 4.5&rsquo;s context window is about 200,000 tokens.</p>
<h3 id="what-is-a-token">What is a Token?</h3>
<p>When you send text to an LLM, it doesn&rsquo;t process words one at a time. Instead, text is broken into <strong>tokens</strong>—the fundamental units that language models read and generate. A token typically represents 3-4 characters, or roughly 0.75 words in English.</p>
<p>For API responses and JSON data (which is what MCPs work with), tokenization looks like this:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{<span style="color:#cba6f7">&#34;name&#34;</span>: <span style="color:#a6e3a1">&#34;Checking&#34;</span>}           <span style="color:#6c7086;font-style:italic">// ~7 tokens
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#a6e3a1">&#34;transfer_payee_id&#34;</span>            <span style="color:#6c7086;font-style:italic">// ~5 tokens
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>{<span style="color:#cba6f7">&#34;balance&#34;</span>: <span style="color:#fab387">125000</span>}            <span style="color:#6c7086;font-style:italic">// ~6 tokens
</span></span></span></code></pre></div><p>The tokenizer breaks JSON into chunks: brackets, keys, values, and punctuation all consume tokens. Every field name has a cost. Field names like <code>&quot;debt_escrow_amounts&quot;</code> and <code>&quot;direct_import_in_error&quot;</code> cost ~4-6 tokens each. When you return an API response with 18 fields per object, you&rsquo;re paying the token cost for every field name, every time—even when the model doesn&rsquo;t need them.</p>
<h3 id="why-token-efficiency-matters">Why Token Efficiency Matters</h3>
<p>Given the limited size of the context window, it&rsquo;s critical to consider how much you&rsquo;re placing inside it. Models work best when they have good context, but there&rsquo;s a balance to strike:</p>
<ul>
<li>
<p><strong>Context limits are hard boundaries</strong>: Claude Sonnet 4.5&rsquo;s 200k token limit sounds generous until you realize a naive MCP returning a year of transactions can consume 746,800 tokens, about 3.7x the entire context window. Your tool call would fail before the model could even process it.</p>
</li>
<li>
<p><strong>Real sessions are already crowded</strong>: In typical Claude Code sessions, MCP tool definitions, system prompts, and memory can consume 50-60% of the context window before you&rsquo;ve had a single conversation. Every inefficient tool response eats into precious space needed for reasoning and multi-turn conversations.</p>
</li>
<li>
<p><strong>Noise degrades performance</strong>: Providing superfluous data doesn&rsquo;t just waste tokens—it forces the model to parse irrelevant fields, increasing the chance of errors or confusion. A focused 262-token summary outperforms a noisy 4,890-token dump of raw data.</p>
</li>
</ul>
<p>The goal isn&rsquo;t to minimize tokens at all costs. It&rsquo;s to give the model exactly what it needs, nothing more, nothing less. Let&rsquo;s look at how this plays out in practice.</p>
<h3 id="the-naive-approach-direct-api-wrapping">The Naive Approach: Direct API Wrapping</h3>
<p>When I started building the YNAB MCP, I didn&rsquo;t yet appreciate how much token efficiency would matter. I started with what seemed obvious: create a thin wrapper around the existing YNAB Python SDK. Each MCP tool would correspond to one API endpoint, passing through the full response. This is a common pattern I&rsquo;ve seen in many MCP implementations.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Naive approach: Just wrap the SDK</span>
</span></span><span style="display:flex;"><span><span style="color:#89b4fa;font-weight:bold">@mcp.tool</span>()
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_budget</span>(budget_id: <span style="color:#89dceb">str</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">str</span>:
</span></span><span style="display:flex;"><span>    response <span style="color:#89dceb;font-weight:bold">=</span> ynab_client<span style="color:#89dceb;font-weight:bold">.</span>budgets<span style="color:#89dceb;font-weight:bold">.</span>get_budget(budget_id)
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> json<span style="color:#89dceb;font-weight:bold">.</span>dumps(response<span style="color:#89dceb;font-weight:bold">.</span>data<span style="color:#89dceb;font-weight:bold">.</span>budget)  <span style="color:#6c7086;font-style:italic"># Return everything</span>
</span></span></code></pre></div><p>This approach works, technically. The model gets access to the data. But there&rsquo;s a critical problem: <strong>API responses are designed for applications, not for AI context windows.</strong></p>
<p>Traditional applications can process, filter, and cache data efficiently, so APIs return comprehensive data structures optimized for completeness, not token efficiency. A single endpoint might return thousands of fields because the API designers don&rsquo;t know which specific fields your application needs.</p>
<p>AI models work differently. Every byte consumes precious context window space—space you could use for reasoning, conversation history, or additional tool calls. When you blindly pass through full API responses, you&rsquo;re asking the model to pay the &ldquo;context tax&rdquo; for data it might not even need. Worse, the model has to analyze and determine which parts of that data are actually relevant—a cognitive load that can lead to errors or missed information.</p>
<p>To put this in perspective: tool results compete with everything else for context space. In a real Claude Code session (visible via <code>/context</code>), I saw the context window at 118k/<a href="https://docs.anthropic.com/en/docs/about-claude/models">200k tokens</a> (59%)—before I&rsquo;d even started a conversation. MCP tool definitions alone consumed 47.9k tokens (24%), system tools used 17.3k tokens (9%), custom agents took 2.4k tokens, and memory files added another 2.3k tokens. That&rsquo;s 59% of the context window used just by the environment.</p>
<p>A naive MCP that returns 30k tokens for a budget overview would push that to 74% in a single tool call—leaving just 52k tokens for the actual conversation, reasoning, and additional tool calls. Every inefficient tool response eats into the space you need for multi-turn conversations.</p>
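<p>The arithmetic behind those figures can be sketched in a few lines. The constants come straight from the session described above; the <code>usage_after</code> helper is a hypothetical name used only for illustration.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python">
```python
CONTEXT_WINDOW = 200_000  # Claude Sonnet 4.5 limit cited in this post
BASELINE_USED = 118_000   # environment overhead reported by /context
NAIVE_RESPONSE = 30_000   # naive budget overview tool result


def usage_after(tool_tokens: int) -> tuple[float, int]:
    """Return (fraction of window used, tokens remaining) after one tool call."""
    used = BASELINE_USED + tool_tokens
    return used / CONTEXT_WINDOW, CONTEXT_WINDOW - used


fraction, remaining = usage_after(NAIVE_RESPONSE)
print(f"{fraction:.0%} used, {remaining:,} tokens left")  # 74% used, 52,000 tokens left
```
</code></pre></div>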
<p>This changed how I approached the design: MCPs need to be context-aware intermediaries, not transparent proxies. The question shifted from &ldquo;How do I expose this API to the model?&rdquo; to &ldquo;What does the model actually need to help the user?&rdquo;</p>
<h2 id="three-design-principles-for-context-efficient-mcps">Three Design Principles for Context-Efficient MCPs</h2>
<p>One of the first things I wanted to do was check my budget overview - see my accounts, categories, and how I&rsquo;m tracking for the current month. A straightforward use case that any budgeting tool should support.</p>
<p>My initial thought was to create tools that directly wrapped the YNAB API endpoints. Let&rsquo;s take the accounts endpoint as an example. Here&rsquo;s what the API returns:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">&#34;data&#34;</span>: {
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">&#34;accounts&#34;</span>: [
</span></span><span style="display:flex;"><span>      {
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;id&#34;</span>: <span style="color:#a6e3a1">&#34;3fa85f64-5717-4562-b3fc-2c963f66afa6&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;name&#34;</span>: <span style="color:#a6e3a1">&#34;Checking Account&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;type&#34;</span>: <span style="color:#a6e3a1">&#34;checking&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;on_budget&#34;</span>: <span style="color:#fab387">true</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;closed&#34;</span>: <span style="color:#fab387">false</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;note&#34;</span>: <span style="color:#a6e3a1">&#34;Primary checking&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;balance&#34;</span>: <span style="color:#fab387">125000</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;cleared_balance&#34;</span>: <span style="color:#fab387">120000</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;uncleared_balance&#34;</span>: <span style="color:#fab387">5000</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;transfer_payee_id&#34;</span>: <span style="color:#a6e3a1">&#34;...&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;direct_import_linked&#34;</span>: <span style="color:#fab387">true</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;direct_import_in_error&#34;</span>: <span style="color:#fab387">false</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;last_reconciled_at&#34;</span>: <span style="color:#a6e3a1">&#34;2025-11-05T18:27:20.140Z&#34;</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_original_balance&#34;</span>: <span style="color:#fab387">0</span>,
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_interest_rates&#34;</span>: {},
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_minimum_payments&#34;</span>: {},
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;debt_escrow_amounts&#34;</span>: {},
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">&#34;deleted&#34;</span>: <span style="color:#fab387">false</span>
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>      <span style="color:#6c7086;font-style:italic">// ... 46 more accounts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    ]
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>For <strong>my 47 accounts</strong>, this API response contains <strong>18 fields per account</strong>. Many of these fields are irrelevant for typical budget questions:</p>
<ul>
<li>Debt interest rates and minimum payments (only relevant for debt accounts)</li>
<li>Direct import status (internal system state)</li>
<li>Cleared vs uncleared balance breakdown (too granular for overview)</li>
<li>Transfer payee IDs (internal references)</li>
<li>Last reconciliation date (accounting detail)</li>
</ul>
<p>When you&rsquo;re answering budget questions, these internal bookkeeping details just create noise.</p>
<p>A naive wrapper would return all 18 fields for each of the 47 accounts, which came to <strong>9,960 tokens</strong>.</p>
<p>But here&rsquo;s what I actually need to answer questions like &ldquo;What&rsquo;s my checking account balance?&rdquo; or &ldquo;How much do I have across all accounts?&rdquo;:</p>
<ul>
<li>Account ID</li>
<li>Account name</li>
<li>Account type</li>
<li>Balance</li>
<li>Whether it&rsquo;s on-budget</li>
<li>Whether it&rsquo;s closed</li>
</ul>
<p>That&rsquo;s it. Just 6 fields. Here&rsquo;s the filtered implementation:</p>
<p><strong>For accounts (47 accounts):</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># src/ynab_mcp/ynab_client.py</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_accounts</span>(<span style="color:#89dceb">self</span>, budget_id: <span style="color:#89dceb">str</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]]:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Get all accounts for a budget.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    response <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>client<span style="color:#89dceb;font-weight:bold">.</span>accounts<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>    accounts <span style="color:#89dceb;font-weight:bold">=</span> []
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> account <span style="color:#89dceb;font-weight:bold">in</span> response<span style="color:#89dceb;font-weight:bold">.</span>data<span style="color:#89dceb;font-weight:bold">.</span>accounts:
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Skip deleted accounts entirely</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> account<span style="color:#89dceb;font-weight:bold">.</span>deleted:
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Return only the fields the model actually needs</span>
</span></span><span style="display:flex;"><span>        accounts<span style="color:#89dceb;font-weight:bold">.</span>append({
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;id&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>id,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;name&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>name,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;type&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>type,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;on_budget&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>on_budget,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;closed&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>closed,
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;balance&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>balance <span style="color:#89dceb;font-weight:bold">/</span> <span style="color:#fab387">1000</span> <span style="color:#cba6f7">if</span> account<span style="color:#89dceb;font-weight:bold">.</span>balance <span style="color:#cba6f7">else</span> <span style="color:#fab387">0</span>,
</span></span><span style="display:flex;"><span>        })
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> accounts
</span></span></code></pre></div><p><strong>Filtered approach (6 essential fields):</strong> 3,451 tokens for 47 accounts</p>
<p><strong>Reduction: 65.4%</strong> by removing 12 unnecessary fields that the model doesn&rsquo;t need for typical budget questions.</p>
<h3 id="the-full-budget-overview">The Full Budget Overview</h3>
<p>Of course, checking accounts is just one part of viewing your budget. A complete budget overview requires three tool calls:</p>
<p><strong>Naive approach (no filtering):</strong></p>
<ul>
<li>Accounts (all fields): 9,960 tokens</li>
<li>Categories (all, including hidden): 12,445 tokens</li>
<li>Monthly summary (estimated): ~8,000 tokens</li>
<li><strong>Total: ~30,405 tokens</strong></li>
</ul>
<p><strong>Context-efficient approach:</strong></p>
<ul>
<li>Accounts (filtered): 3,451 tokens</li>
<li>Categories (visible only): 8,620 tokens</li>
<li>Monthly budget summary: 6,808 tokens</li>
<li><strong>Total: ~18,879 tokens</strong></li>
</ul>
<p><strong>Workflow reduction: 38% fewer tokens</strong> for the same functionality - a complete picture of my budget for the current month. This leaves plenty of room in Claude&rsquo;s context window for conversation history, reasoning, and additional tool calls.</p>
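<p>The percentages in this section follow directly from the token counts reported above. As a quick check, a small helper (the name <code>reduction</code> is just for illustration) reproduces them:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python">
```python
def reduction(before: int, after: int) -> float:
    """Percentage of tokens saved going from `before` to `after`."""
    return (before - after) / before * 100


# (naive, filtered) token counts as reported in this post
measurements = {
    "accounts": (9_960, 3_451),
    "categories": (12_445, 8_620),
    "full overview": (30_405, 18_879),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {reduction(before, after):.1f}% fewer tokens")
# accounts: 65.4% fewer tokens
# categories: 30.7% fewer tokens
# full overview: 37.9% fewer tokens
```
</code></pre></div>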
<p>But it&rsquo;s not just about saving tokens. By filtering out unnecessary data, I&rsquo;m also <strong>improving model accuracy</strong>. When Claude doesn&rsquo;t see fields like <code>debt_escrow_amounts</code> or <code>direct_import_in_error</code>, it can&rsquo;t get confused by them or incorrectly incorporate them into calculations. The model focuses on exactly what matters: account names, balances, and budget status.</p>
<p>The key insight: <strong>the model doesn&rsquo;t need to see all the data to work with it effectively</strong>. In fact, it works <em>better</em> with less data. By doing the filtering in the tool layer, I kept the context window lean while maintaining full functionality and improving reliability.</p>
<p>The filtering techniques I&rsquo;d learned from accounts and categories immediately paid off when I tackled the next challenge: helping Claude categorize uncategorized transactions.</p>
<h2 id="categorizing-transactions">Categorizing Transactions</h2>
<p>One of the workflows I wanted help with was taking uncategorized transactions and suggesting categories for them. This would help me ensure that my budget was accurate and up-to-date. To do this I created a tool that would fetch all uncategorized transactions from YNAB and get a list of the categories available in the budget.</p>
<p>On the first pass I noticed something unusual. Claude was recommending categories that were hidden in my budget - old categories I no longer used but hadn&rsquo;t deleted. I quickly realized that the YNAB REST API doesn&rsquo;t provide a way to exclude hidden categories in the API call itself. That meant I had two options:</p>
<ol>
<li>Include instructions in my tool description telling Claude to ignore hidden categories</li>
<li>Filter them out in the tool code before returning data to Claude</li>
</ol>
<p>I chose option 2. Here&rsquo;s why: every instruction you add to a tool description consumes context window. More importantly, it puts the burden of filtering on the model. This means we&rsquo;re knowingly giving the model more data than it needs, forcing it to do more work and potentially allowing the model to make mistakes.</p>
<p>It&rsquo;s worth emphasizing: tool descriptions themselves consume context tokens. In my Claude Code session, MCP tool definitions consumed 47.9k tokens (24% of the context window) before any tools were even called. Every line of documentation, every parameter description, every usage instruction adds up. This creates a tension: you want clear, helpful descriptions, but verbose documentation eats into the space available for actual tool results and conversation.</p>
<p>The solution isn&rsquo;t to write minimal descriptions; clarity matters. Keep descriptions focused on what the tool does and its parameters, and handle behavior rules (like &ldquo;ignore hidden items&rdquo;) in your implementation code rather than in lengthy instructions. So I implemented filtering at the tool layer:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#cba6f7">def</span> <span style="color:#89b4fa">_filter_categories</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">self</span>, categories: <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]], include_hidden: <span style="color:#89dceb">bool</span> <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">False</span>
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]]:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Filter categories to exclude hidden/deleted ones by default.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    filtered <span style="color:#89dceb;font-weight:bold">=</span> []
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> category <span style="color:#89dceb;font-weight:bold">in</span> categories:
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Always skip deleted categories</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> category<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;deleted&#34;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Skip hidden categories unless explicitly included</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> <span style="color:#89dceb;font-weight:bold">not</span> include_hidden <span style="color:#89dceb;font-weight:bold">and</span> category<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;hidden&#34;</span>):
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>        filtered<span style="color:#89dceb;font-weight:bold">.</span>append(category)
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> filtered
</span></span></code></pre></div><p>Then I exposed this as a parameter in the tool:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#89b4fa;font-weight:bold">@mcp.tool</span>()
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_categories</span>(budget_id: <span style="color:#89dceb">str</span>, include_hidden: <span style="color:#89dceb">bool</span> <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">False</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">str</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Get all categories for a budget.
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Args:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        budget_id: The ID of the budget (use &#39;last-used&#39; for default budget)
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        include_hidden: Include hidden categories and groups (default: False)
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Returns:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        JSON string with category groups and categories
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    &#34;&#34;&#34;</span>
</span></span></code></pre></div><p>This approach gave me the best of both worlds: by default, Claude only sees active categories, but I can still access hidden categories when needed (like when I wanted to identify old balances that needed cleanup). This saved <strong>30.7% of tokens</strong> per category list request (from 12,445 to 8,620 tokens by filtering out 69 hidden categories) while improving accuracy.</p>
<h3 id="design-principle-1-filter-at-the-source">Design Principle #1: Filter at the Source</h3>
<p>Do data filtering in your tool code rather than relying on prompt instructions. This saves tokens and prevents errors.</p>
<h2 id="historical-spending-analysis">Historical Spending Analysis</h2>
<p>Next, I wanted to be able to ask questions about my historical spending. Questions like &ldquo;How much did I spend on groceries last month?&rdquo; or &ldquo;What categories am I overspending in?&rdquo; would be really useful.</p>
<p>My first instinct was to create a tool that fetched all transactions for a date range and let Claude analyze them. But I quickly realized this approach had serious problems.</p>
<p>To illustrate, let me show you the real numbers for my 2024 transactions:</p>
<ul>
<li><strong>Total transactions in 2024:</strong> 3,456</li>
<li><strong>Fields per transaction:</strong> 14 (id, date, amount, memo, account_name, payee_name, category_name, cleared, approved, etc.)</li>
<li><strong>Average per transaction:</strong> ~216 tokens</li>
</ul>
<p>Now extrapolate this to common queries:</p>
<ul>
<li><strong>1 month</strong> of transactions (~284 txns): ~61,368 tokens</li>
<li><strong>3 months</strong> of transactions (~852 txns): ~184,106 tokens</li>
<li><strong>6 months</strong> of transactions (~1,704 txns): ~368,213 tokens</li>
<li><strong>1 year</strong> of transactions (3,456 txns): <strong>~746,800 tokens</strong></li>
</ul>
<p>The problems with this approach:</p>
<ol>
<li><strong>Token usage</strong>: A full year query would consume <strong>746,800 tokens</strong> - roughly 3.7x the size of Claude Sonnet 4.5&rsquo;s entire 200k context window! You literally couldn&rsquo;t fit a year of transactions in a single request.</li>
<li><strong>Speed</strong>: Transferring and parsing thousands of transaction objects is slow</li>
<li><strong>Analysis burden</strong>: Claude would need to group, sum, and calculate averages on raw data</li>
<li><strong>Wasted context</strong>: Most of those 14 fields per transaction aren&rsquo;t relevant to &ldquo;how much did I spend?&rdquo;</li>
</ol>
<p>Even a modest 3-month query would consume 184k tokens - using 92% of the available context window just for raw transaction data. This leaves almost no room for Claude to maintain conversation history, reason about the results, or make additional tool calls to answer follow-up questions.</p>
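<p>A short sketch makes the comparison concrete. It takes the token estimates listed above as given and expresses each as a share of the 200k window; the <code>window_share</code> helper is a hypothetical name used only here.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python">
```python
CONTEXT_WINDOW = 200_000  # Claude Sonnet 4.5 limit cited in this post

# Token estimates for raw transaction dumps, taken from the list above
raw_dump_tokens = {
    "1 month": 61_368,
    "3 months": 184_106,
    "6 months": 368_213,
    "1 year": 746_800,
}


def window_share(tokens: int) -> float:
    """Fraction of the context window a response of this size would occupy."""
    return tokens / CONTEXT_WINDOW


for period, tokens in raw_dump_tokens.items():
    verdict = "fits" if CONTEXT_WINDOW >= tokens else "exceeds the window"
    print(f"{period}: {window_share(tokens):.0%} of the window ({verdict})")
```
</code></pre></div>
<p>Only the one-month and three-month dumps fit at all, and that is before counting the environment overhead that is already in the window.</p>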
<p>Instead, I realized these calculations could easily be handled in the tool layer. Here&rsquo;s what the aggregation logic looks like:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">get_category_spending_summary</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">self</span>,
</span></span><span style="display:flex;"><span>    budget_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    category_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    since_date: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    until_date: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    include_graph: <span style="color:#89dceb">bool</span> <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">True</span>,
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Get spending summary for a category over a date range.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Fetch transactions from API</span>
</span></span><span style="display:flex;"><span>    result <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>_make_request_with_retry(<span style="color:#a6e3a1">&#34;get&#34;</span>, url, params<span style="color:#89dceb;font-weight:bold">=</span>params)
</span></span><span style="display:flex;"><span>    txn_data <span style="color:#89dceb;font-weight:bold">=</span> result[<span style="color:#a6e3a1">&#34;data&#34;</span>][<span style="color:#a6e3a1">&#34;transactions&#34;</span>]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Aggregate in tool layer</span>
</span></span><span style="display:flex;"><span>    total_spent <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>    transaction_count <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>    monthly_totals <span style="color:#89dceb;font-weight:bold">=</span> {}
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> txn <span style="color:#89dceb;font-weight:bold">in</span> txn_data:
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Filter by category and end date (since_date is presumably applied via the API request params)</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> txn<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;category_id&#34;</span>) <span style="color:#89dceb;font-weight:bold">!=</span> category_id:
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> txn[<span style="color:#a6e3a1">&#34;date&#34;</span>] <span style="color:#89dceb;font-weight:bold">&gt;</span> until_date:
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">continue</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># YNAB stores amounts in milliunits (e.g., $125.00 = 125000)</span>
</span></span><span style="display:flex;"><span>        amount <span style="color:#89dceb;font-weight:bold">=</span> txn[<span style="color:#a6e3a1">&#34;amount&#34;</span>] <span style="color:#89dceb;font-weight:bold">/</span> <span style="color:#fab387">1000</span> <span style="color:#cba6f7">if</span> txn<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;amount&#34;</span>) <span style="color:#cba6f7">else</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>        total_spent <span style="color:#89dceb;font-weight:bold">+=</span> amount
</span></span><span style="display:flex;"><span>        transaction_count <span style="color:#89dceb;font-weight:bold">+=</span> <span style="color:#fab387">1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>        <span style="color:#6c7086;font-style:italic"># Build monthly breakdown</span>
</span></span><span style="display:flex;"><span>        month_key <span style="color:#89dceb;font-weight:bold">=</span> txn[<span style="color:#a6e3a1">&#34;date&#34;</span>][:<span style="color:#fab387">7</span>]  <span style="color:#6c7086;font-style:italic"># YYYY-MM</span>
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">if</span> month_key <span style="color:#89dceb;font-weight:bold">not</span> <span style="color:#89dceb;font-weight:bold">in</span> monthly_totals:
</span></span><span style="display:flex;"><span>            monthly_totals[month_key] <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#fab387">0</span>
</span></span><span style="display:flex;"><span>        monthly_totals[month_key] <span style="color:#89dceb;font-weight:bold">+=</span> amount
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Calculate average per month</span>
</span></span><span style="display:flex;"><span>    num_months <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#89dceb">len</span>(monthly_totals) <span style="color:#cba6f7">if</span> monthly_totals <span style="color:#cba6f7">else</span> <span style="color:#fab387">1</span>
</span></span><span style="display:flex;"><span>    average_per_month <span style="color:#89dceb;font-weight:bold">=</span> total_spent <span style="color:#89dceb;font-weight:bold">/</span> num_months  <span style="color:#6c7086;font-style:italic"># num_months is always &gt;= 1</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Return only the summary</span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;category_id&#34;</span>: category_id,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;date_range&#34;</span>: {<span style="color:#a6e3a1">&#34;start&#34;</span>: since_date, <span style="color:#a6e3a1">&#34;end&#34;</span>: until_date},
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;total_spent&#34;</span>: total_spent,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;transaction_count&#34;</span>: transaction_count,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;average_per_month&#34;</span>: average_per_month,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;monthly_breakdown&#34;</span>: [
</span></span><span style="display:flex;"><span>            {<span style="color:#a6e3a1">&#34;month&#34;</span>: month, <span style="color:#a6e3a1">&#34;spent&#34;</span>: amount}
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">for</span> month, amount <span style="color:#89dceb;font-weight:bold">in</span> <span style="color:#89dceb">sorted</span>(monthly_totals<span style="color:#89dceb;font-weight:bold">.</span>items())
</span></span><span style="display:flex;"><span>        ],
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>The impact was dramatic. Let me show you a real example from my implementation:</p>
<p><strong>Scenario:</strong> Analyze 6 months of spending for a single category (22 transactions)</p>
<ul>
<li><strong>Before (returning raw transactions)</strong>: 4,890 tokens</li>
<li><strong>After (pre-aggregated summary)</strong>: 262 tokens</li>
<li><strong>Reduction</strong>: 94.6%</li>
</ul>
<p>The aggregated response includes:</p>
<ul>
<li>Total spent</li>
<li>Average per month</li>
<li>Transaction count</li>
<li>Monthly breakdown (array of {month, spent} objects)</li>
</ul>
<p>That&rsquo;s everything Claude needs to answer questions like &ldquo;Am I spending more on groceries this year than last year?&rdquo; without having to receive, parse, and aggregate dozens of individual transaction records.</p>
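<p>To see the shape of that summary without touching the API, here is a minimal synchronous sketch of the same aggregation running on hand-written sample data. The function name <code>summarize_spending</code>, the sample transactions, and the explicit lower date bound are assumptions for illustration; the real tool fetches its data from YNAB first.</p>

```python
from collections import defaultdict
from typing import Any


def summarize_spending(
    transactions: list[dict[str, Any]],
    category_id: str,
    since_date: str,
    until_date: str,
) -> dict[str, Any]:
    """Aggregate already-fetched transactions into a compact summary."""
    monthly_totals: dict[str, float] = defaultdict(float)
    transaction_count = 0

    for txn in transactions:
        if txn.get("category_id") != category_id:
            continue
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if not (since_date <= txn["date"] <= until_date):
            continue
        # Amounts are in milliunits, as in the post ($125.00 = 125000).
        month_key = txn["date"][:7]
        monthly_totals[month_key] += (txn.get("amount") or 0) / 1000
        transaction_count += 1

    total_spent = sum(monthly_totals.values())
    num_months = len(monthly_totals) or 1  # avoid dividing by zero
    return {
        "category_id": category_id,
        "date_range": {"start": since_date, "end": until_date},
        "total_spent": total_spent,
        "transaction_count": transaction_count,
        "average_per_month": total_spent / num_months,
        "monthly_breakdown": [
            {"month": month, "spent": spent}
            for month, spent in sorted(monthly_totals.items())
        ],
    }


# Hypothetical sample data, not real YNAB output.
sample = [
    {"date": "2025-01-05", "amount": 125000, "category_id": "groceries"},
    {"date": "2025-01-20", "amount": 75000, "category_id": "groceries"},
    {"date": "2025-02-03", "amount": 100000, "category_id": "groceries"},
    {"date": "2025-02-10", "amount": 40000, "category_id": "gas"},
    {"date": "2025-03-01", "amount": 60000, "category_id": "groceries"},
]
summary = summarize_spending(sample, "groceries", "2025-01-01", "2025-02-28")
print(summary)
```

<p>Five transactions go in and a handful of numbers come out; the March purchase and the gas purchase never reach the model at all.</p>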
<h3 id="design-principle-2-pre-aggregate-data">Design Principle #2: Pre-Aggregate Data</h3>
<p>Pre-calculate aggregations, summaries, and statistics in your tool code. Return insights, not raw data. This keeps your context window lean while still giving the model everything it needs to help users.</p>
<p>While filtering and aggregation solved the token efficiency problem, I ran into a different challenge: the API itself had limitations.</p>
<h2 id="building-tools-for-unsupported-actions">Building Tools for Unsupported Actions</h2>
<p>While working on the workflow to have Claude help me categorize transactions, I realized I needed a way to split a transaction across multiple categories. For example, a Costco purchase might include $150 of groceries, $50 of household items, and $30 of gas.</p>
<p>Unfortunately, the YNAB API does not provide a way to convert an existing transaction into a split transaction. The API only allows creating NEW transactions with splits. This was a real limitation - but it presented an opportunity to think creatively about tool design.</p>
<p>Instead of telling users &ldquo;sorry, the API doesn&rsquo;t support this,&rdquo; I created a tool that works within the API&rsquo;s constraints:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#89b4fa;font-weight:bold">@mcp.tool</span>()
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">prepare_split_for_matching</span>(
</span></span><span style="display:flex;"><span>    budget_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    transaction_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    subtransactions: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">str</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Prepare a split transaction to match with an existing imported transaction.
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    This tool fetches an existing transaction&#39;s details and creates a new UNAPPROVED split
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    transaction with the same date, amount, account, and payee. You can then manually match
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    them together in the YNAB web or mobile UI.
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Workflow:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        1. This tool fetches the existing transaction details
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        2. Creates a new unapproved split transaction with those details
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        3. You manually match them in the YNAB UI
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        4. YNAB merges them into one split transaction
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    Note:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        - The new split is created as UNAPPROVED for manual matching
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">        - The sum of subtransaction amounts should equal the original transaction amount
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">    &#34;&#34;&#34;</span>
</span></span></code></pre></div><p>The implementation:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#cba6f7">async</span> <span style="color:#cba6f7">def</span> <span style="color:#89b4fa">prepare_split_for_matching</span>(
</span></span><span style="display:flex;"><span>    <span style="color:#89dceb">self</span>,
</span></span><span style="display:flex;"><span>    budget_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    transaction_id: <span style="color:#89dceb">str</span>,
</span></span><span style="display:flex;"><span>    subtransactions: <span style="color:#89dceb">list</span>[<span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]],
</span></span><span style="display:flex;"><span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">dict</span>[<span style="color:#89dceb">str</span>, Any]:
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Fetch the original transaction details</span>
</span></span><span style="display:flex;"><span>    original <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>get_transaction(budget_id, transaction_id)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic"># Create a new split transaction with the same details but unapproved</span>
</span></span><span style="display:flex;"><span>    new_split <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> <span style="color:#89dceb">self</span><span style="color:#89dceb;font-weight:bold">.</span>create_split_transaction(
</span></span><span style="display:flex;"><span>        budget_id<span style="color:#89dceb;font-weight:bold">=</span>budget_id,
</span></span><span style="display:flex;"><span>        account_id<span style="color:#89dceb;font-weight:bold">=</span>original[<span style="color:#a6e3a1">&#34;account_id&#34;</span>],
</span></span><span style="display:flex;"><span>        date<span style="color:#89dceb;font-weight:bold">=</span>original[<span style="color:#a6e3a1">&#34;date&#34;</span>],
</span></span><span style="display:flex;"><span>        amount<span style="color:#89dceb;font-weight:bold">=</span>original[<span style="color:#a6e3a1">&#34;amount&#34;</span>],
</span></span><span style="display:flex;"><span>        subtransactions<span style="color:#89dceb;font-weight:bold">=</span>subtransactions,
</span></span><span style="display:flex;"><span>        payee_name<span style="color:#89dceb;font-weight:bold">=</span>original<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;payee_name&#34;</span>),
</span></span><span style="display:flex;"><span>        memo<span style="color:#89dceb;font-weight:bold">=</span>original<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;memo&#34;</span>),
</span></span><span style="display:flex;"><span>        cleared<span style="color:#89dceb;font-weight:bold">=</span>original<span style="color:#89dceb;font-weight:bold">.</span>get(<span style="color:#a6e3a1">&#34;cleared&#34;</span>, <span style="color:#a6e3a1">&#34;uncleared&#34;</span>),
</span></span><span style="display:flex;"><span>        approved<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">False</span>,  <span style="color:#6c7086;font-style:italic"># Key: create as unapproved for matching</span>
</span></span><span style="display:flex;"><span>    )
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> {
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;original_transaction&#34;</span>: original,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;new_split_transaction&#34;</span>: new_split,
</span></span><span style="display:flex;"><span>        <span style="color:#a6e3a1">&#34;instructions&#34;</span>: (
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;A new unapproved split transaction has been created. &#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;Go to YNAB and manually match these two transactions together. &#34;</span>
</span></span><span style="display:flex;"><span>            <span style="color:#a6e3a1">&#34;Look for the match indicator in the YNAB UI.&#34;</span>
</span></span><span style="display:flex;"><span>        ),
</span></span><span style="display:flex;"><span>    }
</span></span></code></pre></div><p>This solution works because YNAB has a built-in &ldquo;matching&rdquo; feature that can merge a manually-entered transaction with an imported one. Because the split is created as unapproved, YNAB&rsquo;s UI detects the duplicate and offers to match the two. When you accept the match, the imported transaction becomes a proper split transaction.</p>
<p>Is this ideal? No - I&rsquo;d prefer a direct API endpoint. But it&rsquo;s a pragmatic solution that works within the constraints of the underlying platform while still providing value to the user.</p>
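<p>The tool&rsquo;s docstring notes that the subtransaction amounts should add up to the original amount, and the tool receives <code>subtransactions</code> as a JSON string. One way to enforce both before creating anything is sketched below. The helper name <code>parse_and_validate_split</code> and the sample payload are hypothetical, and amounts are assumed to be integer milliunits with outflows recorded as negative values.</p>

```python
import json
from typing import Any


def parse_and_validate_split(
    original_amount: int, subtransactions_json: str
) -> list[dict[str, Any]]:
    """Parse the JSON string argument and check the parts sum to the whole.

    Amounts are assumed to be integer milliunits, as elsewhere in the post.
    """
    subtransactions = json.loads(subtransactions_json)
    if not isinstance(subtransactions, list) or len(subtransactions) < 2:
        raise ValueError("A split needs a JSON array of at least two subtransactions")

    split_total = sum(sub["amount"] for sub in subtransactions)
    if split_total != original_amount:
        raise ValueError(
            f"Subtransactions sum to {split_total}, expected {original_amount}"
        )
    return subtransactions


# Hypothetical Costco purchase of $230.00, split three ways.
payload = json.dumps([
    {"amount": -150000, "category_id": "groceries"},
    {"amount": -50000, "category_id": "household"},
    {"amount": -30000, "category_id": "gas"},
])
parts = parse_and_validate_split(-230000, payload)
print(f"Validated {len(parts)} subtransactions")
```

<p>Failing early matters more than usual here: a mismatched split would still be created, but it would not line up with the imported transaction when you try to match them.</p>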
<h3 id="design-principle-3-work-within-api-constraints">Design Principle #3: Work Within API Constraints</h3>
<p>When an API doesn&rsquo;t support something directly, look for workflows that combine available operations to achieve the desired outcome.</p>
<h2 id="real-world-token-reduction-results">Real-World Token Reduction Results</h2>
<p>Here&rsquo;s a summary of the optimizations applied across the YNAB MCP, with real measured token counts:</p>
<table>
  <thead>
      <tr>
          <th>Tool/Workflow</th>
          <th>Naive Approach</th>
          <th>Optimized</th>
          <th>Reduction</th>
          <th>Technique Applied</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Accounts</strong></td>
          <td>9,960 tokens (18 fields)</td>
          <td>3,451 tokens (6 fields)</td>
          <td>65.4%</td>
          <td>Field filtering</td>
      </tr>
      <tr>
          <td><strong>Categories</strong></td>
          <td>12,445 tokens (all)</td>
          <td>8,620 tokens (visible)</td>
          <td>30.7%</td>
          <td>Default filtering + opt-in</td>
      </tr>
      <tr>
          <td><strong>Budget Overview</strong></td>
          <td>~30,405 tokens</td>
          <td>~18,879 tokens</td>
          <td>38%</td>
          <td>Combined filtering</td>
      </tr>
      <tr>
          <td><strong>Category Spending (6mo)</strong></td>
          <td>4,890 tokens (raw txns)</td>
          <td>262 tokens (summary)</td>
          <td>94.6%</td>
          <td>Pre-aggregation</td>
      </tr>
      <tr>
          <td><strong>Year of Transactions</strong></td>
          <td>746,800 tokens</td>
          <td>262 tokens</td>
          <td>99.96%</td>
          <td>Pre-aggregation</td>
      </tr>
  </tbody>
</table>
<h3 id="when-to-apply-each-technique">When to Apply Each Technique</h3>
<p><strong>Field Filtering</strong> (accounts, categories)</p>
<ul>
<li>Use when: The API returns many fields, but only a subset is needed for common queries</li>
<li>Savings: Moderate (30-65%)</li>
<li>Complexity: Low - simple field selection</li>
<li>Example: Remove debt details from non-debt accounts, skip internal IDs like <code>transfer_payee_id</code>, drop reconciliation timestamps
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Instead of returning all 18 fields, return only what matters</span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">return</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;id&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>id,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;name&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>name,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;balance&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>balance <span style="color:#89dceb;font-weight:bold">/</span> <span style="color:#fab387">1000</span>,
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;on_budget&#34;</span>: account<span style="color:#89dceb;font-weight:bold">.</span>on_budget
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div></li>
</ul>
<p><strong>Default Filtering with Parameters</strong> (hidden categories)</p>
<ul>
<li>Use when: Some data is rarely needed but occasionally useful</li>
<li>Savings: Moderate (30-40%)</li>
<li>Complexity: Low - add optional boolean parameter</li>
<li>Example: Hide deleted/archived items by default, expose via <code>include_deleted</code> flag</li>
</ul>
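<p>A minimal sketch of the default-filtering pattern, assuming each category is a dict with boolean <code>hidden</code> and <code>deleted</code> keys. The function name <code>filter_categories</code> and the sample data are illustrative only.</p>

```python
from typing import Any


def filter_categories(
    categories: list[dict[str, Any]], include_hidden: bool = False
) -> list[dict[str, Any]]:
    """Drop hidden or deleted categories unless the caller opts in."""
    if include_hidden:
        return list(categories)
    return [
        category
        for category in categories
        if not category.get("hidden", False) and not category.get("deleted", False)
    ]


# Hypothetical sample data.
categories = [
    {"name": "Groceries", "hidden": False, "deleted": False},
    {"name": "Old Gym Membership", "hidden": True, "deleted": False},
    {"name": "Removed Category", "hidden": False, "deleted": True},
]
visible = filter_categories(categories)
everything = filter_categories(categories, include_hidden=True)
print([category["name"] for category in visible])
```

<p>The lean response is the default, and the model can still ask for the full list when a question actually needs it.</p>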
<p><strong>Pre-aggregation</strong> (spending analysis)</p>
<ul>
<li>Use when: Model would need to compute summaries from raw data</li>
<li>Savings: High (90-99%)</li>
<li>Complexity: Medium - requires aggregation logic</li>
<li>Example: Return monthly totals instead of individual transactions</li>
</ul>
<p><strong>Creative Workarounds</strong> (split transactions)</p>
<ul>
<li>Use when: API doesn&rsquo;t support desired operation directly</li>
<li>Savings: Enables new functionality (not about tokens)</li>
<li>Complexity: High - requires understanding API constraints</li>
<li>Example: Multi-step workflows that achieve goals indirectly</li>
</ul>
<h3 id="how-to-measure-your-own-mcp">How to Measure Your Own MCP</h3>
<p>All the numbers in this post came from real measurements. Here&rsquo;s how I validated the optimizations, and how you can do the same for your own MCP:</p>
<h4 id="1-set-up-token-counting">1. Set Up Token Counting</h4>
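<p>Install <a href="https://github.com/openai/tiktoken">tiktoken</a>, OpenAI&rsquo;s tokenizer library. Claude uses its own tokenizer, so treat these counts as approximations; they are still useful for comparing one response shape against another:</p>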
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install tiktoken
</span></span></code></pre></div><p>Create a helper function to count tokens:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#94e2d5">import</span> <span style="color:#fab387">tiktoken</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">def</span> <span style="color:#89b4fa">count_tokens</span>(text: <span style="color:#89dceb">str</span>) <span style="color:#89dceb;font-weight:bold">-&gt;</span> <span style="color:#89dceb">int</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#a6e3a1">&#34;&#34;&#34;Count tokens using tiktoken&#39;s cl100k_base encoding.&#34;&#34;&#34;</span>
</span></span><span style="display:flex;"><span>    encoding <span style="color:#89dceb;font-weight:bold">=</span> tiktoken<span style="color:#89dceb;font-weight:bold">.</span>get_encoding(<span style="color:#a6e3a1">&#34;cl100k_base&#34;</span>)
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> <span style="color:#89dceb">len</span>(encoding<span style="color:#89dceb;font-weight:bold">.</span>encode(text))
</span></span></code></pre></div><h4 id="2-measure-api-responses">2. Measure API Responses</h4>
<p>Create a script that fetches data both ways and compares:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#94e2d5">import</span> <span style="color:#fab387">json</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Get raw API response</span>
</span></span><span style="display:flex;"><span>raw_response <span style="color:#89dceb;font-weight:bold">=</span> api<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>raw_json <span style="color:#89dceb;font-weight:bold">=</span> json<span style="color:#89dceb;font-weight:bold">.</span>dumps(raw_response, indent<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">2</span>)
</span></span><span style="display:flex;"><span>raw_tokens <span style="color:#89dceb;font-weight:bold">=</span> count_tokens(raw_json)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Get your filtered response</span>
</span></span><span style="display:flex;"><span>filtered_response <span style="color:#89dceb;font-weight:bold">=</span> your_mcp_tool<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>filtered_json <span style="color:#89dceb;font-weight:bold">=</span> json<span style="color:#89dceb;font-weight:bold">.</span>dumps(filtered_response, indent<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">2</span>)
</span></span><span style="display:flex;"><span>filtered_tokens <span style="color:#89dceb;font-weight:bold">=</span> count_tokens(filtered_json)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Compare</span>
</span></span><span style="display:flex;"><span>reduction <span style="color:#89dceb;font-weight:bold">=</span> ((raw_tokens <span style="color:#89dceb;font-weight:bold">-</span> filtered_tokens) <span style="color:#89dceb;font-weight:bold">/</span> raw_tokens) <span style="color:#89dceb;font-weight:bold">*</span> <span style="color:#fab387">100</span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Raw: </span><span style="color:#a6e3a1">{</span>raw_tokens<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">,</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> tokens&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Filtered: </span><span style="color:#a6e3a1">{</span>filtered_tokens<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">,</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> tokens&#34;</span>)
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Reduction: </span><span style="color:#a6e3a1">{</span>reduction<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">.1f</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">%&#34;</span>)
</span></span></code></pre></div><h4 id="3-test-real-workflows">3. Test Real Workflows</h4>
<p>Don&rsquo;t just measure individual tools - measure complete workflows users will perform:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"># Simulate a budget overview workflow</span>
</span></span><span style="display:flex;"><span>accounts <span style="color:#89dceb;font-weight:bold">=</span> your_mcp<span style="color:#89dceb;font-weight:bold">.</span>get_accounts(budget_id)
</span></span><span style="display:flex;"><span>categories <span style="color:#89dceb;font-weight:bold">=</span> your_mcp<span style="color:#89dceb;font-weight:bold">.</span>get_categories(budget_id, include_hidden<span style="color:#89dceb;font-weight:bold">=</span><span style="color:#fab387">False</span>)
</span></span><span style="display:flex;"><span>summary <span style="color:#89dceb;font-weight:bold">=</span> your_mcp<span style="color:#89dceb;font-weight:bold">.</span>get_budget_summary(budget_id, current_month)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>total_tokens <span style="color:#89dceb;font-weight:bold">=</span> (
</span></span><span style="display:flex;"><span>    count_tokens(json<span style="color:#89dceb;font-weight:bold">.</span>dumps(accounts)) <span style="color:#89dceb;font-weight:bold">+</span>
</span></span><span style="display:flex;"><span>    count_tokens(json<span style="color:#89dceb;font-weight:bold">.</span>dumps(categories)) <span style="color:#89dceb;font-weight:bold">+</span>
</span></span><span style="display:flex;"><span>    count_tokens(json<span style="color:#89dceb;font-weight:bold">.</span>dumps(summary))
</span></span><span style="display:flex;"><span>)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#89dceb">print</span>(<span style="color:#f38ba8">f</span><span style="color:#a6e3a1">&#34;Budget overview workflow: </span><span style="color:#a6e3a1">{</span>total_tokens<span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">,</span><span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> tokens&#34;</span>)
</span></span></code></pre></div><p>This revealed that my budget overview workflow uses ~19k tokens - well within Claude&rsquo;s <a href="https://docs.anthropic.com/en/docs/about-claude/models">200k context window</a> with room to spare.</p>
<h4 id="4-watch-for-runtime-warnings">4. Watch for Runtime Warnings</h4>
<p>Claude Code will warn you when tool responses exceed ~10k tokens. If you see these warnings frequently, it&rsquo;s a signal to investigate:</p>
<blockquote>
<p>⚠️ Large MCP response (~12.5k tokens), this can fill up context quickly.</p></blockquote>
<p>These warnings helped me identify which tools needed optimization.</p>
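<p>You can also catch oversized responses in your own test runs instead of waiting for the warning. The following is a minimal sketch: <code>rough_token_count</code> (about four characters per token) and the 10,000-token limit are assumptions for illustration, and <code>check_response_size</code> is a hypothetical helper, not part of Claude Code or the YNAB MCP. Pass in whatever token counter you already use for real measurements.</p>

```python
import json
from typing import Any, Callable


def rough_token_count(text: str) -> int:
    """Crude estimate of ~4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def check_response_size(
    tool_name: str,
    payload: Any,
    count_tokens: Callable[[str], int] = rough_token_count,
    limit: int = 10_000,
) -> tuple[int, bool]:
    """Return (token_estimate, over_limit) for one serialized tool response."""
    tokens = count_tokens(json.dumps(payload))
    over_limit = tokens > limit
    if over_limit:
        # Mirror the spirit of the runtime warning so it shows up in test output.
        print(f"{tool_name}: ~{tokens:,} tokens exceeds the {limit:,}-token budget")
    return tokens, over_limit
```

<p>Running this over each tool with realistic data gives you a list of optimization candidates before a user ever hits the warning.</p>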
<h4 id="5-validate-correctness">5. Validate Correctness</h4>
<p>Token reduction means nothing if your tools return incorrect data. Always verify:</p>
<ul>
<li>Does the filtered data answer the user&rsquo;s questions?</li>
<li>Are calculations accurate? (spot-check aggregations against raw data)</li>
<li>Does pagination work correctly? (ensure you&rsquo;re not computing on partial datasets)</li>
</ul>
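<p>The aggregation spot-check from the list above could look something like this. The field names (<code>category</code>, <code>amount</code>) and the shape of the summary are hypothetical. Amounts stay as integer milliunits, the format the YNAB API uses for currency, so the comparison is exact rather than subject to floating-point rounding.</p>

```python
from collections import defaultdict


def totals_by_category(transactions: list[dict]) -> dict[str, int]:
    """Recompute per-category totals directly from raw transactions."""
    totals: dict[str, int] = defaultdict(int)
    for txn in transactions:
        totals[txn["category"]] += txn["amount"]
    return dict(totals)


def find_mismatches(summary: dict[str, int], transactions: list[dict]) -> dict[str, tuple]:
    """Map each category whose summarized total differs from the raw total
    to a (summarized, recomputed) pair. An empty dict means they agree."""
    expected = totals_by_category(transactions)
    categories = set(summary) | set(expected)
    return {
        name: (summary.get(name), expected.get(name))
        for name in sorted(categories)
        if summary.get(name) != expected.get(name)
    }
```

<p>A category missing from either side shows up with <code>None</code> in its pair, which is also how a pagination bug (totals computed on a partial dataset) would typically surface.</p>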
<p>The goal isn&rsquo;t to minimize tokens at all costs - it&rsquo;s to return exactly what the model needs, nothing more, nothing less.</p>
<h2 id="limitations-and-trade-offs">Limitations and Trade-offs</h2>
<p>This context-efficient approach works well for YNAB, but it&rsquo;s not without limitations. Before applying these patterns to your own MCP, consider these trade-offs:</p>
<p><strong>Pre-aggregation assumes query patterns.</strong> If users ask questions that need raw transaction details (like &ldquo;show me the memo for my largest grocery purchase&rdquo;), the aggregated data won&rsquo;t help. You&rsquo;ll need additional tools that return raw data for those cases.</p>
<p><strong>Filtering loses flexibility.</strong> By removing fields, you can&rsquo;t answer questions that need those fields without making additional API calls. The key is knowing your use cases. For budget analysis and categorization, these trade-offs are worth it. For transaction-level forensics, you might need different tools.</p>
<p><strong>Caching complexity.</strong> Pre-computed aggregations need invalidation strategies when data changes. If your underlying data updates frequently, you&rsquo;ll need to think carefully about cache freshness and when to recompute.</p>
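<p>One simple way to bound staleness is a time-to-live cache with explicit invalidation after writes. This is a sketch under stated assumptions, not how the YNAB MCP handles it: <code>SummaryCache</code>, its 300-second default, and the key format are all hypothetical.</p>

```python
import time
from typing import Any, Callable


class SummaryCache:
    """Cache computed summaries for a fixed time-to-live (TTL)."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value while it is fresh, otherwise recompute and store it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and self._ttl >= now - entry[0]:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, prefix: str = "") -> None:
        """Drop entries whose key starts with prefix (all entries by default)."""
        for key in [k for k in self._entries if k.startswith(prefix)]:
            del self._entries[key]
```

<p>Calling <code>invalidate</code> with a budget-scoped prefix after any write tool runs keeps summaries from outliving the data they were computed from, while the TTL covers changes made outside the MCP.</p>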
<p><strong>Development overhead.</strong> Writing aggregation logic and filtering code is more work than simple pass-through wrappers. You&rsquo;re trading implementation time for runtime efficiency. For frequently used tools, this is usually worth it.</p>
<p>The goal isn&rsquo;t to optimize every tool to the extreme. It&rsquo;s to identify the high-impact workflows—the ones users will perform repeatedly—and optimize those intelligently.</p>
<h2 id="key-takeaways">Key Takeaways</h2>
<ul>
<li><strong>Context is expensive</strong>: In real Claude Code sessions, MCP tool definitions can consume 24% of the 200k context window before any tools are called</li>
<li><strong>Measure everything</strong>: Use tiktoken to count tokens on API responses before and after optimization</li>
<li><strong>Filter proactively</strong>: Removing 12 unnecessary fields from 47 accounts reduced tokens by 65.4%</li>
<li><strong>Aggregate strategically</strong>: Pre-computing spending summaries reduced a 6-month query by 94.6%</li>
<li><strong>Design for the model</strong>: Ask &ldquo;What does the model need?&rdquo; not &ldquo;What does the API provide?&rdquo;</li>
<li><strong>Default to minimal data</strong>: Return only what&rsquo;s necessary by default, with optional parameters for edge cases</li>
<li><strong>Validate with real workflows</strong>: Test complete user flows, not just individual tools, to understand cumulative token impact</li>
</ul>
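<p>The &ldquo;filter proactively&rdquo; and &ldquo;default to minimal data&rdquo; takeaways can be sketched as a projection applied before a tool returns. The default field list, the <code>include_fields</code> and <code>include_closed</code> parameters, and <code>slim_accounts</code> itself are hypothetical names chosen for illustration, not taken from the YNAB MCP.</p>

```python
from typing import Iterable, Optional

# Fields returned by default (a hypothetical choice for illustration).
DEFAULT_ACCOUNT_FIELDS = ("id", "name", "type", "balance")


def slim_accounts(
    accounts: list[dict],
    include_fields: Optional[Iterable[str]] = None,
    include_closed: bool = False,
) -> list[dict]:
    """Keep only the default fields plus any explicitly requested extras."""
    fields = list(DEFAULT_ACCOUNT_FIELDS) + [
        f for f in (include_fields or ()) if f not in DEFAULT_ACCOUNT_FIELDS
    ]
    return [
        {field: account[field] for field in fields if field in account}
        for account in accounts
        # Closed accounts are skipped unless the caller opts in.
        if include_closed or not account.get("closed", False)
    ]
```

<p>The common case stays small, and the model can still ask for an extra field or for closed accounts when a question actually needs them.</p>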
<h2 id="conclusion">Conclusion</h2>
<p>Building a context-efficient MCP requires a mindset shift: design for what the model needs, not just what the API provides. The three principles I learned—filter at the source, compute in tools, and work within constraints creatively—apply to any MCP wrapping an external API.</p>
<p>Through careful design, the YNAB MCP achieves dramatic efficiency:</p>
<ul>
<li>Budget overview: 38% reduction (30k → 19k tokens)</li>
<li>Spending analysis: 94.6% reduction (4.9k → 262 tokens)</li>
<li>Category filtering: 30.7% reduction (12.4k → 8.6k tokens)</li>
</ul>
<p>These aren&rsquo;t theoretical—they&rsquo;re real measurements from actual usage. Context is expensive, and every token matters when you&rsquo;re building tools for multi-turn conversations.</p>
<p>If you&rsquo;re building an MCP, start by asking: &ldquo;What does the model actually need to answer the user&rsquo;s question?&rdquo; Not &ldquo;What does the API return?&rdquo; That mindset shift makes all the difference.</p>
<h2 id="further-reading">Further Reading</h2>
<p>If you want to dive deeper into MCP development and context optimization:</p>
<ul>
<li><a href="https://spec.modelcontextprotocol.io/">Model Context Protocol Specification</a> - Official MCP spec and documentation</li>
<li><a href="https://www.anthropic.com/engineering/code-execution-with-mcp">Code Execution with MCP</a> - Anthropic&rsquo;s engineering post on building more efficient agents with MCP</li>
<li><a href="https://youtu.be/-uW5-TaVXu4?si=GBCXG3Q5QcfEdvnJ">Most devs don&rsquo;t understand how context windows work</a> - Deep dive into context window fundamentals and practical management strategies</li>
<li><a href="https://api.ynab.com/">YNAB API Documentation</a> - The API this MCP wraps</li>
<li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview">Anthropic&rsquo;s Prompt Engineering Guide</a> - Understanding context windows and token efficiency</li>
</ul>
<p>The full YNAB MCP implementation is available at <a href="https://github.com/dgalarza/ynab-mcp-dgalarza">github.com/dgalarza/ynab-mcp-dgalarza</a> if you want to explore the code. I&rsquo;d love to hear about the context optimization techniques you&rsquo;ve discovered in your own MCP projects.</p>
]]></content:encoded></item></channel></rss>