<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Agent-Architecture on Damian Galarza | Software Engineering &amp; AI Consulting</title><link>https://www.damiangalarza.com/tags/agent-architecture/</link><description>Recent posts from Damian Galarza | Software Engineering &amp; AI Consulting</description><generator>Hugo</generator><language>en-us</language><managingEditor>Damian Galarza</managingEditor><atom:link href="https://www.damiangalarza.com/tags/agent-architecture/feed.xml" rel="self" type="application/rss+xml"/><item><title>Governing AI Agents Without Killing Them: What Actually Works in Production</title><link>https://www.damiangalarza.com/posts/2026-04-22-governing-ai-agents-without-killing-them/</link><pubDate>Wed, 22 Apr 2026 00:00:00 -0400</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-04-22-governing-ai-agents-without-killing-them/</guid><description>Most AI agent governance advice targets boards, not builders. Three failure patterns, real TypeScript examples, and what a CTO should do Monday morning.</description><content:encoded><![CDATA[<p><a href="https://hoolahoop.io/articles/cto-coaching/agentic-ai-governance/">Agentic AI governance for CTOs</a> argues governance needs to come before deployment, not after. The strategic frame is right about what&rsquo;s at stake: organizational accountability, observability, and tool access. But the solutions assume organizational machinery most early-stage teams don&rsquo;t have. A two-person startup running a multi-agent system doesn&rsquo;t need a RACI. It needs a guardrail processor that fails loudly. Leigh names the tension directly: overly restrictive governance drives experimentation underground. Governance that lives in code resolves it — lightweight enough for a seed-stage team, enforceable enough for a regulator.</p>
<p>The piece covers six governance gaps. This post is about three where code-level enforcement most obviously beats policy — tool access, observability, and human-in-the-loop. Cost visibility, shadow AI, and accountability chains are real concerns that deserve their own treatment.</p>
<p>I&rsquo;ve spent the last several months building <a href="/posts/2026-03-06-build-personal-ai-assistant/">a multi-agent AI assistant</a> that runs my consulting business: CRM, email, calendar, invoicing, content pipeline, Slack across two workspaces. Before that, years building software in regulated healthcare, including work on 510(k)-cleared medical device software where every system decision needed an audit trail. &ldquo;We&rsquo;ll add logging later&rdquo; was never an acceptable answer when a regulator could ask to reconstruct any action the system took. That mindset shapes how I think about agent governance. The three patterns below are ones I&rsquo;ve either hit in production or narrowly avoided. Each is a place where code-level governance beats policy-level governance for a team that can&rsquo;t afford a review board.</p>
<h2 id="tool-sprawl-widens-your-blast-radius">Tool Sprawl Widens Your Blast Radius</h2>
<p>MCP server sprawl is named in the original frame as a source of expanded blast radius. The same governance principle lives one layer down, inside the agent&rsquo;s tool definition: every tool an agent can access is a tool it could misuse. An agent with access to email, calendar, invoicing, CRM, and file operations has a blast radius that spans your entire business. A single prompt injection or hallucination can reach tools the agent should never touch. The principle is least privilege at the agent level, not the system level. Each agent should have access to exactly the tools it needs for its role, and nothing else.</p>
<p>What makes this worse is that tool sprawl also degrades the agent&rsquo;s ability to do its job. An agent with 40 tools when it regularly uses 8 faces two compounding problems. Every tool definition consumes context window tokens, space the model can&rsquo;t use for reasoning about the actual task. And the model has to select the right tool from a larger set, which increases the odds of misselection. I&rsquo;ve watched agents pick a vaguely similar tool over the correct one because the tool list was too long for the model to evaluate carefully. The governance risk and the performance cost come from the same root cause: too many tools in one agent&rsquo;s definition.</p>
<p>In my system, I run a multi-agent architecture where a supervisor delegates to domain-specific agents. I built the first version with a supervisor that had access to everything — why not let it figure out what to use? It worked in demos. In production, both problems showed up immediately: the supervisor&rsquo;s blast radius spanned the entire system, and the model wasted reasoning capacity navigating tools it didn&rsquo;t need.</p>
<p>Here&rsquo;s how I structure it instead. Each agent gets a scoped tool set:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// Each agent declares only the tools it needs
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">const</span> relayAgent <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Agent({
</span></span><span style="display:flex;"><span>  name<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;relay&#34;</span>,
</span></span><span style="display:flex;"><span>  instructions: <span style="color:#f38ba8">relayInstructions</span>,
</span></span><span style="display:flex;"><span>  model: <span style="color:#f38ba8">LOCAL_MODEL_LARGE_THINKING</span>,
</span></span><span style="display:flex;"><span>  tools<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic">// Email tools only - no CRM, no calendar, no invoicing
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    scanInbox,
</span></span><span style="display:flex;"><span>    readEmail,
</span></span><span style="display:flex;"><span>    readEmailThread,
</span></span><span style="display:flex;"><span>    labelEmail,
</span></span><span style="display:flex;"><span>    archiveEmail,
</span></span><span style="display:flex;"><span>    composeEmail,
</span></span><span style="display:flex;"><span>    draftEmail,
</span></span><span style="display:flex;"><span>    replyToEmail,
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">const</span> tempoAgent <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Agent({
</span></span><span style="display:flex;"><span>  name<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;tempo&#34;</span>,
</span></span><span style="display:flex;"><span>  instructions: <span style="color:#f38ba8">tempoInstructions</span>,
</span></span><span style="display:flex;"><span>  model: <span style="color:#f38ba8">FAST_MODEL</span>,
</span></span><span style="display:flex;"><span>  tools<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic">// Calendar tools only - no email, no CRM, no invoicing
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    listCalendarEvents,
</span></span><span style="display:flex;"><span>    getCalendarEvent,
</span></span><span style="display:flex;"><span>    createCalendarEvent,
</span></span><span style="display:flex;"><span>    updateCalendarEvent,
</span></span><span style="display:flex;"><span>    deleteCalendarEvent,
</span></span><span style="display:flex;"><span>    findCalendarFreeBusy,
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>The email agent can&rsquo;t touch the calendar. The calendar agent can&rsquo;t read emails. The invoicing agent can&rsquo;t send Slack messages to the shared workspace. These boundaries aren&rsquo;t documentation. They&rsquo;re structural. An agent literally cannot call a tool it doesn&rsquo;t have.</p>
<p>But tool scoping alone isn&rsquo;t enough. Some tools within an agent&rsquo;s set need additional constraints. My email agent has tools for composing and sending emails. The model can hallucinate plausible-looking recipient addresses, fabricate domains, or construct emails to addresses that don&rsquo;t exist. Instructions alone won&rsquo;t prevent this because the model can reason past them.</p>
<p>Rather than trusting the model&rsquo;s judgment, I enforce this at the framework level using <a href="https://mastra.ai/docs/agents/processors">Mastra&rsquo;s output processors</a>:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// A Mastra processor that blocks emails to fabricated addresses
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">import</span> <span style="color:#cba6f7">type</span> { ProcessOutputStepArgs, Processor } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;@mastra/core/processors&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">const</span> SEND_TOOLS <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Set([<span style="color:#a6e3a1">&#34;compose-email&#34;</span>, <span style="color:#a6e3a1">&#34;reply-to-email&#34;</span>]);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">class</span> EmailSendGuardrailProcessor <span style="color:#cba6f7">implements</span> Processor<span style="color:#89dceb;font-weight:bold">&lt;</span><span style="color:#a6e3a1">&#34;email-send-guardrail&#34;</span><span style="color:#89dceb;font-weight:bold">&gt;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">readonly</span> id <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#a6e3a1">&#34;email-send-guardrail&#34;</span> <span style="color:#cba6f7">as</span> <span style="color:#cba6f7">const</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  processOutputStep({ toolCalls, abort, messages }<span style="color:#89dceb;font-weight:bold">:</span> ProcessOutputStepArgs) {
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">if</span> (<span style="color:#89dceb;font-weight:bold">!</span>toolCalls<span style="color:#89dceb;font-weight:bold">?</span>.length) <span style="color:#cba6f7">return</span> messages;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">for</span> (<span style="color:#cba6f7">const</span> tc <span style="color:#cba6f7">of</span> toolCalls) {
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">if</span> (<span style="color:#89dceb;font-weight:bold">!</span>SEND_TOOLS.has(tc.toolName)) <span style="color:#cba6f7">continue</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">const</span> to <span style="color:#89dceb;font-weight:bold">=</span> (tc.args <span style="color:#cba6f7">as</span> { to?: <span style="color:#f38ba8">string</span> })<span style="color:#89dceb;font-weight:bold">?</span>.to;
</span></span><span style="display:flex;"><span>      <span style="color:#cba6f7">if</span> (<span style="color:#89dceb;font-weight:bold">!</span>to) <span style="color:#cba6f7">continue</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>      <span style="color:#6c7086;font-style:italic">// Block obviously fabricated or placeholder recipients
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>      <span style="color:#cba6f7">if</span> (<span style="color:#94e2d5">/(@example\.com|@test\.com|@placeholder\.)/</span>.test(to) <span style="color:#89dceb;font-weight:bold">||</span> <span style="color:#89dceb;font-weight:bold">!</span>to.includes(<span style="color:#a6e3a1">&#34;@&#34;</span>)) {
</span></span><span style="display:flex;"><span>        abort(
</span></span><span style="display:flex;"><span>          <span style="color:#a6e3a1">`The recipient &#34;</span><span style="color:#a6e3a1">${</span>to<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">&#34; looks like a guessed address. Look up the contact in the CRM first. Never fabricate email addresses.`</span>,
</span></span><span style="display:flex;"><span>          { retry: <span style="color:#f38ba8">true</span> },
</span></span><span style="display:flex;"><span>        );
</span></span><span style="display:flex;"><span>        <span style="color:#cba6f7">return</span> messages;
</span></span><span style="display:flex;"><span>      }
</span></span><span style="display:flex;"><span>    }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> messages;
</span></span><span style="display:flex;"><span>  }
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>Processors inspect the step&rsquo;s generated tool calls and can abort execution with a retry hint when something violates a hard rule. If the model hallucinates a recipient address, the guardrail aborts with a message telling the agent to look up the contact in the CRM first. The address never reaches the send tool. No approval card, no prompt-based workaround.</p>
<p>The same principle applies to trust boundaries across workspaces. I run two Slack integrations: one for my private workspace, one for a shared community. The community-facing agent has no browser access, no credential vault, no file system. That&rsquo;s not a policy document. It&rsquo;s a different agent with a different tool set, pointed at a different Slack app.</p>
<p><strong>The pattern:</strong> Don&rsquo;t govern tool access with policies that agents might ignore. Remove the tools from the agent&rsquo;s definition entirely. Governance you can&rsquo;t violate is better than governance you promise to follow.</p>
<h2 id="beyond-tracing-structured-decision-logs-for-agent-governance">Beyond Tracing: Structured Decision Logs for Agent Governance</h2>
<p>You cannot govern what you cannot see. Modern tracing tools like Arize Phoenix, Langfuse, and Mastra Studio show you the full request/response cycle: inputs, outputs, tool calls, latency, and the model&rsquo;s reasoning process. I use Arize Phoenix extensively. It&rsquo;s the first place I look when debugging why an agent picked the wrong tool, hallucinated a parameter, or took an unexpected path.</p>
<p>Tracing is essential, but it answers a specific class of questions: <em>what happened inside the model&rsquo;s reasoning</em>. Governance needs a second layer: structured decision logs that answer <em>what the system decided, what confidence it had, and whether the outcome was correct in your domain context</em>.</p>
<p>This is familiar territory if you&rsquo;ve worked in regulated environments. In healthcare software, particularly anything touching the 510(k) pathway for Software as a Medical Device (SaMD), you don&rsquo;t just log that a record was modified. You log who modified it, when, what the previous value was, and what rule authorized the change. Every action must be reconstructable because a regulator will ask. Agent governance has the same shape, even outside healthcare. The stakeholder asking &ldquo;why did the agent do that?&rdquo; isn&rsquo;t debugging model behavior. They&rsquo;re asking whether the outcome was correct given the business rules, and they need a trail that answers that question without ambiguity.</p>
<p>Here&rsquo;s the distinction in practice. When my email triage agent archives a message, I can see in Arize Phoenix exactly what the model received and how it reasoned about the classification. That&rsquo;s useful for debugging why the model chose &ldquo;archive&rdquo; over &ldquo;escalate.&rdquo; But when I need to answer &ldquo;show me every email that was auto-archived from my inbox last week, what confidence level each had, and which ruleset applied,&rdquo; I need structured logs that are queryable independently of the tracing system.</p>
<p>That means capturing the agent&rsquo;s decision context in a structured, queryable format:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// Each triage decision captures full context for audit
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">interface</span> TriageDecision {
</span></span><span style="display:flex;"><span>  messageId: <span style="color:#f38ba8">string</span>;
</span></span><span style="display:flex;"><span>  subject: <span style="color:#f38ba8">string</span>;
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">from</span><span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#f38ba8">string</span>;
</span></span><span style="display:flex;"><span>  classification<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;archive&#34;</span> <span style="color:#89dceb;font-weight:bold">|</span> <span style="color:#a6e3a1">&#34;act&#34;</span> <span style="color:#89dceb;font-weight:bold">|</span> <span style="color:#a6e3a1">&#34;digest&#34;</span> <span style="color:#89dceb;font-weight:bold">|</span> <span style="color:#a6e3a1">&#34;escalate&#34;</span>;
</span></span><span style="display:flex;"><span>  confidence: <span style="color:#f38ba8">number</span>;
</span></span><span style="display:flex;"><span>  mode<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;conservative&#34;</span> <span style="color:#89dceb;font-weight:bold">|</span> <span style="color:#a6e3a1">&#34;full&#34;</span>;   <span style="color:#6c7086;font-style:italic">// Which ruleset applied
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>  reason: <span style="color:#f38ba8">string</span>;                  <span style="color:#6c7086;font-style:italic">// Why the agent chose this
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>  actionTaken: <span style="color:#f38ba8">string</span>;             <span style="color:#6c7086;font-style:italic">// What actually happened
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>  labels: <span style="color:#f38ba8">string</span>[];                <span style="color:#6c7086;font-style:italic">// What labels were applied
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>  timestamp: <span style="color:#f38ba8">string</span>;
</span></span><span style="display:flex;"><span>}
</span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// Persisted to the database, queryable from the dashboard
</span></span></span></code></pre></div><p>The triage system runs two different modes depending on whose inbox it&rsquo;s scanning. My inbox gets conservative mode (only auto-archives high-confidence machine-generated noise like billing receipts and marketing emails). The AI assistant&rsquo;s inbox gets full mode (four classification categories, auto-archives everything after classification). That modal distinction matters for governance because the blast radius is different. Archiving a marketing email from the assistant&rsquo;s inbox is low stakes. Archiving something from my inbox that I hadn&rsquo;t seen yet is a different conversation.</p>
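<p>The mode split above reduces to a single predicate. Here is a hypothetical sketch; the names (<code>shouldAutoArchive</code>, <code>machineGenerated</code>) and the 0.9 confidence threshold are illustrative, not values from the real system:</p>

```typescript
// Illustrative sketch of the per-inbox mode split; names and the
// 0.9 threshold are assumptions, not the real system's values.
type TriageMode = "conservative" | "full";

interface Classified {
  classification: "archive" | "act" | "digest" | "escalate";
  confidence: number;        // 0..1 from the model
  machineGenerated: boolean; // receipts, marketing, notifications
}

function shouldAutoArchive(mode: TriageMode, c: Classified): boolean {
  if (mode === "full") {
    // Assistant inbox: archive anything classified as "archive"
    return c.classification === "archive";
  }
  // Conservative mode: only high-confidence machine-generated noise
  return (
    c.classification === "archive" &&
    c.machineGenerated &&
    c.confidence >= 0.9
  );
}
```

<p>The point of pulling this into one function is that the ruleset becomes testable and loggable: the <code>mode</code> field in the decision log records which branch applied.</p>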
<p>Beyond decision logging, I track errors with fingerprint deduplication. Every catch block writes to a structured error table with module, message, and context. A background health monitor runs every five minutes, detects stale processes, and escalates to the LLM for analysis when rule-based detection isn&rsquo;t enough. The dashboard surfaces all of this: health banners, error pages with filters, and stale-session badges that go amber after 10 minutes and red after 30.</p>
<p>None of this came from a governance framework. It came from an earlier system I built called Tracewell AI, where agents generated design inputs from source material in a regulated context. Every derivation had to be auditable: &ldquo;show me every design input, which sources the agent pulled from, and its confidence at the time.&rdquo; No trace format could answer that, and I wasn&rsquo;t using one. I built a structured audit log because compliance required it, not because debugging demanded it. That&rsquo;s where I learned the distinction: traces show what the model reasoned; structured logs show what the system decided, under which rules, with what confidence.</p>
<p><strong>The pattern:</strong> Tracing gives you deep visibility into model behavior. Use it. But for governance, pair it with structured decision logs that capture domain-specific context: what was decided, what confidence level, what ruleset applied, and what action the system took. Make both queryable, and make sure someone is actually reviewing them.</p>
<h2 id="human-in-the-loop-the-checkpoint-that-actually-works">Human-in-the-Loop: The Checkpoint That Actually Works</h2>
<p>The most important insight about human-in-the-loop is the &ldquo;rubber-stamp trap.&rdquo; Adding human review to every agent decision is a common starting point. In practice, reviewers get overwhelmed, start rubber-stamping, and the checkpoint becomes theater.</p>
<p>This isn&rsquo;t just theory. Anthropic recently published <a href="https://www.anthropic.com/engineering/claude-code-auto-mode">research on Claude Code&rsquo;s auto-accept mode</a> that quantifies the problem: users were approving 93% of permission prompts. That&rsquo;s not review. That&rsquo;s muscle memory. Their solution was to replace blanket approval with a tiered system where a model-based classifier evaluates risk and only escalates actions that warrant human attention. The classifier uses a two-stage pipeline (fast filter, then chain-of-thought reasoning) and catches overeager behavior, prompt injection, scope escalation, and honest mistakes while letting routine actions through without friction.</p>
<p>The same principle applies to agent systems. The solution isn&rsquo;t removing human review. It&rsquo;s being precise about <em>where</em> it adds value and <em>what context</em> the reviewer needs to make a real decision.</p>
<p>My system uses a tiered approach. Low-risk actions (reading emails, looking up calendar events, searching the CRM) happen without approval. The agent just does them. High-risk actions go through explicit approval gates using <a href="https://mastra.ai/docs/agents/agent-approval">Mastra&rsquo;s agent approval system</a>. When a tool is tagged with <code>requireApproval: true</code>, Mastra pauses execution at the framework level before the tool runs. The stream emits an approval event with the tool name and arguments, and the tool only executes after an explicit <code>approveToolCall()</code>. This is framework-enforced, not prompt-based, so the model can&rsquo;t reason its way past the gate.</p>
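<p>Without guessing at Mastra&rsquo;s internals, the control flow of a framework-level gate can be sketched generically: the tool call becomes a promise that only an explicit approval resolves, so nothing executes while the reviewer decides. The names here (<code>gatedToolCall</code>, the <code>pending</code> map) are illustrative, not Mastra&rsquo;s API:</p>

```typescript
// Framework-agnostic sketch of an approval gate: execution pauses on a
// promise until an explicit approve/reject arrives. Not Mastra's API.
type Decision = "approved" | "rejected";

interface PendingApproval {
  toolName: string;
  args: unknown; // surfaced to the reviewer alongside the tool name
  resolve: (d: Decision) => void;
}

const pending = new Map<string, PendingApproval>();

function gatedToolCall<T>(
  id: string,
  toolName: string,
  args: unknown,
  execute: () => T,
): Promise<T> {
  return new Promise((resolve, reject) => {
    // Park the call; the tool body runs only after approval
    pending.set(id, {
      toolName,
      args,
      resolve: (d) => {
        pending.delete(id);
        if (d === "approved") {
          resolve(execute());
        } else {
          reject(new Error(`rejected: ${toolName}`));
        }
      },
    });
  });
}

function approveToolCall(id: string): void {
  pending.get(id)?.resolve("approved");
}
```

<p>Because the gate sits outside the model loop, the model cannot talk its way past it; the only way through is the explicit resolve.</p>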
<p>The key design choice is what &ldquo;approval&rdquo; looks like. A generic &ldquo;Agent wants to perform an action. Approve?&rdquo; dialog is useless. The reviewer has no context, so they either rubber-stamp or block everything out of caution. Both outcomes are governance failures.</p>
<p>Here&rsquo;s what a real approval checkpoint looks like for my coding pipeline:</p>
<pre tabindex="0"><code>planning → risk assessment → low risk? ──yes──→ auto-approved → executing
                                ↓ no
                          plan_review → approved → executing
                                ↑              ↓
                            revise ← request changes
</code></pre><p>The agent generates a plan. A risk assessor (two layers: deterministic heuristics for hard stops like <code>DROP TABLE</code> or <code>.env</code> modifications, plus an LLM classifier for everything else) evaluates the plan. Low-risk plans auto-approve and execute immediately. Medium and high-risk plans go to human review with the full plan visible, not just a yes/no prompt.</p>
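<p>The deterministic first layer is simple enough to sketch. The hard-stop patterns below are illustrative, and the LLM classifier that handles everything else is stubbed as a function argument:</p>

```typescript
// Sketch of the two-layer risk assessor: deterministic hard stops first,
// then an LLM classifier for everything else. Patterns are illustrative.
type Risk = "low" | "medium" | "high";

const HARD_STOPS: RegExp[] = [
  /\bDROP\s+TABLE\b/i, // destructive SQL
  /\.env\b/,           // secrets files
  /\brm\s+-rf\b/,      // recursive deletes
];

function assessPlan(plan: string, llmClassify: (p: string) => Risk): Risk {
  // Layer 1: deterministic hard stops always win, no model in the loop
  if (HARD_STOPS.some((re) => re.test(plan))) return "high";
  // Layer 2: LLM classifier handles the ambiguous middle
  return llmClassify(plan);
}
```

<p>Keeping the hard stops deterministic matters: they are the floor the classifier can never lower, and they are trivially unit-testable.</p>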
<p>When I review a plan, I see exactly what the agent intends to do, which files it will touch, and why the risk assessor flagged it. I can approve, request changes (the agent revises and resubmits), or reject entirely. That&rsquo;s a checkpoint with teeth. The reviewer has enough context to make a real judgment call, and the &ldquo;request changes&rdquo; path means the review isn&rsquo;t binary.</p>
<p>For email, the approval is even more specific. When the agent wants to send an email, the approval card shows the full email: recipient, subject, body. I&rsquo;m not approving &ldquo;send an email.&rdquo; I&rsquo;m approving <em>this specific email to this specific person</em>. The context makes the checkpoint real instead of performative.</p>
<p>The less obvious lesson: the approval system itself can break in ways that look like it&rsquo;s working. I discovered that tagging certain tools with <code>requireApproval</code> caused my supervisor agent to avoid delegating to the sub-agent entirely. The supervisor model saw that the delegation path was &ldquo;approval-gated&rdquo; and hallucinated reasons not to use it. The approval mechanism was technically present but functionally disabled because the model routed around it. I only caught this by checking the traces (see: observability matters).</p>
<p><strong>The pattern:</strong> Approval checkpoints work when three conditions are met. The reviewer sees the full context of the action, not just a generic prompt. Low-risk actions bypass review entirely so the reviewer isn&rsquo;t fatigued. And the system is monitored to ensure the approval path is actually being exercised, not silently avoided.</p>
<h2 id="governance-as-code-defense-in-depth">Governance as Code: Defense in Depth</h2>
<p>Each of the previous patterns (tool scoping, guardrail processors, decision logs, approval gates) is a single layer. None of them is sufficient alone. The real value shows up when you stack them.</p>
<p>Take sending email as the running example. You&rsquo;ve already seen the individual layers: the email guardrail processor that blocks fabricated recipients, and the approval gate that pauses execution for human review. Here&rsquo;s how they combine with two additional layers into a defense-in-depth stack:</p>
<ol>
<li><strong>Tool API design</strong> forces an explicit <code>sender</code> parameter (&ldquo;emma&rdquo; | &ldquo;damian&rdquo;) with no default. The caller must deliberately choose which account sends.</li>
<li><strong>Guardrail processor</strong> blocks fabricated or placeholder recipients before the tool executes. Hard abort, no workaround.</li>
<li><strong>Framework-level approval gate</strong> (<code>requireApproval: true</code>) pauses execution and surfaces the full email for review: recipient, subject, body, sender.</li>
<li><strong>Client-level enforcement</strong> in <code>sendEmail()</code> requires an explicit <code>userEmail</code> argument. No fallback, no default. If the parameter is missing, it throws.</li>
</ol>
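<p>Layer 4 is the simplest to show in code. A minimal sketch, with a stand-in <code>sendEmail</code> rather than the real client:</p>

```typescript
// Sketch of client-level enforcement (layer 4): the client throws
// rather than falling back to a default sender. Stand-in, not the
// real client from the post.
interface SendArgs {
  userEmail: string; // explicit sending account, no default
  to: string;
  subject: string;
  body: string;
}

function sendEmail(args: Partial<SendArgs>): SendArgs {
  // Fail loudly instead of guessing which account sends
  if (!args.userEmail) {
    throw new Error("sendEmail: userEmail is required; no default sender");
  }
  if (!args.to || !args.to.includes("@")) {
    throw new Error(`sendEmail: invalid recipient "${args.to ?? ""}"`);
  }
  // ...actual transport call would go here; return args for illustration
  return args as SendArgs;
}
```

<p>The throw is the point: a missing parameter surfaces as an error in the logs, not as an email sent from the wrong account.</p>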
<p>Each layer is independent. If the model hallucinates a recipient, Layer 2 catches it. If it tries to send without approval, Layer 3 blocks it. If somehow the tool args are malformed, Layer 4 throws. A bypass at one layer doesn&rsquo;t compromise the others.</p>
<p>That&rsquo;s what governance as code means. The constraints are enforced by the system, verified by tests, and visible in the codebase, not buried in a Confluence page. This is one of the <a href="/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/">dimensions that separate agent-ready codebases</a> from ones that break under real workloads. Frameworks like <a href="https://mastra.ai/docs/agents/guardrails">Mastra</a> give you the primitives: guardrail processors for hard rules, approval gates for human review. Your job is to wire them into a layered defense that matches your risk profile.</p>
<h2 id="what-id-tell-a-cto-to-do-monday-morning">What I&rsquo;d Tell a CTO to Do Monday Morning</h2>
<p>If you&rsquo;re leading a team that&rsquo;s deploying agents, here&rsquo;s where to start:</p>
<p><strong>Audit your tool surfaces.</strong> For every agent in production, list the tools it has access to and ask: does this agent need all of these? Every unnecessary tool is expanded blast radius and wasted context window. Scope them down. You&rsquo;ll likely see better tool selection as a side effect.</p>
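<p>This audit can start as a one-function script: diff the tools an agent declares against the tool names that actually appear in recent traces. The helper below is illustrative; how you extract used tool names from your tracing system will vary:</p>

```typescript
// Illustrative tool-surface audit: which declared tools never show up
// in traces, and what fraction of the surface is actually exercised.
function auditToolSurface(
  declared: string[],
  usedInTraces: string[],
): { unused: string[]; utilization: number } {
  const used = new Set(usedInTraces);
  const unused = declared.filter((t) => !used.has(t));
  return {
    unused,
    utilization:
      declared.length === 0
        ? 1
        : (declared.length - unused.length) / declared.length,
  };
}
```

<p>An agent sitting at 20% utilization is a scoping candidate: every tool in <code>unused</code> is blast radius and context window spend with no payoff.</p>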
<p><strong>Add structured decision logging to your highest-stakes agent action.</strong> You probably already have LLM tracing. Pick the one action where you&rsquo;d need to explain &ldquo;why did the agent do that?&rdquo; to a stakeholder, and add structured logs that capture the decision context: inputs, classification, confidence, action taken. Make it queryable from your dashboard, not buried in trace spans.</p>
<p><strong>Pick your highest-risk action and build a real checkpoint.</strong> Not a generic approval dialog. An approval flow that shows the reviewer the full context of the action. Frameworks like <a href="https://mastra.ai/docs/agents/agent-approval">Mastra</a> provide the primitives. One real checkpoint is worth more than twenty rubber-stamp prompts.</p>
<p><strong>Move one governance rule from documentation to code.</strong> Find a constraint that&rsquo;s currently a line in a README or a team agreement. Encode it as a guardrail processor, a test, a structural boundary. Something that fails loudly when violated rather than depending on an agent reading and following instructions.</p>
<p>These are afternoon-sized tasks, not quarterly initiatives. That&rsquo;s the point. The original frame is right that governance has to come before agents ship at scale. But &ldquo;before&rdquo; doesn&rsquo;t mean organizational review boards. It means constraints in code that ship with the agent. Each of these gives you something concrete: a tighter tool surface, a queryable decision log, an approval checkpoint that someone actually uses, or a constraint that enforces itself without depending on an agent&rsquo;s good behavior.</p>
<p>If you want a faster read on where you stand, I built a companion <a href="/agent-governance-scorecard/">Agent Governance Scorecard</a> — 30 yes/no questions across the four dimensions above. It takes about ten minutes and tells you which layer to fix first.</p>
<hr>
<p>These aren&rsquo;t theoretical patterns. They&rsquo;re the same techniques I apply when working with early-stage teams to formalize their agent architecture.</p>
<p><strong>Most early-stage teams I talk to have agents in production and governance that&rsquo;s still catching up.</strong> The gap between &ldquo;it works&rdquo; and &ldquo;I can explain why it did that&rdquo; is where real risk lives — and it&rsquo;s where investors, partners, and your first enterprise customer will start asking questions. If that sounds familiar, <a href="/pages/meet/">book a free 30-minute strategy call</a>. I&rsquo;ll walk through your agent architecture, identify the highest-risk tool surfaces, and give you a prioritized action plan: what to lock down first, what can wait, and which of these patterns fits your system. No slide decks. Just a concrete roadmap you can start executing the same week.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://hoolahoop.io/articles/cto-coaching/agentic-ai-governance/">Agentic AI Governance: What CTOs Need to Know</a> — a solid overview of the organizational and strategic side of agent governance</li>
<li><a href="https://mastra.ai/docs/agents/agent-approval">Mastra Agent Approval</a> and <a href="https://mastra.ai/docs/agents/guardrails">Guardrails</a> — the framework primitives used in the examples above</li>
<li><a href="https://www.anthropic.com/engineering/claude-code-auto-mode">Claude Code Auto Mode: A Safer Way to Skip Permissions</a> on how Anthropic built tiered approval into Claude Code</li>
<li><a href="/posts/2025-11-06-build-efficient-mcp-servers-three-design-principles/">Build Efficient MCP Servers: Three Design Principles</a> on scoping what agents can access at the tool design level</li>
<li><a href="/posts/2026-02-17-how-ai-agents-remember-things/">How AI Agents Remember Things</a> on the memory and context systems that feed agent decisions</li>
<li><a href="/posts/2026-03-25-four-patterns-that-separate-agent-ready-codebases/">Four Dimensions of Agent-Ready Codebase Design</a> on building codebases that support reliable agent output</li>
<li><a href="/posts/2026-02-05-mcps-vs-agent-skills/">MCPs vs Agent Skills</a> on architecture decisions that shape agent capabilities</li>
</ul>
]]></content:encoded></item><item><title>The Observability Layer Your AI Agent Is Missing</title><link>https://www.damiangalarza.com/videos/2026-04-14-the-observability-layer-your-ai-agent-is-missing/</link><pubDate>Tue, 14 Apr 2026 14:00:00 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-04-14-the-observability-layer-your-ai-agent-is-missing/</guid><description>Logs tell you what happened. Traces tell you why. The three layers of agent observability, and where silent failures actually live.</description><content:encoded><![CDATA[<p>Logs tell you what happened. Traces tell you why. The three layers of agent observability, and where silent failures actually live.</p>
<p>I walk through a real production failure from my own system. My business ops agent confidently reported a completed task it had silently failed. Logs were clean. The dashboard was green. A single trace showed exactly why. This is Part 2 of the Agent Quality series, based on Google&rsquo;s Agent Quality white paper.</p>
]]></content:encoded></item><item><title>AI Agent Evals: The 4 Layers Most Teams Skip</title><link>https://www.damiangalarza.com/videos/2026-04-07-ai-agent-evals-the-4-layers-most-teams-skip/</link><pubDate>Tue, 07 Apr 2026 14:00:34 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-04-07-ai-agent-evals-the-4-layers-most-teams-skip/</guid><description>Most teams evaluate AI agents by vibes. Here are the four layers of evals you actually need to ship agents with confidence.</description><content:encoded><![CDATA[<p>Most teams evaluate AI agents by vibes. Here are the four layers of evals you actually need to ship agents with confidence.</p>
<p>I walk through the eval stack I use on real agent projects — from unit-level prompt checks up through end-to-end trajectory scoring — and explain where each layer catches different classes of failure. If you&rsquo;re building agents for production and wondering why regressions keep slipping through, this is the framework to borrow.</p>
]]></content:encoded></item><item><title>I Gave My AI Agent Access to My Second Brain</title><link>https://www.damiangalarza.com/videos/2026-03-31-i-gave-my-ai-agent-access-to-my-second-brain/</link><pubDate>Tue, 31 Mar 2026 14:00:00 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-03-31-i-gave-my-ai-agent-access-to-my-second-brain/</guid><description>What happens when you wire an AI agent directly into your Obsidian vault? Here's the setup I use to turn notes into real leverage.</description><content:encoded><![CDATA[<p>What happens when you wire an AI agent directly into your Obsidian vault? Here&rsquo;s the setup I use to turn notes into real leverage.</p>
<p>I walk through how I connected my second brain to an AI agent, the structure that makes retrieval actually work, and the workflows this unlocks — from daily briefings to content drafting off years of captured thinking.</p>
]]></content:encoded></item><item><title>How I Built a Personal AI Assistant with Mastra</title><link>https://www.damiangalarza.com/posts/2026-03-06-build-personal-ai-assistant/</link><pubDate>Fri, 06 Mar 2026 00:00:00 -0500</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/posts/2026-03-06-build-personal-ai-assistant/</guid><description>A practical guide to building an AI agent with Mastra that researches contacts, schedules follow-ups, integrates with Slack, and uses layered memory.</description><content:encoded><![CDATA[<p>Most &ldquo;AI assistants&rdquo; are just chatbots with a context window. Ask them something, get an answer. Ask again later, and they have no idea who you are.</p>
<p>That&rsquo;s not an assistant. That&rsquo;s a search engine with a personality.</p>
<p>I wanted something different. I wanted an agent that:</p>
<ul>
<li>Researches people before my meetings</li>
<li>Reminds me to follow up</li>
<li>Remembers context across conversations</li>
<li>Acts on its own when events happen</li>
</ul>
<p>So I built one. Here&rsquo;s how it works.</p>
<h2 id="the-goal">The Goal</h2>
<p>I meet with a lot of people: founders, potential clients, partners. Before each meeting, I want to know who I&rsquo;m talking to: their background, their company, what they&rsquo;ve been working on. After each meeting, I follow up at the right time.</p>
<p>Doing this manually doesn&rsquo;t scale. I needed an agent that handles it for me.</p>
<h2 id="architecture-overview">Architecture Overview</h2>
<p>The system has five main components:</p>
<ol>
<li><strong>Communication</strong>: Slack as the primary interface, built on a platform-agnostic SDK</li>
<li><strong>Tools</strong>: Composable functions for research, scheduling, and data retrieval</li>
<li><strong>Memory</strong>: Four types (message history, semantic recall, working memory, observational) so the agent remembers across conversations</li>
<li><strong>Webhooks</strong>: Event-driven triggers that let the agent react to Cal.com bookings automatically</li>
<li><strong>Task Scheduling</strong>: Time-delayed task execution for follow-ups and reminders</li>
</ol>
<p>Each component does one thing. Together, they form an agent that acts, remembers, and follows up.</p>
<h2 id="1-tools-what-the-agent-can-do">1. Tools: What the Agent Can Do</h2>
<p>Tools are the agent&rsquo;s capabilities. Each tool is a function the agent can call when it decides to use it.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/mastra/tools/cal-com.ts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">import</span> { createTool } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;@mastra/core/tools&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { z } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;zod&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">const</span> getUpcomingEvents <span style="color:#89dceb;font-weight:bold">=</span> createTool({
</span></span><span style="display:flex;"><span>  id<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;getUpcomingEvents&#34;</span>,
</span></span><span style="display:flex;"><span>  description<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;Get upcoming meetings from Cal.com for the specified date range&#34;</span>,
</span></span><span style="display:flex;"><span>  inputSchema: <span style="color:#f38ba8">z.object</span>({
</span></span><span style="display:flex;"><span>    startDate: <span style="color:#f38ba8">z.string</span>(),
</span></span><span style="display:flex;"><span>    endDate: <span style="color:#f38ba8">z.string</span>(),
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>  outputSchema: <span style="color:#f38ba8">z.array</span>(<span style="color:#f38ba8">z.object</span>({
</span></span><span style="display:flex;"><span>    title: <span style="color:#f38ba8">z.string</span>(),
</span></span><span style="display:flex;"><span>    startTime: <span style="color:#f38ba8">z.string</span>(),
</span></span><span style="display:flex;"><span>    endTime: <span style="color:#f38ba8">z.string</span>(),
</span></span><span style="display:flex;"><span>    attendees: <span style="color:#f38ba8">z.array</span>(z.<span style="color:#f38ba8">string</span>()),
</span></span><span style="display:flex;"><span>  })),
</span></span><span style="display:flex;"><span>  execute: <span style="color:#f38ba8">async</span> ({ startDate, endDate }) <span style="color:#89dceb;font-weight:bold">=&gt;</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">const</span> events <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> fetchCalComEvents(startDate, endDate);
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">return</span> events.map((e) <span style="color:#89dceb;font-weight:bold">=&gt;</span> ({
</span></span><span style="display:flex;"><span>      title: <span style="color:#f38ba8">e.title</span>,
</span></span><span style="display:flex;"><span>      startTime: <span style="color:#f38ba8">e.start</span>,
</span></span><span style="display:flex;"><span>      endTime: <span style="color:#f38ba8">e.end</span>,
</span></span><span style="display:flex;"><span>      attendees: <span style="color:#f38ba8">e.attendees.map</span>((a) <span style="color:#89dceb;font-weight:bold">=&gt;</span> a.email),
</span></span><span style="display:flex;"><span>    }));
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>The tool description is the contract. The agent reads the description and decides when to call the tool. Clear descriptions = reliable tool usage.</p>
<p>I defined tools for Cal.com (scheduling), Exa (web research), and Slack (messaging):</p>
<table>
  <thead>
      <tr>
          <th>Tool</th>
          <th>What it does</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>getUpcomingEvents</code></td>
          <td>Fetch meetings from Cal.com</td>
      </tr>
      <tr>
          <td><code>searchVault</code></td>
          <td>Search my contacts and notes</td>
      </tr>
      <tr>
          <td><code>researchPerson</code></td>
          <td>Research a person/company via Exa</td>
      </tr>
      <tr>
          <td><code>postToSlack</code></td>
          <td>Post to the assistant channel</td>
      </tr>
      <tr>
          <td><code>scheduleTask</code></td>
          <td>Schedule a follow-up task</td>
      </tr>
  </tbody>
</table>
<p>Each tool does one thing. Simple, composable functions the agent can combine.</p>
<h2 id="2-memory-the-differentiator">2. Memory: The Differentiator</h2>
<p>Most agents have no memory. Ask them something, they answer. Ask again later, they start fresh.</p>
<p>In <a href="/posts/2026-02-17-how-ai-agents-remember-things/">How AI Agents Remember Things</a>, I covered the conceptual taxonomy: episodic memory for events and interactions, semantic memory for stable facts and preferences. Mastra maps those concepts to four concrete memory types you configure when building an agent.</p>
<table>
  <thead>
      <tr>
          <th>Mastra Type</th>
          <th>What it does</th>
          <th>Conceptual equivalent</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Message history</strong></td>
          <td>Keeps recent messages in context within a conversation</td>
          <td>Episodic (in-session)</td>
      </tr>
      <tr>
          <td><strong>Semantic recall</strong></td>
          <td>Retrieves relevant messages from past conversations by meaning</td>
          <td>Episodic (cross-session)</td>
      </tr>
      <tr>
          <td><strong>Working memory</strong></td>
          <td>Persistent structured data: your name, preferences, goals</td>
          <td>Semantic (stable facts)</td>
      </tr>
      <tr>
          <td><strong>Observational memory</strong></td>
          <td>Background summarization to keep the context window small over time</td>
          <td>Session compaction</td>
      </tr>
  </tbody>
</table>
<p><strong>Message history</strong> (<code>lastMessages</code>) is the short-term layer. It keeps the last N messages in context so the agent can follow the conversation thread. The agent can reference something you said three messages ago without you repeating it. Ten messages works well for most conversational flows.</p>
<p><strong>Semantic recall</strong> is the long-term retrieval layer. It uses vector embeddings to search across all past conversations by meaning, not keywords. When you say &ldquo;remember that thing about the Cal.com integration,&rdquo; the agent encodes your query into a vector and finds the closest matches from past messages. You configure <code>topK</code> (how many matches to retrieve) and <code>messageRange</code> (how many surrounding messages to include for context). I used LibSQL for the vector store and FastEmbed for local embeddings, so the entire pipeline runs without external API calls.</p>
<p><strong>Working memory</strong> is the persistent layer. It&rsquo;s a structured scratchpad the agent updates over time as it learns about you. Unlike message history and semantic recall, which store raw messages, working memory stores distilled facts: your name, your role, your preferences. You define a template, and the agent fills it in as it picks up information from conversations. This is what makes the agent feel like it knows you, even in a brand new thread.</p>
<p><strong>Observational memory</strong> uses background Observer and Reflector agents to maintain a dense observation log that replaces raw message history as it grows. I haven&rsquo;t wired this up yet, but it solves the context window problem: as conversations get long, you can&rsquo;t keep everything in context. Observational memory compresses it down without losing the long-term thread.</p>
<p>Here&rsquo;s how the first three are configured:</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/mastra/agents/meeting-assistant.ts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">import</span> { Agent } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#39;@mastra/core/agent&#39;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { Memory } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#39;@mastra/memory&#39;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { LibSQLVector } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#39;@mastra/libsql&#39;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { fastembed } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#39;@mastra/fastembed&#39;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">const</span> memory <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Memory({
</span></span><span style="display:flex;"><span>  <span style="color:#6c7086;font-style:italic">// Vector store for semantic recall
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>  vector: <span style="color:#f38ba8">new</span> LibSQLVector({
</span></span><span style="display:flex;"><span>    id<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;memory-vector&#34;</span>,
</span></span><span style="display:flex;"><span>    url<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;file:./mastra.db&#34;</span>,
</span></span><span style="display:flex;"><span>  }),
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#6c7086;font-style:italic">// Local embedding model, no API key needed
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>  embedder: <span style="color:#f38ba8">fastembed</span>,
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  options<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic">// Message history: keeps the last 10 messages in context
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    lastMessages: <span style="color:#f38ba8">10</span>,
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic">// Semantic recall: searches past conversations by meaning
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    semanticRecall<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>      topK: <span style="color:#f38ba8">3</span>,
</span></span><span style="display:flex;"><span>      messageRange: <span style="color:#f38ba8">2</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>    <span style="color:#6c7086;font-style:italic">// Working memory: persistent user profile the agent updates over time
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>    workingMemory<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>      enabled: <span style="color:#f38ba8">true</span>,
</span></span><span style="display:flex;"><span>      template<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">`# User Profile
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">- Name:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">- Role:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">- Company:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">- Communication style:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">- Meeting prep preferences:
</span></span></span><span style="display:flex;"><span><span style="color:#a6e3a1">`</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">const</span> meetingAssistant <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Agent({
</span></span><span style="display:flex;"><span>  name<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#39;MeetingAssistant&#39;</span>,
</span></span><span style="display:flex;"><span>  instructions<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#39;You help prepare for meetings: research attendees, surface context, and schedule follow-ups.&#39;</span>,
</span></span><span style="display:flex;"><span>  model<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#39;openai/gpt-4.1&#39;</span>,
</span></span><span style="display:flex;"><span>  memory,
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Mastra handles storage, retrieval, and injection into the agent&rsquo;s context automatically. You configure the types declaratively; the framework does the rest.</p>
<h2 id="3-slack-integration">3. Slack Integration</h2>
<p>Communication happens through Slack via the <a href="https://chat-sdk.dev/">Chat SDK</a>, a platform-agnostic interface for bot communication.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/chat.ts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">import</span> { Chat } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;chat&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { createSlackAdapter } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;@chat-adapter/slack&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { meetingAssistant } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;./mastra/agents/meeting-assistant&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">const</span> bot <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Chat({
</span></span><span style="display:flex;"><span>  userName<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;meeting-assistant&#34;</span>,
</span></span><span style="display:flex;"><span>  adapters<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>    slack: <span style="color:#f38ba8">createSlackAdapter</span>(),
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>bot.onNewMention(<span style="color:#cba6f7">async</span> (thread, message) <span style="color:#89dceb;font-weight:bold">=&gt;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">await</span> thread.subscribe();
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">await</span> thread.startTyping();
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">const</span> result <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> meetingAssistant.generate(message.text, {
</span></span><span style="display:flex;"><span>    memory<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>      thread: <span style="color:#f38ba8">thread.id</span>,
</span></span><span style="display:flex;"><span>      resource<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;user&#34;</span>,
</span></span><span style="display:flex;"><span>    },
</span></span><span style="display:flex;"><span>  });
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">await</span> thread.post(result.text);
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>Two things are happening here. First, <code>onNewMention</code> fires when someone @mentions the bot in Slack. Second, <code>memory.thread</code> scopes messages to the specific Slack thread, while <code>memory.resource</code> uses a fixed ID so working memory (your profile) is shared across all threads.</p>
<h2 id="4-webhooks-reacting-to-events">4. Webhooks: Reacting to Events</h2>
<p>The agent needs to know when a meeting is booked. That&rsquo;s where webhooks come in.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/mastra/index.ts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">import</span> { Mastra } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;@mastra/core&#34;</span>;
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">import</span> { registerApiRoute } <span style="color:#cba6f7">from</span> <span style="color:#a6e3a1">&#34;@mastra/core/server&#34;</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">const</span> mastra <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">new</span> Mastra({
</span></span><span style="display:flex;"><span>  server<span style="color:#89dceb;font-weight:bold">:</span> {
</span></span><span style="display:flex;"><span>    apiRoutes<span style="color:#89dceb;font-weight:bold">:</span> [
</span></span><span style="display:flex;"><span>      registerApiRoute(<span style="color:#a6e3a1">&#34;/webhooks/cal&#34;</span>, {
</span></span><span style="display:flex;"><span>        method<span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#a6e3a1">&#34;POST&#34;</span>,
</span></span><span style="display:flex;"><span>        handler: <span style="color:#f38ba8">async</span> (c) <span style="color:#89dceb;font-weight:bold">=&gt;</span> {
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">const</span> payload <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> c.req.json();
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">const</span> triggerEvent <span style="color:#89dceb;font-weight:bold">=</span> payload.triggerEvent;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">if</span> (triggerEvent <span style="color:#89dceb;font-weight:bold">!==</span> <span style="color:#a6e3a1">&#34;BOOKING_CREATED&#34;</span>) {
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">return</span> c.json({ ok: <span style="color:#f38ba8">true</span>, skipped: <span style="color:#f38ba8">true</span> });
</span></span><span style="display:flex;"><span>          }
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">const</span> attendee <span style="color:#89dceb;font-weight:bold">=</span> payload.payload<span style="color:#89dceb;font-weight:bold">?</span>.attendees<span style="color:#89dceb;font-weight:bold">?</span>.[<span style="color:#fab387">0</span>];
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">const</span> channelId <span style="color:#89dceb;font-weight:bold">=</span> process.env.SLACK_CHANNEL_ID;
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">const</span> channel <span style="color:#89dceb;font-weight:bold">=</span> bot.channel(<span style="color:#a6e3a1">`slack:</span><span style="color:#a6e3a1">${</span>channelId<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">`</span>);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>          <span style="color:#6c7086;font-style:italic">// Post immediately, then research asynchronously
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span>          channel.post(<span style="color:#a6e3a1">`Researching *</span><span style="color:#a6e3a1">${</span>attendee.name<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">* for upcoming meeting...`</span>).then(<span style="color:#cba6f7">async</span> (sent) <span style="color:#89dceb;font-weight:bold">=&gt;</span> {
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">const</span> threadId <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#a6e3a1">`slack:</span><span style="color:#a6e3a1">${</span>channelId<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">:</span><span style="color:#a6e3a1">${</span>sent.id<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">`</span>;
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">const</span> prompt <span style="color:#89dceb;font-weight:bold">=</span> [
</span></span><span style="display:flex;"><span>              <span style="color:#a6e3a1">`I have a meeting coming up with </span><span style="color:#a6e3a1">${</span>attendee.name<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1"> (</span><span style="color:#a6e3a1">${</span>attendee.email<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">).`</span>,
</span></span><span style="display:flex;"><span>              <span style="color:#a6e3a1">`Event: </span><span style="color:#a6e3a1">${</span>payload.payload.title<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">`</span>,
</span></span><span style="display:flex;"><span>              <span style="color:#a6e3a1">`Time: </span><span style="color:#a6e3a1">${</span>payload.payload.startTime<span style="color:#a6e3a1">}</span><span style="color:#a6e3a1">`</span>,
</span></span><span style="display:flex;"><span>              <span style="color:#a6e3a1">`Research this person and give me a concise meeting brief.`</span>,
</span></span><span style="display:flex;"><span>            ].join(<span style="color:#a6e3a1">&#34;\n&#34;</span>);
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">const</span> result <span style="color:#89dceb;font-weight:bold">=</span> <span style="color:#cba6f7">await</span> meetingAssistant.generate(prompt);
</span></span><span style="display:flex;"><span>            <span style="color:#cba6f7">await</span> slack.postMessage(threadId, { markdown: <span style="color:#f38ba8">result.text</span> });
</span></span><span style="display:flex;"><span>          });
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>          <span style="color:#cba6f7">return</span> c.json({ ok: <span style="color:#f38ba8">true</span> });
</span></span><span style="display:flex;"><span>        },
</span></span><span style="display:flex;"><span>      }),
</span></span><span style="display:flex;"><span>    ],
</span></span><span style="display:flex;"><span>  },
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>The webhook receives the Cal.com booking payload, immediately posts a &ldquo;researching&rdquo; message to Slack, then kicks off the agent to research and post the brief. This way Cal.com doesn&rsquo;t time out waiting for the research to complete.</p>
<h2 id="5-task-scheduling-time-delayed-actions">5. Task Scheduling: Time-Delayed Actions</h2>
<p>After a meeting, the agent should follow up. That requires scheduling a task for later execution.</p>
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic">// src/scheduler.ts
</span></span></span><span style="display:flex;"><span><span style="color:#6c7086;font-style:italic"></span><span style="color:#cba6f7">export</span> <span style="color:#cba6f7">async</span> <span style="color:#f38ba8">function</span> scheduleTask(
</span></span><span style="display:flex;"><span>  name: <span style="color:#f38ba8">string</span>,
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">type</span><span style="color:#89dceb;font-weight:bold">:</span> <span style="color:#f38ba8">string</span>,
</span></span><span style="display:flex;"><span>  scheduledFor: <span style="color:#f38ba8">string</span>,
</span></span><span style="display:flex;"><span>  payload: <span style="color:#f38ba8">Record</span>&lt;<span style="color:#cba6f7">string</span>, <span style="color:#89b4fa">unknown</span>&gt;,
</span></span><span style="display:flex;"><span>) {
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">await</span> db.insert(scheduledTasks).values({
</span></span><span style="display:flex;"><span>    name,
</span></span><span style="display:flex;"><span>    <span style="color:#cba6f7">type</span>,
</span></span><span style="display:flex;"><span>    scheduledFor,
</span></span><span style="display:flex;"><span>    payload: <span style="color:#f38ba8">JSON.stringify</span>(payload),
</span></span><span style="display:flex;"><span>  });
</span></span><span style="display:flex;"><span>}
</span></span></code></pre></div><p>The scheduler polls every 30 seconds for due tasks, marks them as running, executes the handler, then marks them complete or failed. Simple, reliable, no external dependencies.</p>
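<p>The polling loop itself isn&rsquo;t shown in the post, so here&rsquo;s a hypothetical, self-contained sketch of the cycle just described: claim due tasks, run their handlers, record the outcome. An in-memory array stands in for the <code>scheduledTasks</code> table, and the names <code>enqueue</code> and <code>runDueTasks</code> are illustrative, not from the real implementation.</p>

```typescript
// Sketch of the polling scheduler. In the real implementation the tasks
// live in the database; here an in-memory array stands in for the table.
type Task = {
  id: number;
  type: string;
  scheduledFor: string; // ISO timestamp
  payload: Record<string, unknown>;
  status: "pending" | "running" | "complete" | "failed";
};

const tasks: Task[] = [];
const handlers = new Map<
  string,
  (payload: Record<string, unknown>) => Promise<void>
>();

export function registerTaskHandler(
  type: string,
  handler: (payload: Record<string, unknown>) => Promise<void>,
) {
  handlers.set(type, handler);
}

export function enqueue(task: Omit<Task, "status">) {
  tasks.push({ ...task, status: "pending" });
}

// One polling tick: find due pending tasks, mark them running, execute the
// registered handler, then mark them complete or failed.
export async function runDueTasks(now: Date = new Date()) {
  const due = tasks.filter(
    (t) => t.status === "pending" && new Date(t.scheduledFor) <= now,
  );
  for (const task of due) {
    task.status = "running";
    try {
      const handler = handlers.get(task.type);
      if (!handler) throw new Error(`no handler registered for ${task.type}`);
      await handler(task.payload);
      task.status = "complete";
    } catch {
      task.status = "failed";
    }
  }
}

// The 30-second loop:
// setInterval(() => void runDueTasks(), 30_000);
```

Marking a task <code>running</code> before executing it is what keeps a slow handler from being picked up twice by overlapping ticks.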
<div class="highlight"><pre tabindex="0" style="color:#cdd6f4;background-color:#1e1e2e;-moz-tab-size:2;-o-tab-size:2;tab-size:2;"><code class="language-typescript" data-lang="typescript"><span style="display:flex;"><span>registerTaskHandler(<span style="color:#a6e3a1">&#34;follow-up&#34;</span>, <span style="color:#cba6f7">async</span> (payload) <span style="color:#89dceb;font-weight:bold">=&gt;</span> {
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">const</span> { threadId, message } <span style="color:#89dceb;font-weight:bold">=</span> payload <span style="color:#cba6f7">as</span> { threadId: <span style="color:#f38ba8">string</span>; message: <span style="color:#f38ba8">string</span> };
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">const</span> slack <span style="color:#89dceb;font-weight:bold">=</span> bot.getAdapter(<span style="color:#a6e3a1">&#34;slack&#34;</span>);
</span></span><span style="display:flex;"><span>  <span style="color:#cba6f7">await</span> slack.postMessage(threadId, { markdown: <span style="color:#f38ba8">message</span> });
</span></span><span style="display:flex;"><span>});
</span></span></code></pre></div><p>When the meeting ends, the webhook handler schedules a follow-up task. Within thirty seconds of the scheduled time, the polling loop picks it up and the handler posts to the Slack thread: &ldquo;The meeting should be wrapping up! How did it go?&rdquo;</p>
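<p>The scheduling side of that handoff might look like the sketch below. It&rsquo;s self-contained for illustration: <code>scheduleTask</code> is stubbed to record calls rather than write to the database, and <code>scheduleFollowUp</code> is a hypothetical helper name, not from the repo.</p>

```typescript
// Hypothetical sketch of the end-of-meeting webhook's scheduling call.
// scheduleTask is stubbed here; the real one inserts into scheduledTasks.
type Scheduled = {
  name: string;
  type: string;
  scheduledFor: string;
  payload: Record<string, unknown>;
};

const scheduled: Scheduled[] = [];

async function scheduleTask(
  name: string,
  type: string,
  scheduledFor: string,
  payload: Record<string, unknown>,
) {
  scheduled.push({ name, type, scheduledFor, payload });
}

// Called from the booking webhook: queue the follow-up for the meeting's
// end time, pointing back at the same Slack thread as the briefing.
export async function scheduleFollowUp(threadId: string, endTime: string) {
  await scheduleTask("post-meeting follow-up", "follow-up", endTime, {
    threadId,
    message: "The meeting should be wrapping up! How did it go?",
  });
}
```

Because the payload carries the <code>threadId</code>, the follow-up lands in the same thread as the original brief instead of starting a new conversation.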
<h2 id="the-mental-model">The Mental Model</h2>
<p>Here&rsquo;s how I think about agent architecture now:</p>
<p><strong>Communication</strong> is the interface. Slack, Telegram, or any messaging platform. It&rsquo;s just how the user talks to the agent.</p>
<p><strong>Tools</strong> are the agent&rsquo;s capabilities. Each tool should do one thing well. The description is the contract. Write it clearly, or the agent won&rsquo;t know when to use it.</p>
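<p>&ldquo;The description is the contract&rdquo; can be made concrete. Here&rsquo;s a framework-agnostic sketch of the shape: real frameworks (Mastra included) differ in details like schema validation, but all of them hand the model only the description, so that text alone decides when the tool gets called. The tool and its wording below are illustrative.</p>

```typescript
// Illustrative, framework-agnostic tool shape. The model never sees the
// implementation, only the id and description, so the description must say
// both when to use the tool and when not to.
type Tool<I, O> = {
  id: string;
  description: string;
  execute: (input: I) => Promise<O>;
};

export const researchPerson: Tool<
  { name: string; email: string },
  { summary: string }
> = {
  id: "research-person",
  description:
    "Look up public information about a person by name and email. " +
    "Use before a meeting to build a brief. Do not use for general web search.",
  execute: async ({ name, email }) => {
    // A real implementation would call a search API here (e.g. Exa).
    return { summary: `Research notes for ${name} <${email}>` };
  },
};
```

Note the negative clause at the end of the description: telling the model what a tool is <em>not</em> for is often what stops it from reaching for the wrong capability.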
<p><strong>Memory</strong> is the differentiator. Message history for in-session continuity, semantic recall for cross-session retrieval, working memory for persistent user facts. Most agents fail because they implement only one of these layers, or none at all.</p>
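<p>In Mastra, those three layers map onto a single <code>Memory</code> configuration. The sketch below is based on the options in Mastra&rsquo;s memory docs; semantic recall additionally needs a vector store and embedder configured, and exact option names may drift between versions, so treat this as a shape rather than copy-paste config.</p>

```typescript
import { Memory } from "@mastra/memory";

// One Memory instance, three layers (option names per Mastra's memory docs;
// verify against your installed version).
const memory = new Memory({
  options: {
    lastMessages: 20,                 // message history: in-session continuity
    semanticRecall: {                 // cross-session retrieval over past threads
      topK: 3,                        // how many similar messages to pull back
      messageRange: 2,                // surrounding context per match
    },
    workingMemory: { enabled: true }, // persistent user facts the agent maintains
  },
});
```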
<p><strong>Webhooks</strong> make it proactive. Without external triggers, the agent only acts when asked. With webhooks, it can act on events: bookings, form submissions, anything.</p>
<p><strong>Task scheduling</strong> closes the loop. The agent doesn&rsquo;t just respond. It reminds, follows up, checks in. Time-delayed actions turn a reactive chatbot into a proactive assistant.</p>
<h2 id="the-point">The Point</h2>
<p>The point isn&rsquo;t the code. It&rsquo;s the architecture.</p>
<p>An agent that only chats doesn&rsquo;t get you very far. An agent that integrates with your tools, remembers across conversations, reacts to events, and follows up on time is actually useful.</p>
<p>Mastra makes this easier than rolling your own. But the pattern works with any framework: give the agent tools, give it memory in layers, connect it to events, and let it act on time.</p>
<p>That&rsquo;s an assistant. Everything else is a chatbot.</p>
<h2 id="code">Code</h2>
<p>The full implementation is on GitHub: <a href="https://github.com/dgalarza/mastra-meeting-assistant">github.com/dgalarza/mastra-meeting-assistant</a></p>
<h2 id="want-help-building-this">Want Help Building This?</h2>
<p>If you&rsquo;re building AI agents into your workflow, whether it&rsquo;s a personal assistant, an internal tool, or a customer-facing product, <a href="/ai-agents">I can help</a>.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="https://mastra.ai/docs/">Mastra Documentation</a></li>
<li><a href="https://mastra.ai/docs/memory/overview">Mastra Memory</a></li>
<li><a href="https://chat-sdk.dev/">Chat SDK</a></li>
<li><a href="/posts/2026-02-17-how-ai-agents-remember-things/">How AI Agents Remember Things</a>: deep dive into the memory taxonomy and architecture behind agents that remember</li>
<li><a href="https://youtu.be/Seu7nksZ_4k?si=Xx8wnlL6j8nsLYq5">How AI Agents Remember Things (YouTube)</a>: video walkthrough of agent memory systems</li>
<li><a href="https://docs.exa.ai/">Exa API</a></li>
</ul>
]]></content:encoded></item><item><title>Build Your Own AI Agent from Scratch (Mastra + TypeScript)</title><link>https://www.damiangalarza.com/videos/2026-03-05-build-your-own-ai-agent-from-scratch-mastra-typescript/</link><pubDate>Thu, 05 Mar 2026 17:01:18 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-03-05-build-your-own-ai-agent-from-scratch-mastra-typescript/</guid><description>Learn to build your own AI agent that actually does work for you, not just answers questions.</description><content:encoded>&lt;p>Learn to build your own AI agent that actually does work for you, not just answers questions.&lt;/p>
&lt;p>In this video, I show you the core patterns behind every useful AI agent: tools, memory, webhooks, and scheduled tasks. We build a meeting prep assistant with Mastra (TypeScript) as the example, but the patterns apply to any agent you want to build.&lt;/p>
</content:encoded></item><item><title>How AI Agents Remember Things</title><link>https://www.damiangalarza.com/videos/2026-02-11-how-ai-agents-remember-things/</link><pubDate>Wed, 11 Feb 2026 00:00:35 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-02-11-how-ai-agents-remember-things/</guid><description>How do AI agents remember things between sessions? Every agent forgets everything when a conversation ends, so how do the best ones seem to know you?</description><content:encoded><![CDATA[<p>How do AI agents remember things between sessions? Every agent forgets everything when a conversation ends, so how do the best ones seem to know you?</p>
<p>I break down the memory architecture behind real AI agents, using OpenClaw (an open-source AI assistant) as a reference implementation. You&rsquo;ll see how LLM agents write, store, and load persistent memory using plain markdown files, and the four mechanisms that keep context across sessions, including context window management, bootstrap loading, and pre-compaction memory flush.</p>
]]></content:encoded></item><item><title>How OpenClaw Works: The Architecture Behind the 'Magic'</title><link>https://www.damiangalarza.com/videos/2026-02-04-how-openclaw-works-the-architecture-behind-the-magic/</link><pubDate>Wed, 04 Feb 2026 02:00:22 +0000</pubDate><author>Damian Galarza</author><guid>https://www.damiangalarza.com/videos/2026-02-04-how-openclaw-works-the-architecture-behind-the-magic/</guid><description>OpenClaw agent architecture explained: How autonomous AI agents like ClawdBot create the illusion of sentience using just inputs, queues, and a loop.</description><content:encoded><![CDATA[<p>OpenClaw agent architecture explained: How autonomous AI agents like ClawdBot create the illusion of sentience using just inputs, queues, and a loop.</p>
<p>(Previously known as ClawdBot / MoltBot)</p>
<p>In this deep dive, I break down the 5 input types that power OpenClaw—Messages, Heartbeats, Crons, Hooks, and Webhooks—and show you the simple formula behind agents that seem to think on their own.</p>
]]></content:encoded></item></channel></rss>