From Pull to Push — Event-Driven Knowledge Delivery for AI Agents


Documentation is a pull system. You go find what you need. A plugin is a push system. Knowledge finds you at the moment you need it. The difference is the same gap that separates polling an API every five seconds from subscribing to a webhook.


The Two Models of Knowledge

Every system for getting information to a consumer falls into one of two categories: pull or push. In a pull system, the consumer initiates the request. They know what they need, they know where it lives, and they go get it. In a push system, the producer initiates the delivery. The consumer declares what they care about, and relevant information arrives when it becomes available.

Both models are everywhere in software. REST APIs are pull. Webhooks are push. Database queries are pull. Change data capture is push. Manual documentation lookup is pull. IDE autocomplete is push. The entire trajectory of modern systems architecture has been a migration from pull to push wherever latency and relevance matter.

Knowledge delivery for AI coding agents is stuck in the pull era.

The dominant pattern today is the SKILL.md file: a static document that the agent loads into its context window and references as needed. Some systems front-load everything at session start. Others let the agent request specific files when it thinks it needs them. But in every case, the agent is the one pulling. It decides what to look up. It decides when. And it decides based on whatever incomplete model of the problem it has at that moment — a model that is, by definition, least informed at the point where it needs to make the decision.

This is the fundamental paradox of pull-based knowledge: the consumer must already understand the problem well enough to know what knowledge to request, but if they understood the problem that well, they might not need the knowledge in the first place.

A plugin flips the model. Knowledge is pushed to the agent based on observable events — file reads, file writes, bash commands, prompt text — and the agent never has to decide whether to look something up. The right context arrives at the right time because the system is listening, not because the agent is searching.

Publish-Subscribe as the Core Pattern

The architecture that makes this work is publish-subscribe. If you have built event-driven microservices, you already know the pattern. Publishers emit events. Subscribers declare interest in specific event types. A broker matches events to subscribers and delivers messages. The publisher does not know or care who is listening. The subscriber does not know or care who is publishing. The broker handles the routing.

In a Claude Code plugin, the mapping is direct:

Publisher    = Claude Code (emits tool-use events)
Broker       = Hook system (matches events to patterns)
Subscribers  = Skills (declare patterns they care about)
Messages     = Injected context (SKILL.md content)

When Claude invokes Read on a file matching app/api/chat/**, the hook system matches that path against every skill's declared pathPatterns. The ai-sdk skill has registered interest in that pattern. The hook delivers the AI SDK guidance into the conversation. The agent never asked for it. The agent did not even need to know it existed. The knowledge arrived because the event matched a subscription.
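The subscriber side of this mapping can be sketched as a type. The field names below (`pathPatterns`, `bashPatterns`, `importPatterns`) follow the ones this article describes; the actual plugin manifest schema may differ, so treat this as an illustrative shape, not the real format:

```typescript
// Hypothetical shape of a skill's subscription declaration.
interface SkillSubscription {
  name: string;
  priority: number;         // 4-8 range; higher wins when more skills match than fit
  pathPatterns: string[];   // globs matched against Read/Edit/Write file paths
  bashPatterns: string[];   // regexes matched against Bash commands
  importPatterns: string[]; // regexes matched against import statements
}

// The ai-sdk skill from the example above, expressed in this shape.
const aiSdkSkill: SkillSubscription = {
  name: "ai-sdk",
  priority: 8,
  pathPatterns: ["app/api/chat/**"],
  bashPatterns: [],
  importPatterns: ["\\bfrom ['\"]ai['\"]"],
};
```

The broker's job is then mechanical: on every tool-use event, test the event's path or command against each subscription and deliver the matching skills' content.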

This is not a metaphor. It is a literal implementation of the publish-subscribe pattern, with one critical difference from most message brokers: the delivery is enriched and prioritized. Skills have priorities (4–8 range), byte budgets (18KB per injection), and deduplication guarantees. The broker does not just route messages — it curates them.

The Four Hook Events as Event Channels

The Vercel plugin subscribes to four distinct event channels, each capturing a different phase of the agent's work:

Channel 1: SessionStart — The Bootstrap Event

Fired once when a session begins (startup, resume, clear, compact). This is where the ecosystem knowledge graph is injected — the 52KB relational map of every Vercel product, library, and service. Think of it as the agent's orientation packet, delivered before it writes a single line of code.

In pub-sub terms, this is a broadcast topic. Every session gets the same foundational context regardless of what the project contains. But even here, the profiler hook (session-start-profiler.mts) analyzes the project's package.json, vercel.json, and config files to set VERCEL_PLUGIN_LIKELY_SKILLS, giving certain skills a +5 priority boost for the rest of the session. The broadcast is enriched with project-specific signal before the first tool call ever fires.
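A hypothetical sketch of what that profiler step might look like: map dependencies found in package.json to skill names and emit the likely set. The mapping table and function here are assumptions for illustration; only the env var name and the +5 boost come from the article:

```typescript
// Illustrative dependency-to-skill mapping; the real profiler's table is unknown.
const DEP_TO_SKILL: Record<string, string> = {
  ai: "ai-sdk",
  "@upstash/redis": "vercel-storage",
  "@vercel/blob": "vercel-storage",
};

// Derive the likely-skill set from a parsed package.json; the result would be
// exported as VERCEL_PLUGIN_LIKELY_SKILLS to grant those skills a +5 boost.
function likelySkills(pkg: { dependencies?: Record<string, string> }): string[] {
  const deps = Object.keys(pkg.dependencies ?? {});
  const skills = deps.flatMap((d) => (DEP_TO_SKILL[d] ? [DEP_TO_SKILL[d]] : []));
  return [...new Set(skills)]; // dedupe: two deps can point at the same skill
}
```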

Channel 2: PreToolUse — The Interception Event

Fired every time the agent invokes Read, Edit, Write, or Bash. This is the primary injection channel and the one that most closely resembles a reactive event stream. The hook examines the tool invocation — which file is being read, which command is being run — and matches it against three pattern types:

  • Path patterns (globs): app/api/chat/** → ai-sdk skill
  • Bash patterns (regex): \bnpx\s+shadcn\b → shadcn skill
  • Import patterns (regex): @vercel/blob → vercel-storage skill

Multiple skills can match a single event. When they do, the broker ranks them by priority, applies the profiler boost, enforces the 3-skill cap and 18KB budget, and delivers the top matches. Lower-priority skills are dropped — not lost, just deferred until a future event where they rank higher.
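The ranking step described above can be sketched in a few lines. The +5 boost and the 3-skill cap are from the article; the function shape and names are illustrative:

```typescript
interface Match { skill: string; priority: number }

const SKILL_CAP = 3;      // at most 3 skills delivered per event
const PROFILER_BOOST = 5; // applied to skills flagged at session start

// Rank matched skills: apply the profiler boost, sort by priority, enforce the cap.
function rankMatches(matches: Match[], likely: Set<string>): string[] {
  return matches
    .map((m) => ({
      ...m,
      priority: m.priority + (likely.has(m.skill) ? PROFILER_BOOST : 0),
    }))
    .sort((a, b) => b.priority - a.priority)
    .slice(0, SKILL_CAP) // lower-priority skills are dropped, not lost
    .map((m) => m.skill);
}
```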

The key property of this channel is that it fires at the moment of action. The agent is about to read a middleware file. Right now, before it processes the file contents, it receives the routing-middleware skill explaining how platform-level middleware works. The latency between "agent needs to know" and "agent does know" is measured in milliseconds, not in the agent's decision to search for documentation.

Channel 3: UserPromptSubmit — The Intent Event

Fired when the user types a prompt but before the agent begins processing it. This channel captures intent rather than action. The hook scores the prompt text against each skill's promptSignals:

  • Phrases (+6 each): exact substring matches like "ai gateway" or "cron job"
  • allOf (+4 per group): conjunction matches where every term must appear, like ["deploy", "production"]
  • anyOf (+1 each, capped at +2): soft signals like "openai" or "streaming"
  • noneOf (hard suppress): terms that indicate the skill is definitely irrelevant

This scoring system is a content-based router. The prompt "I want to add a cron job that calls my AI endpoint every hour" triggers the cron-jobs skill (phrase match: "cron job") and the ai-sdk skill (allOf: ["ai", "endpoint"], anyOf: "streaming" if present). Knowledge arrives before the agent writes its first tool call.
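The scoring rules above are simple enough to sketch directly. The weights (+6 per phrase, +4 per allOf group, +1 per anyOf hit capped at +2, hard suppress on noneOf) come from the article; the function itself is an illustrative reconstruction:

```typescript
interface PromptSignals {
  phrases: string[];  // exact substrings, +6 each
  allOf: string[][];  // every term in a group must appear, +4 per group
  anyOf: string[];    // soft signals, +1 each, capped at +2
  noneOf: string[];   // any hit hard-suppresses the skill
}

function scorePrompt(prompt: string, s: PromptSignals): number {
  const text = prompt.toLowerCase();
  if (s.noneOf.some((t) => text.includes(t))) return -Infinity; // hard suppress
  let score = 0;
  for (const p of s.phrases) if (text.includes(p)) score += 6;
  for (const group of s.allOf) if (group.every((t) => text.includes(t))) score += 4;
  const anyHits = s.anyOf.filter((t) => text.includes(t)).length;
  score += Math.min(anyHits, 2);
  return score;
}
```

Running the article's example prompt through this sketch, "cron job" scores 6 for the cron-jobs skill via a phrase match, and the allOf group `["ai", "endpoint"]` scores 4 for the ai-sdk skill.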

Channel 4: PostToolUse — The Validation Event

Fired after the agent writes or edits a file. This is the feedback channel — the event that enables push-based error correction. Each skill can declare validation rules:

validate:
  - pattern: "import.*from ['\"]@vercel/postgres['\"]"
    message: "Use @neondatabase/serverless — @vercel/postgres is sunset"
    severity: "error"

When the agent writes a file that matches a skill's path patterns, the validation rules run against the written content. Violations are pushed back as fix instructions. The agent did not ask "is this code correct?" The system told it — immediately, at write time, with specific remediation.

In event-driven terms, this is a saga pattern. The write event triggers a validation step, and if validation fails, a compensating action (the fix instruction) is emitted. The cycle is: write → validate → correct → write → validate → pass.
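Given the rule shape shown in the YAML above, the validation step reduces to regex matching plus message emission. This runner is a minimal sketch, not the plugin's actual implementation:

```typescript
interface ValidationRule {
  pattern: string;  // regex tested against the written file content
  message: string;  // the fix instruction pushed back to the agent
  severity: "error" | "warn";
}

// Run a skill's validation rules against content the agent just wrote.
function validateWrite(content: string, rules: ValidationRule[]): string[] {
  return rules
    .filter((r) => new RegExp(r.pattern).test(content))
    .map((r) => `[${r.severity}] ${r.message}`);
}

// The rule from the YAML example above, expressed in this shape.
const rules: ValidationRule[] = [{
  pattern: "import.*from ['\"]@vercel/postgres['\"]",
  message: "Use @neondatabase/serverless — @vercel/postgres is sunset",
  severity: "error",
}];
```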

The Comparison: Pull vs. Push

Here is where the architectures diverge in measurable ways:

| Dimension | Pull (SKILL.md / docs) | Push (Plugin hooks) |
| --- | --- | --- |
| Latency | Variable. Agent must decide to look, then find, then read. Seconds to never. | Near-zero. Event fires → match → inject. Sub-100ms. |
| Precision | Low. Agent guesses what to look up based on incomplete understanding. | High. Pattern matching operates on the actual file path, command, or prompt. |
| Coverage | Partial. Agent only pulls what it thinks it needs. Unknown unknowns remain unknown. | Comprehensive. Every relevant skill that matches the event is considered. |
| Freshness | Stale. SKILL.md is loaded once. Updates require re-reading. | Live. Each event triggers a fresh match against current patterns. |
| Budget efficiency | Poor. Front-loading wastes budget on irrelevant context. On-demand wastes time. | Optimized. Priority ranking + byte budget + dedup = maximum relevance per token. |
| Error detection | Absent. Static docs cannot validate code after it is written. | Built-in. PostToolUse validation catches errors at write time. |
| Deduplication | Manual. Agent must remember what it already read. | Automatic. Atomic file claims ensure each skill injects once per session. |
| Multi-domain awareness | Siloed. Each SKILL.md is independent. | Orchestrated. Priority ranking considers the full skill universe. |

The latency difference alone justifies the architecture. In a pull system, there is a window between "the agent needed this knowledge" and "the agent received this knowledge" that can stretch from seconds (if the agent eventually looks it up) to infinity (if the agent never realizes it should look). In a push system, that window collapses to the time it takes to match a glob pattern — which the Vercel plugin does with pre-compiled regexes from the manifest, typically in under 5ms.

But precision is the dimension that matters most. A pull-based agent deciding whether to look up the AI SDK docs must first recognize that the file it is reading is related to the AI SDK. That recognition requires the very knowledge that the lookup would provide. This is a chicken-and-egg problem that push-based delivery solves completely: the pattern app/api/chat/** does not require the agent to understand what an AI chat endpoint is. It just requires the file to be in the right directory.
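To make the point concrete, here is a toy glob matcher: deciding that a file falls under app/api/chat/** requires nothing but string mechanics. A real plugin would likely use a battle-tested glob library; this hand-rolled sketch handles only `*` and `**`:

```typescript
// Convert a simple glob to a RegExp: * matches within one path segment,
// ** matches across segments. Not a full glob implementation.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")           // placeholder so * rules don't eat **
    .replace(/\*/g, "[^/]*")              // single star: one segment only
    .replace(/\u0000/g, ".*");            // double star: any depth
  return new RegExp(`^${escaped}$`);
}
```

No knowledge of what an "AI chat endpoint" is enters the computation; the match is purely positional.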

The Architecture in Motion

Let us trace a concrete scenario. A developer asks: "Add a cron job that checks our AI usage every hour and stores the result in Redis."

Pull model (SKILL.md only):

  1. Agent reads the prompt. Recognizes "cron job" as a concept.
  2. Agent decides to look up cron documentation. (Maybe. Depends on training data.)
  3. Agent finds cron-jobs SKILL.md. Reads it. Learns about vercel.json schedule config.
  4. Agent starts writing the function. Uses @vercel/kv for Redis because that is what its training data says.
  5. Agent does not look up storage guidance because it thinks it knows Redis on Vercel.
  6. Agent ships sunset code. No validation catches it.

Push model (Plugin):

  1. UserPromptSubmit fires. Prompt scores: cron-jobs (phrase "cron job" +6), vercel-storage (allOf ["store", "redis"] +4, anyOf "redis" +1). Both skills inject.
  2. Agent now knows: cron jobs use vercel.json schedules with CRON_SECRET verification, AND @vercel/kv is sunset — use @upstash/redis instead.
  3. Agent creates vercel.json with the schedule. PreToolUse on the Write matches the cron-jobs skill (already injected, dedup skips). Good.
  4. Agent writes the function file. PreToolUse on the Write matches vercel-functions (path: api/**). Function guidance injects — including Fluid Compute and waitUntil patterns.
  5. Agent imports @upstash/redis. PostToolUse validation runs. No violations. Clean.
  6. If the agent had imported @vercel/kv instead, PostToolUse would have caught it immediately with: "Use @upstash/redis — @vercel/kv is sunset."

The push model delivered three skills (cron-jobs, vercel-storage, vercel-functions) without the agent requesting any of them. It caught a potential sunset-API error before it could propagate. And it did all of this within the flow of the agent's normal work, adding zero overhead to the agent's decision-making process.

Why This Matters Beyond a Single Plugin

The event-driven knowledge delivery pattern is not specific to the Vercel ecosystem. It is a general architecture for any domain where:

  1. The knowledge space is large. Forty-seven skills, hundreds of patterns, thousands of potential interactions. No agent can hold it all in context.

  2. Relevance is contextual. The right knowledge depends on what the agent is doing right now, not what it might do later. File paths, commands, and prompts are the highest-fidelity signals of current context.

  3. Errors are costly. Using a sunset API, misconfiguring a security feature, or choosing the wrong rendering strategy costs hours of debugging. Prevention at write time is orders of magnitude cheaper than detection after deployment.

  4. The ecosystem evolves. APIs change. Best practices shift. New products launch. A push system can update its skills independently without retraining the model or rewriting the SKILL.md.

Every major platform — AWS, GCP, Azure, Stripe, Shopify, Supabase — has the same problem: their ecosystem is too large, changes too fast, and intersects with too many other systems for any static document to be sufficient. The plugin model, with its event-driven knowledge delivery, is how you solve that problem.

The Pub-Sub Guarantees That Matter

A well-designed push system provides guarantees that pull systems cannot:

At-most-once delivery (dedup). Each skill injects at most once per session, enforced by atomic file claims (openSync with O_EXCL). The agent never receives duplicate context, even when multiple events match the same skill. This is the equivalent of idempotent message processing in a message queue.

Priority ordering. When more skills match than the budget allows, the highest-priority skills win. This is not arbitrary — priority reflects how critical the knowledge is for correctness. The ai-sdk skill at priority 8 will always beat a lower-priority skill when both match the same event.

Backpressure (budget enforcement). The 18KB byte budget is a backpressure mechanism. It prevents the system from overwhelming the agent's context window, just as a message queue applies backpressure to prevent a consumer from being overwhelmed. When the budget is exhausted, lower-priority skills are dropped gracefully — they can still be injected on future events if budget is available.

Dead letter handling (summaries). When a skill is dropped due to budget constraints, its summary field (a one-line fallback) can be included instead. The agent gets a pointer to the knowledge even if it cannot receive the full content. This is the dead letter queue of knowledge delivery — nothing is silently lost.
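Backpressure and dead-letter handling combine naturally in the packing step: fill the budget in priority order, and fall back to the one-line summary for anything that does not fit. The 18KB limit is from the article; the selection logic is an illustrative sketch:

```typescript
interface RankedSkill { name: string; content: string; summary: string }

const BYTE_BUDGET = 18 * 1024; // per-injection cap on pushed context

// Pack skills (already sorted by priority) into the byte budget; skills that
// overflow the budget degrade to their one-line summary instead of being lost.
function packInjection(ranked: RankedSkill[]): string[] {
  const out: string[] = [];
  let used = 0;
  for (const skill of ranked) {
    const size = Buffer.byteLength(skill.content, "utf8");
    if (used + size <= BYTE_BUDGET) {
      out.push(skill.content);
      used += size;
    } else {
      out.push(skill.summary); // dead letter: a pointer, not silence
    }
  }
  return out;
}
```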

Building Your Own Push-Based Knowledge System

If you are a platform team, a DevRel team, or an open-source maintainer thinking about building a plugin, the event-driven model gives you a blueprint:

  1. Define your event channels. What observable actions does the agent take that signal relevance? File reads and writes are universal. Bash commands are universal. Prompt text is universal. Start there.

  2. Define your subscription format. YAML frontmatter with glob patterns, regex patterns, and prompt signals is a proven format. It is human-readable, machine-parseable, and testable.

  3. Implement priority and budget. Without these, you have a broadcast system that floods the agent with everything. Priority and budget turn it into a curated feed.

  4. Add dedup. Session-scoped deduplication ensures knowledge is delivered once and only once. Atomic file claims are the simplest reliable mechanism.

  5. Add validation. PostToolUse validation closes the feedback loop. Push systems without feedback are fire-and-forget. Push systems with feedback are self-correcting.

  6. Test the event flow, not just the content. The Vercel plugin has 32 test files covering hook integration, pattern matching, snapshot baselines, and fuzzing. The delivery mechanism is as important as what is delivered.

The gap between a SKILL.md and a plugin is the gap between a document and a system. Documents inform. Systems react. In a world where AI agents are writing production code at the speed of thought, reaction time is everything. By the time an agent thinks to look something up, it has already written three files with the wrong patterns. A push-based system prevents those files from ever being written wrong in the first place.


The best documentation you never had to read is the documentation that read your code and showed up at the right moment. That is what event-driven knowledge delivery means for AI agents — and it is why the future of developer tooling is not better docs, but better delivery.
