You know how RAG works. Embed some documents, embed a query, find the nearest neighbors. The interesting question isn’t the retrieval mechanism—it’s what you embed.
For API operations, the answer is: nothing useful exists.
What would you embed for an operation like POST /repos/{owner}/{repo}/issues?
The spec itself? JSON Schema and path templates don’t carry semantic meaning in embedding space. Structural descriptions embed as noise.
The documentation? “Creates an issue.” Thanks. That’ll really help distinguish it from the other hundred issue-creation operations across Jira, Linear, Asana, GitLab, Bitbucket, Azure DevOps…
The operation name? repos/issues/create tells you nothing about tracking bugs, requesting features, discussing changes, or any of the reasons you’d actually create an issue.
The semantic meaning of an operation—what it’s for, why you’d need it, in what contexts—doesn’t exist in any artifact. Not in the spec. Not in the documentation. Nowhere.
So we generate it.
The intent generation model sees everything: the operation’s synthesized TypeScript interface, full documentation, parameter descriptions, response types, related operations for disambiguation. Complete context for understanding what this operation does.
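For illustration only, a synthesized interface for the issue-creation operation might look roughly like this; the names and exact shape here are assumptions, not the system's actual output:

```typescript
// Hypothetical sketch of a synthesized interface for POST /repos/{owner}/{repo}/issues.
// Field names mirror the public GitHub API, but this is illustrative, not generated output.

/** Create an issue in a repository. */
interface CreateIssueRequest {
  owner: string;        // repository owner (user or organization)
  repo: string;         // repository name
  title: string;        // issue title
  body?: string;        // issue description, Markdown
  assignees?: string[]; // logins of users to assign
  labels?: string[];    // label names to apply
  milestone?: number;   // milestone number to associate
}

interface CreateIssueResponse {
  number: number;       // issue number within the repository
  html_url: string;     // link to the created issue
  state: "open" | "closed";
}
```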
But context alone isn’t enough. The model’s latent knowledge—absorbed from code, tutorials, discussions, Stack Overflow, blog posts—lets it understand what that context means. Why developers create issues. In what workflows. For what purposes. The documentation says “creates an issue”; the model understands issue tracking.
Intent generation combines explicit context with latent understanding to synthesize searchable phrases:
"Create a new issue in a GitHub repository""Open a GitHub issue to track a bug""File a feature request on GitHub"These phrases don’t exist in any documentation. They’re synthesized by a model that comprehends the operation—from the context we provide, enhanced by latent understanding—and imagines how an agent would search for it.
This is distillation. The same principle as distilling a large model into a smaller one, but targeting embedding space instead of model weights. A frontier model comprehends the operation once, and that understanding gets compressed into phrases optimized for fast similarity matching.
This works for any API—public or private. The model doesn’t need prior knowledge of your internal services. It sees your types, your documentation, your parameter descriptions. Its latent understanding of software development helps it synthesize meaningful intents from that context, even for operations it’s never encountered.
A human could write intent phrases. But humans would write how humans search, not how AI agents search.
Users don’t query the system directly. An AI agent does. The user says “handle this customer complaint.” The agent decomposes that into capability searches: create a support ticket, send an SMS, maybe issue a refund. The agent is the one expressing intent.
So we use AI to generate intents for AI. The generating model knows how searching agents think because it is one. It can imagine the capability gaps an agent might have, the tasks that would lead to needing this operation, the ways an agent would express that need.
The intents aren’t descriptions of operations. They’re predictions of how agents will search—made by a model that understands both the operation and the searcher.
Frontier models understand deeply but cost accordingly. You can’t run one for every search query.
So you run it once per operation at index time. The frontier model comprehends the operation, generates intent phrases, and that understanding gets frozen into embeddings. At query time, a tiny embedding model handles similarity matching in milliseconds.
Expensive intelligence, amortized. The frontier model’s comprehension is distilled into a form that cheap infrastructure can search. Every operation understood once by a capable model, searchable forever by a fast one.
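The index-time half of that split, sketched in a few lines. Here embedText stands in for whatever small embedding model you run, and the flat array is a placeholder for a real vector store:

```typescript
// Sketch of the index-time embedding step. embedText() is a stand-in for a
// small embedding model; the array is a stand-in for a real vector store.

interface IndexedIntent {
  operationId: string; // which operation this phrase points back to
  phrase: string;      // the generated intent phrase
  vector: number[];    // its embedding
}

async function indexOperation(
  operationId: string,
  intents: string[],
  embedText: (text: string) => Promise<number[]>,
  store: IndexedIntent[],
): Promise<void> {
  // The frontier model already ran once for this operation; from here on,
  // everything is cheap embedding plus storage.
  for (const phrase of intents) {
    store.push({ operationId, phrase, vector: await embedText(phrase) });
  }
}
```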
The quality of discovery is bounded by the quality of intents, not by the retrieval mechanism.
Good intents capture distinct reasons an agent might need an operation:
"Charge a customer in Stripe""Collect payment for an invoice in Stripe""Process a refund in Stripe after dispute"Customer-focused, invoice-focused, dispute-focused. Three different contexts that would lead to the same operation.
Bad intents are synonym collections:
"Create a payment in Stripe""Process a transaction in Stripe""Generate a charge in Stripe"These embed nearly identically—no additional semantic coverage. An agent searching for “collect payment for an outstanding invoice” won’t match “generate a charge.”
The generating model has to understand the operation deeply enough to decompose it into orthogonal meaning vectors. Different motivations, different contexts, different ways of conceptualizing the same capability.
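One way to sanity-check that property, though not necessarily what the system itself does: embed candidate phrases and drop any that sit too close to one already accepted. A sketch, with the 0.9 threshold chosen arbitrarily:

```typescript
// Sketch: reject near-duplicate intent phrases by pairwise cosine similarity.
// The 0.9 threshold and the embedText() signature are illustrative assumptions.

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function diverseIntents(
  candidates: string[],
  embedText: (text: string) => Promise<number[]>,
  maxSimilarity = 0.9,
): Promise<string[]> {
  const kept: { phrase: string; vector: number[] }[] = [];
  for (const phrase of candidates) {
    const vector = await embedText(phrase);
    // Keep the phrase only if it adds semantic coverage beyond what we already have.
    if (kept.every((k) => cosine(k.vector, vector) < maxSimilarity)) {
      kept.push({ phrase, vector });
    }
  }
  return kept.map((k) => k.phrase);
}
```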
When your catalog spans multiple services, collisions matter. “Create issue” matches GitHub, Jira, Linear, Asana, and more.
Intents must disambiguate:
"Create a new issue in a GitHub repository" ✓"Open a Jira ticket in a project" ✓"Create issue" ✗Service names appear naturally in the phrasing, not as metadata filters. The embedding itself carries the disambiguation. An agent searching for “file a bug in our GitHub repo” matches GitHub operations, not Jira—because the intents were generated with that specificity.
Generic intents capture how any agent might search. But your agent isn’t generic—it has a domain, a purpose, specific terminology.
When you create a custom catalog, you can provide a focus: a sentence or two describing what your agent does. The generating model tailors intents for that domain.
A DevOps-focused catalog might generate:
"Create a GitHub issue to track a CI pipeline failure""Open a GitHub issue for post-deployment verification"Same operation, different semantic coverage. A DevOps agent searching for “track a failed deployment” now matches—because the intents were generated with that agent’s vocabulary and workflows in mind.
Focus doesn’t replace the operation’s core identity. “Create a GitHub issue” still anchors the intents. But focus extends into domain-specific territory, increasing retrieval accuracy for terminology and usage patterns that stray from the generic mean.
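In prompt terms, focus is just one more piece of context handed to the generating model. A small sketch of how it could be folded in; the wording is illustrative, not the actual prompt:

```typescript
// Sketch: folding an optional catalog focus into the intent-generation prompt.
// The phrasing is illustrative, not the system's actual prompt.

function buildIntentPrompt(operationContext: string, focus?: string): string {
  const lines = [
    "Write 3-6 search phrases an agent might use when it needs this operation.",
    operationContext,
  ];
  if (focus) {
    // e.g. "A DevOps agent that manages CI pipelines and deployments."
    lines.push(`The searching agent's domain: ${focus}`);
    lines.push("Include phrases using that domain's terminology and workflows.");
  }
  return lines.join("\n");
}
```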
Build a catalog with the APIs your agent needs. Describe what your agent does. Hit generate. Semantic search adapts to your domain.
Every operation goes through the distillation pipeline.
The frontier model sees the same synthesized interface that a runtime agent calling learn_api would see. Its understanding is grounded in what agents will actually work with, not raw specs or useless documentation.
When an agent searches, the query embeds and matches against indexed intents. Top results return with similarity scores.
This part is commodity infrastructure. Embed, compare, rank. The insight is on the generation side—creating something worth searching in the first place.
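For completeness, the query side in a few lines; embedText and the IndexedIntent shape are the same illustrative stand-ins used above:

```typescript
// Sketch of query-time matching: embed the agent's query, rank indexed intents
// by cosine similarity, and return the top operations with their scores.

interface IndexedIntent {
  operationId: string;
  phrase: string;
  vector: number[];
}

async function searchOperations(
  query: string,
  store: IndexedIntent[],
  embedText: (text: string) => Promise<number[]>,
  topK = 5,
): Promise<{ operationId: string; phrase: string; score: number }[]> {
  const q = await embedText(query);
  const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (a: number[]) => Math.sqrt(dot(a, a));
  return store
    .map((item) => ({
      operationId: item.operationId,
      phrase: item.phrase,
      score: dot(q, item.vector) / (norm(q) * norm(item.vector)),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```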
An agent searching for “simulate receiving funds into a Stripe treasury account for testing” finds PostTestHelpersTreasuryReceivedCredits—a deeply buried test helper—because a frontier model understood what that operation is for and generated intents that captured it.
The meaning was latent. Now it’s searchable.