
Making Tools Discoverable

AI for AI: using models to predict how other models will search.

· 4 min read

Last month I wrote about why conventional RAG doesn’t work for tool discovery: the content you need to embed doesn’t exist. API documentation describes implementation, not intent. The semantic meaning of operations—why you’d use them, in what contexts—isn’t written down anywhere.

The solution is to generate it. Here’s how.

The embedding problem

Consider POST /repos/{owner}/{repo}/issues. What would you embed?

The spec itself? JSON Schema and path templates don’t carry semantic meaning in embedding space. Structural descriptions embed as noise.

The documentation? “Creates an issue.” That’ll really help distinguish it from the hundred other issue-creation operations across Jira, Linear, Asana, GitLab, Bitbucket, and Azure DevOps.

The operation name? repos/issues/create tells you nothing about tracking bugs, requesting features, discussing changes, or any of the reasons you’d actually create an issue.

The semantic meaning of an operation—what it’s for, why you’d need it, in what contexts—doesn’t exist in any artifact. Not in the spec. Not in the documentation. Nowhere.

So we generate it.

Intent generation

A frontier model sees everything about an operation: the synthesized TypeScript interface, full documentation, parameter descriptions, response types, related operations for disambiguation. Complete context.

But context alone isn’t enough. The model’s latent knowledge—absorbed from code, tutorials, discussions, Stack Overflow, blog posts—lets it understand what that context means. Why developers create issues. In what workflows. For what purposes. The documentation says “creates an issue”; the model understands issue tracking.

Intent generation combines explicit context with latent understanding to synthesize searchable phrases:

"Create a new issue in a GitHub repository"
"Open a GitHub issue to track a bug"
"File a feature request on GitHub"

These phrases don’t exist in any documentation. They’re synthesized by a model that comprehends the operation and imagines how an agent would search for it.
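As a sketch, intent generation reduces to assembling this complete context into a prompt for the frontier model. Everything below is illustrative — the function name, the operation fields, and the prompt wording are assumptions, and the model call itself is omitted:

```python
# Illustrative sketch: assemble the full context a frontier model would
# see when generating intent phrases for one operation. The actual
# model call is omitted; nothing here is a real API.

def build_intent_prompt(operation: dict, related: list[str], n: int = 3) -> str:
    """Assemble complete context for generating searchable intent phrases."""
    siblings = "\n".join(f"- {name}" for name in related)
    return (
        f"Operation: {operation['name']}\n"
        f"Interface: {operation['interface']}\n"
        f"Docs: {operation['docs']}\n"
        f"Related operations (for disambiguation):\n{siblings}\n\n"
        f"Write {n} distinct, searchable intent phrases an AI agent might "
        "use when it needs this capability. Cover different motivations "
        "and contexts, and name the service explicitly in each phrase."
    )

prompt = build_intent_prompt(
    {
        "name": "repos/issues/create",
        "interface": "createIssue(owner: string, repo: string, title: string): Issue",
        "docs": "Creates an issue.",
    },
    related=["repos/issues/update", "repos/issues/list-comments"],
)
print(prompt)
```

The key design point is that related operations are included: the model can only phrase intents that disambiguate if it sees what it must disambiguate from.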

Distillation

This is distillation. The same principle as distilling a large model into a smaller one, but targeting embedding space instead of model weights.

A frontier model comprehends the operation once. That understanding gets compressed into phrases optimized for fast similarity matching. Expensive intelligence, amortized.

It’s cost- and latency-prohibitive to run a frontier model for every search query. But you can run it once per operation at index time. The model’s comprehension freezes into embeddings. At query time, a comparatively small embedding model handles similarity in milliseconds.

Every operation understood once by a state-of-the-art model, searchable forever by a fast one.
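The index-time/query-time split can be sketched end to end. A real system would use a learned embedding model; here a toy bag-of-words cosine similarity stands in so the example stays self-contained:

```python
# Toy sketch of the split: frontier-model comprehension is frozen into
# per-operation intent embeddings at index time; query time is a cheap
# similarity scan. Bag-of-words cosine similarity is a stand-in for a
# real embedding model.
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in embedding: token counts instead of a dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index time (once per operation): embed the generated intent phrases.
index = {
    "github:repos/issues/create": [embed(p) for p in (
        "create a new issue in a github repository",
        "open a github issue to track a bug",
        "file a feature request on github",
    )],
    "jira:issues/create": [embed(p) for p in (
        "open a jira ticket in a project",
        "create a jira ticket to track work",
    )],
}

# Query time (every search): no frontier model involved.
def search(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda op: max(cosine(q, v) for v in index[op]))

print(search("file a bug in our github repo"))    # github:repos/issues/create
print(search("open a jira ticket for this task")) # jira:issues/create
```

Even this toy index shows why service names belong in the phrasing: “file a bug in our github repo” lands on the GitHub operation, not Jira, purely through similarity.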

AI for AI

A human could write intent phrases. But humans would write how humans search, not how agents search.

Users don’t query the system directly. An AI agent does. The user says “handle this customer complaint.” The agent decomposes that into capability searches: create a support ticket, send an SMS, maybe issue a refund. The agent is the one expressing intent.

So we use AI to generate intents for AI. The generating model knows how searching agents think because it is one. It can imagine the capability gaps an agent might have, the tasks that would lead to needing this operation, the ways an agent would express that need.

The intents aren’t descriptions of operations. They’re predictions of how agents will search.

Intent quality

The quality of discovery is bounded by the quality of intents, not by the retrieval mechanism.

Good intents capture distinct reasons an agent might need an operation:

"Charge a customer in Stripe"
"Collect payment for an invoice in Stripe"
"Process a refund in Stripe after dispute"

Customer-focused, invoice-focused, dispute-focused. Three different contexts that would lead to the same operation.

Bad intents are synonym collections:

"Create a payment in Stripe"
"Process a transaction in Stripe"
"Generate a charge in Stripe"

These embed nearly identically—no additional semantic coverage. An agent searching for “collect payment for an outstanding invoice” won’t match “generate a charge.”

The generating model has to understand the operation deeply enough to decompose it into orthogonal meaning vectors. Different motivations, different contexts, different ways of conceptualizing the same capability.
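Redundancy is also measurable. Under the same bag-of-words stand-in for a real embedding model, a synonym-collection intent set shows high pairwise similarity while orthogonal intents do not — the threshold below is illustrative, not tuned:

```python
# Toy redundancy screen for a generated intent set. Bag-of-words cosine
# similarity stands in for dense embeddings; with a real embedding model
# the threshold would be chosen empirically.
from collections import Counter
from itertools import combinations
from math import sqrt

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_redundant(intents: list[str], threshold: float = 0.55) -> bool:
    """True if any two intents embed nearly identically."""
    vecs = [embed(i) for i in intents]
    return any(cosine(a, b) > threshold for a, b in combinations(vecs, 2))

orthogonal = [
    "charge a customer in stripe",
    "collect payment for an invoice in stripe",
    "process a refund in stripe after dispute",
]
synonyms = [
    "create a payment in stripe",
    "process a transaction in stripe",
    "generate a charge in stripe",
]
print(is_redundant(orthogonal), is_redundant(synonyms))  # False True
```

A check like this can gate generation: if a fresh intent set fails, ask the model to regenerate with different motivations rather than different wording.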

Disambiguation

When your catalog spans services, collisions matter. “Create issue” matches GitHub, Jira, Linear, and more.

Intents must disambiguate:

"Create a new issue in a GitHub repository" ✓
"Open a Jira ticket in a project" ✓
"Create issue" ✗

Service names appear naturally in the phrasing, not as metadata filters. The embedding itself carries the disambiguation. An agent searching for “file a bug in our GitHub repo” matches GitHub operations, not Jira—because the intents were generated with that specificity.

Focus-aware generation

Generic intents capture how any agent might search. But your agent isn’t generic—it has a domain, a purpose, specific terminology.

When you create a custom catalog, you can describe what your agent does. The generating model tailors intents for that domain.

A DevOps-focused catalog might generate:

"Create a GitHub issue to track a CI pipeline failure"
"Open a GitHub issue for post-deployment verification"

Same operation, different semantic coverage. A DevOps agent searching for “track a failed deployment” now matches—because the intents were generated with that agent’s vocabulary in mind.

The insight

RAG assumes content exists. For documents, it does—you chunk and embed what’s there.

For API operations, the semantic meaning you need to search doesn’t exist anywhere. The spec is structural. The docs are terse. The names are opaque.

The solution is generation. A frontier model comprehends each operation once, synthesizing the meaning that should exist but doesn’t. That meaning gets distilled into embeddings that cheap infrastructure can search.

Semantic distillation: expensive comprehension at index time, fast retrieval forever after.