When There's Nothing to Embed

RAG assumes documents exist. For tool discovery, they don't—until you create them.

Everyone knows how RAG works. Embed documents, embed queries, find nearest neighbors. The mechanism is commodity infrastructure.

The interesting question is: what do you embed?

For most RAG applications, the answer is obvious. You have documents—PDFs, web pages, knowledge bases—you extract the text, split it into chunks, and embed the chunks. The content exists. You’re just making it searchable.
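For reference, that standard pipeline is only a few lines. A minimal sketch, assuming sentence-transformers and deliberately naive fixed-width chunking; any embedding model works the same way.

```python
# Standard RAG indexing: chunk existing text, embed the chunks.
# Assumes sentence-transformers; any embedding model behaves the same way.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-width chunking; real pipelines split on structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

document = "...full text extracted from a PDF or web page..."
chunks = chunk(document)
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

# Query side: embed the question, rank chunks by cosine similarity.
query_vec = model.encode(["how do refunds work?"], normalize_embeddings=True)
scores = (chunk_vecs @ query_vec.T).ravel()
best = chunks[scores.argmax()]
```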

For API discovery, the embedding text doesn’t exist.

The documentation problem

API documentation describes implementation, not intent. An endpoint called POST /v1/customers with a description “Creates a new customer resource” tells you what the operation does mechanically. It doesn’t tell you when someone would want to use it.

When an agent is trying to “set up billing for a new user,” it’s not thinking in terms of POST /v1/customers. It’s thinking about the goal. The connection between goal and operation is exactly what retrieval needs to establish—and it’s exactly what the documentation doesn’t provide.
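To make the mismatch concrete, here is that spec fragment (reduced to a Python dict; the structure is illustrative) next to the query an agent actually issues:

```python
# A hypothetical OpenAPI-style operation, reduced to a Python dict.
operation = {
    "method": "POST",
    "path": "/v1/customers",
    "description": "Creates a new customer resource",
}

# What the agent actually searches with:
agent_goal = "set up billing for a new user"

# Nothing in the operation's text mentions billing, setup, or users.
# Lexically and semantically, the two barely overlap.
```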

We tried embedding operation descriptions directly. The results were poor. “Creates a customer” matches queries about customers, but it doesn’t match “sign up a new subscriber” or “onboard a client” or “add someone to my account.” The semantic gap is too wide.
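You can reproduce the gap in a few lines. A sketch assuming sentence-transformers; exact scores vary by model, but the ordering tells the story.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

doc = "Creates a new customer resource"
queries = [
    "create a customer",          # near-verbatim: typically scores highest
    "sign up a new subscriber",   # same intent, different words
    "onboard a client",
    "add someone to my account",
]

doc_vec = model.encode([doc], normalize_embeddings=True)
query_vecs = model.encode(queries, normalize_embeddings=True)

for q, score in zip(queries, (query_vecs @ doc_vec.T).ravel()):
    print(f"{score:.2f}  {q}")
```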

The schema problem

Maybe we could embed the schemas instead. The parameters and response types contain information about what an operation does.

Schemas are even worse. They’re structural, not semantic. A name: string field doesn’t tell you anything about intent. The fact that an operation accepts email and payment_method parameters is closer to useful—it suggests account creation—but the signal is weak and the noise is high.

On top of that, schemas vary wildly across APIs. Different naming conventions, different structural patterns, different levels of detail. Any embedding approach has to handle this variation, and the variation swamps the signal.
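Here is what a schema actually gives you to embed. The field names are hypothetical but typical.

```python
# A hypothetical request schema for POST /v1/customers.
schema = {
    "name": {"type": "string"},
    "email": {"type": "string", "format": "email"},
    "payment_method": {"type": "string"},
}

# The obvious flattening for embedding:
flat = " ".join(f"{field}: {spec['type']}" for field, spec in schema.items())
print(flat)  # name: string email: string payment_method: string

# "email" and "payment_method" weakly suggest account creation;
# "name: string" says nothing about intent. Another API might call the
# same fields fullName / emailAddress / paymentToken, scattering the signal.
```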

The hybrid problem

We tried combining descriptions and schemas. Concatenate them, embed the combination, hope the model finds the relevant parts.

This helped marginally. But we were still embedding what already existed. The fundamental problem remained: the content that would make discovery work—natural language descriptions of when and why you’d use each operation—doesn’t exist in the specs.
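The hybrid variant is just concatenation before embedding; a sketch:

```python
def hybrid_text(description: str, schema: dict) -> str:
    # Concatenate the description with the flattened schema field names.
    return description + " " + " ".join(schema)

print(hybrid_text(
    "Creates a new customer resource",
    {"name": {}, "email": {}, "payment_method": {}},
))
# -> "Creates a new customer resource name email payment_method"
# Embedding this still misses "billing", "signup", "onboarding":
# the vocabulary queries use never appears in the inputs.
```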

The insight

If the content doesn’t exist, create it.

Not by hand—that doesn’t scale. But with AI. For each operation, we can generate synthetic intent phrases. Natural language descriptions of goals that operation might accomplish. Multiple phrasings. Different perspectives. The content that would make retrieval work if it existed.
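A sketch of what that generation step can look like, assuming an OpenAI-style client. The prompt and model name are illustrative, not the exact ones we use.

```python
from openai import OpenAI

client = OpenAI()

def generate_intents(operation: dict, n: int = 10) -> list[str]:
    """Generate synthetic intent phrases for one API operation."""
    prompt = (
        f"An API has the operation {operation['method']} {operation['path']}: "
        f"{operation['description']}.\n"
        f"Write {n} short, varied phrases describing goals a user might have "
        f"that this operation would accomplish. One per line, no numbering."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

# generate_intents(operation) might return phrases like
# "sign up a new subscriber" or "onboard a paying client".
```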

This inverts the RAG assumption. Instead of embedding existing documents, we’re generating documents specifically designed to be embedded. The content is synthetic but purposeful—optimized for the retrieval task rather than inherited from documentation that served a different purpose.
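The index then stores the synthetic phrases, each one pointing back at the operation it was generated for. A sketch continuing from the generation step above; in practice this lives in a vector database rather than in-memory arrays.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# operations: a list of spec dicts like the one above; generate_intents
# comes from the previous sketch. Each embedded row is a synthetic phrase
# whose payload is the operation it describes.
phrases, payloads = [], []
for op in operations:
    for phrase in generate_intents(op):
        phrases.append(phrase)
        payloads.append(op)

index = model.encode(phrases, normalize_embeddings=True)
```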

Why generation works

There’s something recursive about this. We’re using AI to generate content that helps AI find things. The model that generates intent phrases is predicting how another model will search.

This seems like it shouldn’t work. But it does. Language models are good at generating plausible ways to describe goals. They’re good at variety—different use cases, different angles, different levels of specificity. The generated intent phrases cover semantic space that the original documentation never touched.

And because we control the generation, we can tune it. More phrases for complex operations. Different styles for different API domains. Explicit coverage of common goal patterns.
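A hypothetical version of one such knob, scaling the phrase budget with operation complexity:

```python
def phrase_budget(operation: dict) -> int:
    # Hypothetical heuristic: more parameters usually means more distinct
    # use cases, so generate more intent phrases for complex operations.
    n_params = len(operation.get("parameters", []))
    return min(8 + 2 * n_params, 30)
```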

What this means

The retrieval problem for tool discovery isn’t a retrieval problem. It’s a content generation problem.

Once you have good content to embed, retrieval is straightforward. Vector search works. Nearest neighbors finds relevant operations. The mechanism is commodity infrastructure—but only after you’ve created content worth searching.
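Query time is then ordinary nearest-neighbor search with one twist: many phrases point at the same operation, so you deduplicate. Continuing from the index above:

```python
import numpy as np

def discover(goal: str, k: int = 5) -> list[dict]:
    # Embed the agent's goal and rank every synthetic phrase against it.
    goal_vec = model.encode([goal], normalize_embeddings=True)
    scores = (index @ goal_vec.T).ravel()

    # Many phrases point at one operation: keep each operation once,
    # at its best-scoring phrase.
    seen, results = set(), []
    for i in np.argsort(-scores):
        key = (payloads[i]["method"], payloads[i]["path"])
        if key in seen:
            continue
        seen.add(key)
        results.append(payloads[i])
        if len(results) == k:
            break
    return results

# discover("set up billing for a new user") should now surface
# POST /v1/customers, because a generated phrase bridged the gap.
```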

We spent months trying to make existing documentation work for retrieval. The breakthrough was realizing that we needed to generate the content that should have existed but didn’t.

Dead ends aren’t wasted work. They’re how you find the constraints that matter. The constraint here was: you can’t retrieve what doesn’t exist. The solution was: create it.