AI agents need access to APIs. Lots of them. More than you can predict, more than you can preload, more than will fit in any context window. The only viable approach is retrieval: AI describes what it’s trying to accomplish, and the system finds the right operations at runtime.
This creates a fundamental constraint. If tools are code—functions that get loaded and executed—then retrieval means dynamically loading and executing code from a database. That’s a security nightmare. Every tool becomes an attack surface. Auditing becomes impossible. Trust evaporates.
But if tools are data—specifications that describe what to call and how—then retrieval is just fetching documents. A trusted interpreter executes them. The attack surface is the interpreter, not the infinite variety of retrieved tools. Data can be analyzed, validated, transformed. Code can’t—at least not universally.
This is why Toolcog treats tools as data.
What code-based tools look like
The traditional approach: you have an OpenAPI spec, you run a code generator, you get an SDK. The SDK wraps HTTP calls in language-native functions. You ship the SDK with your application.
This works fine when you’re building a single integration. But it doesn’t scale to AI tool use:
You can’t retrieve code safely. Loading and executing code from a database means trusting that code. For AI tool use, where the whole point is discovering operations you didn’t anticipate, that trust model breaks down.
Context windows can’t hold everything. Generated SDKs include every operation, every type, every edge case. For an API with 500 operations, you get 500 functions. Multiply by every API the agent might need, and you’ve blown past any context limit.
Code is opaque. You can’t easily analyze a function to extract just the parts you need, or transform its interface to work better for a language model. Code is a black box.
Drift never ends. The moment you generate code, it starts drifting from the spec. The API updates, your generated code doesn’t. You regenerate, redeploy, resynchronize. Forever.
What data-based tools look like
OpenAPI specifications already contain everything needed to call an API correctly: endpoints, parameters, types, authentication, encoding rules. This is all data. You don’t need to compile it into code. You can interpret it at runtime.
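To make that concrete, here is an abbreviated operation from a hypothetical spec, written as a TypeScript value to emphasize the point: it’s just data.

```typescript
// An abbreviated operation from a hypothetical API spec, held as plain data.
// Nothing here is executable; it describes what to call and how.
const getUser = {
  method: "get",
  path: "/users/{id}",
  parameters: [
    { name: "id", in: "path", required: true, schema: { type: "string" } },
    { name: "expand", in: "query", schema: { type: "boolean" } },
  ],
  security: [{ apiKey: [] }],
  responses: {
    "200": {
      content: {
        "application/json": {
          schema: { $ref: "#/components/schemas/User" },
        },
      },
    },
  },
} as const;
```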
When you upload an OpenAPI spec to Toolcog, we don’t generate code. We index the spec and store it as data. When AI calls an operation, we:
- Tree-shake the spec: Extract just the operation definition and its schema dependencies. An 8MB spec becomes 5KB.
- Transform for the model: Generate TypeScript types the model can understand—not raw JSON Schema, but clean types that look like what a human would write.
- Validate parameters: Check AI’s arguments against the schema.
- Build the request: Expand URI templates, encode parameters, serialize the body.
- Apply credentials: Inject authentication per the security scheme.
- Execute and decode: Send the request, parse the response.
Every step is driven by the spec. The spec is the source of truth.
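Here is a minimal sketch of that pipeline, assuming a simplified operation shape (tree-shaking, content encoding, and credentials are omitted). It is illustrative, not Toolcog’s actual engine; it needs Node 18+ for the global `fetch`.

```typescript
interface Operation {
  method: string;     // HTTP method, e.g. "get"
  path: string;       // URI template, e.g. "/users/{id}"
  required: string[]; // names of required arguments
  query: string[];    // names of query parameters
}

async function callOperation(
  baseUrl: string,
  op: Operation,
  args: Record<string, string>,
): Promise<unknown> {
  // Validate parameters: reject the call before it touches the network.
  for (const name of op.required) {
    if (!(name in args)) throw new Error(`Missing required argument: ${name}`);
  }

  // Build the request: expand the path template, then append query parameters.
  const path = op.path.replace(/\{(\w+)\}/g, (_, name: string) =>
    encodeURIComponent(args[name]),
  );
  const query = new URLSearchParams(
    op.query.filter((name) => name in args).map((name) => [name, args[name]]),
  );
  const qs = query.toString();

  // Execute and decode.
  const response = await fetch(baseUrl + path + (qs ? `?${qs}` : ""), {
    method: op.method.toUpperCase(),
  });
  return response.json();
}
```

The same function handles any operation from any spec; only the data changes.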
What this makes possible
Safe retrieval. Tools are documents, not executable code. The interpreter is the trust boundary—a fixed, auditable system that handles any spec the same way.
Context efficiency. Tree-shaking extracts exactly what’s needed. An API with a thousand operations doesn’t mean loading a thousand operations. AI gets the minimal interface for the specific operation it’s calling.
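As a simplified sketch of what tree-shaking involves, assuming refs of the form `#/components/schemas/Name`: start from one operation, follow `$ref` pointers transitively, and keep only the schemas actually reached.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Collect the names of every component schema reachable from `node`.
function collectRefs(
  node: Json,
  schemas: Record<string, Json>,
  seen: Set<string> = new Set(),
): Set<string> {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, schemas, seen);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "$ref" && typeof value === "string") {
        const name = value.replace("#/components/schemas/", "");
        if (!seen.has(name)) {
          seen.add(name);
          collectRefs(schemas[name], schemas, seen); // follow the reference
        }
      } else {
        collectRefs(value, schemas, seen);
      }
    }
  }
  return seen;
}
```

Everything outside the returned set is dropped. For a large spec, that’s almost everything.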
Interface transformation. Because specs are data, we can analyze and transform them. Some schemas don’t work well for LLM generation—file uploads, multipart encoding, complex nested structures. We transform these into shapes the model can produce, then handle the complexity at execution time.
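For example, a file upload can be presented to the model as plain JSON, then rebuilt into a multipart body at execution time. This is a sketch of the idea with hypothetical names, not Toolcog’s actual transformation; it assumes Node 18+ for the global `FormData` and `Blob`.

```typescript
// The model-facing shape: plain JSON the model can reliably generate.
interface ModelFacingUpload {
  filename: string;
  contentBase64: string;
}

// At execution time, the interpreter rebuilds the real multipart body.
function toMultipartBody(upload: ModelFacingUpload): FormData {
  const bytes = Buffer.from(upload.contentBase64, "base64");
  const form = new FormData();
  form.append("file", new Blob([bytes]), upload.filename);
  return form;
}
```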
Type generation. Models work better with TypeScript than JSON Schema. We generate optimized types—simplifying unions, eliminating redundancy, inlining references to the right depth. The model sees clean interfaces, not schema machinery.
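An illustrative before-and-after (abbreviated, and not Toolcog’s exact output):

```typescript
// Raw JSON Schema from the spec (abbreviated):
//   { "type": "object",
//     "required": ["amount", "currency"],
//     "properties": {
//       "amount":   { "type": "integer", "description": "Amount in cents." },
//       "currency": { "type": "string", "enum": ["usd", "eur"] },
//       "customer": { "$ref": "#/components/schemas/CustomerId" } } }

// The kind of interface the model sees instead: references inlined,
// enums turned into union types, descriptions kept as doc comments.
interface CreateChargeParams {
  /** Amount in cents. */
  amount: number;
  currency: "usd" | "eur";
  /** Inlined from CustomerId. */
  customer?: string;
}
```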
Uniform access. Data is uniform. Every operation, regardless of which API it comes from, can be discovered, learned, and called the same way. The same three meta-tools work for Stripe, GitHub, your internal APIs, anything with an OpenAPI spec.
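A sketch of what that uniform surface might look like; the names and signatures here are hypothetical, but the shape is the point: one fixed interface in front of every API.

```typescript
// Hypothetical signatures for the three meta-tools.
interface MetaTools {
  /** Semantic search over indexed specs for relevant operations. */
  discover(goal: string): Promise<Array<{ api: string; operationId: string }>>;
  /** Generate the minimal TypeScript interface for one operation, on demand. */
  learn(api: string, operationId: string): Promise<string>;
  /** Interpret the spec to validate, build, authenticate, and execute the call. */
  call(api: string, operationId: string, args: unknown): Promise<unknown>;
}
```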
The execution engine
Making this work required building a complete execution engine:
JSON Schema infrastructure: Full implementation of Draft 2020-12, Draft 07, and Draft 04, plus OpenAPI 3.0 and 3.1 extensions. Validation, transformation, tree-shaking, type generation. Every keyword, every edge case.
URI template expansion: RFC 6570 implementation for URL construction. Path parameters, query parameters, matrix parameters—all handled correctly.
Content encoding: JSON, form-urlencoded, multipart/form-data, raw binary. Each content type encoded per its specification.
Parameter styles: OpenAPI defines seven encoding styles. We implement all of them.
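Two of those styles, applied to the same array value, show why a general encoder is needed:

```typescript
// The same array, encoded under two different OpenAPI parameter styles.
const ids = ["3", "4", "5"];

// style: form, explode: true   ->  id=3&id=4&id=5
const formExploded = ids.map((v) => `id=${encodeURIComponent(v)}`).join("&");

// style: pipeDelimited, explode: false  ->  id=3|4|5
const pipeDelimited = `id=${ids.map((v) => encodeURIComponent(v)).join("|")}`;

console.log(formExploded);  // "id=3&id=4&id=5"
console.log(pipeDelimited); // "id=3|4|5"
```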
This is infrastructure that works with any spec. Not code that needs regeneration for each API.
The uniformity of data
The deeper insight is about uniformity. Data can be processed uniformly—the same parser, the same transformer, the same executor works for any spec. You don’t need special handling for each API.
Code isn’t like that. Every SDK is its own snowflake. Every integration is a special case. The combinatorial explosion is what makes AI tool use at scale impossible with code-based approaches.
By treating tools as data, every operation becomes uniform:
- Discovered the same way (semantic search over specs)
- Learned the same way (type generation on demand)
- Called the same way (spec interpretation at runtime)
- Secured the same way (structural properties from the interpreter)
When tools are data, the system scales. That’s the bet we made.
