Spec-Driven Execution

This is Part 2 of a two-part series. Part 1: Interface Synthesis covers the problem of bridging semantic and protocol space.

Interface synthesis gives LLMs something they can generate. But generation is only half the problem. We still need to transform what LLMs produce into valid HTTP requests—across every combination of parameter style, content encoding, and serialization rule that OpenAPI defines.

The naive approach is special cases: if form style, do this; if matrix style, do that; if multipart with nested JSON, do something else. This explodes. The combinations multiply faster than you can write handlers.

We needed a different approach. The goal: one execution path that handles any operation, driven entirely by the spec.

RFC 6570 as foundation

We built our RFC 6570 implementation early. URI templates are foundational: they handle URL construction, parameter expansion, encoding. We needed them solid before anything else.

OpenAPI’s specification explicitly connects parameter styles to RFC 6570. Matrix style maps to the semicolon operator. Label style maps to the dot operator. Form style maps to the question-mark operator. The spec tells you this directly.

But the spec’s approach is awkward. It suggests dynamically constructing RFC 6570 template strings at runtime, then parsing and expanding them. Build a string like {?foo,bar} or {;foo*}, parse it into a template, expand it with values. This works, but it’s indirect—you’re generating syntax to be parsed rather than working with the underlying structure.

We took a different path. Our RFC 6570 implementation exposes its internal representation: operators, separators, named vs. unnamed expansion, explode modes. Instead of building template strings, we construct the template AST directly. Each OpenAPI parameter style becomes a configuration:

simple: no prefix, comma separator, no names
form: ? prefix, & separator, named with =
matrix: ; prefix, ; separator, named, with = omitted for empty values
label: . prefix, . separator, no names

Same expander, different configurations. No string building, no parsing. Direct construction.
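To make the idea concrete, here is a minimal sketch of one expander driven by per-style configuration. It is an illustration under stated assumptions, not our actual implementation: StyleConfig, STYLES, and expand are hypothetical names, only primitives and lists are handled, and non-exploded lists are comma-joined as RFC 6570 specifies.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class StyleConfig:
    """Hypothetical expander configuration; field names are illustrative."""
    prefix: str                  # emitted once, before the first item
    separator: str               # placed between exploded items
    named: bool                  # emit "name=value" rather than bare values
    omit_empty_eq: bool = False  # matrix: ";name" rather than ";name=" for ""


STYLES = {
    "simple": StyleConfig(prefix="", separator=",", named=False),
    "form": StyleConfig(prefix="?", separator="&", named=True),
    "matrix": StyleConfig(prefix=";", separator=";", named=True, omit_empty_eq=True),
    "label": StyleConfig(prefix=".", separator=".", named=False),
}


def expand(name, value, style, explode=False):
    """Expand a primitive or a list of primitives using one style config."""
    cfg = STYLES[style]
    items = value if isinstance(value, list) else [value]
    if not items:
        return ""  # RFC 6570 treats an empty list as undefined
    # Percent-encode everything outside the unreserved set.
    encoded = [quote(str(v), safe="") for v in items]

    def render(v):
        if not cfg.named:
            return v
        if v == "" and cfg.omit_empty_eq:
            return name
        return f"{name}={v}"

    # Exploded lists repeat the name per item; otherwise the items share
    # a single comma-joined value.
    parts = [render(v) for v in encoded] if explode else [render(",".join(encoded))]
    return cfg.prefix + cfg.separator.join(parts)
```

With these assumptions, expand("id", [3, 4, 5], "matrix", explode=True) yields ;id=3;id=4;id=5, and switching the style string to "form" yields ?id=3&id=4&id=5 through the same code path.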

Extending beyond the spec

The real leverage came from using RFC 6570 mechanics for more than OpenAPI explicitly attributes to them.

Take deepObject style. OpenAPI specifies the serialization filter[status]=active&filter[type]=user, but doesn’t connect it to RFC 6570; it’s presented as a separate mechanism. The naive implementation adds another code path: if deepObject, do this special thing.

We integrated it into the expander instead. Our RFC 6570 implementation gained a flatten option: when expanding an object with flatten enabled, recursively descend into nested properties, building bracket-notation paths. The expansion logic handles the recursion. deepObject becomes one more configuration of the same machinery, not a parallel system.
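The recursive descent can be sketched in a few lines. This is a standalone illustration rather than the expander option itself: flatten and expand_deep_object are hypothetical names, brackets are left unencoded while keys and values are percent-encoded (an assumption), and arrays are rejected because OpenAPI does not define their deepObject form.

```python
from urllib.parse import quote


def flatten(path, value):
    """Yield (bracket_path, encoded_value) pairs by descending into dicts."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Keys are encoded individually so a key cannot forge brackets.
            yield from flatten(f"{path}[{quote(str(key), safe='')}]", child)
    elif isinstance(value, (list, tuple)):
        # Assumption: reject rather than guess at an undefined serialization.
        raise ValueError(f"arrays are not supported under {path}")
    else:
        yield path, quote(str(value), safe="")


def expand_deep_object(name, value):
    """Serialize a (possibly nested) object in bracket notation."""
    pairs = flatten(quote(name, safe=""), value)
    return "&".join(f"{path}={encoded}" for path, encoded in pairs)
```

For example, expand_deep_object("filter", {"status": "active", "type": "user"}) produces filter[status]=active&filter[type]=user, and a nested {"range": {"min": 1}} produces filter[range][min]=1 with no extra code.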

This matters for combinatorics. Every separate mechanism multiplies with every other. Style × explode × location × special-cases compounds quickly. But when deepObject is just a flag on the expander, it doesn’t multiply—it’s absorbed into the existing parameter space.

The same pattern applies to spaceDelimited and pipeDelimited. These aren’t RFC 6570 operators, but they follow the same structure: different separator, same expansion logic. Configure the separator, done.
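A small sketch of "configure the separator" follows. The names SEPARATORS and expand_delimited are hypothetical, and two choices here are assumptions in the conservative spirit of this article rather than anything the spec mandates: the separator itself is percent-encoded, and values that contain the raw separator are rejected because they could not be split back unambiguously.

```python
from urllib.parse import quote

# Only the separator differs between these non-exploded array styles.
SEPARATORS = {"form": ",", "spaceDelimited": " ", "pipeDelimited": "|"}


def expand_delimited(name, values, style):
    """Serialize a list as name=v1<sep>v2... using the style's separator."""
    raw = SEPARATORS[style]
    items = [str(v) for v in values]
    if any(raw in item for item in items):
        # Assumption: refuse input that would be ambiguous after decoding.
        raise ValueError(f"value contains the {style} separator")
    separator = quote(raw, safe=",")  # " " -> "%20", "|" -> "%7C", "," kept
    joined = separator.join(quote(item, safe="") for item in items)
    return f"{name}={joined}"
```

With this sketch, expand_delimited("color", ["blue", "black", "brown"], "spaceDelimited") gives color=blue%20black%20brown.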

By extending RFC 6570’s reach, we collapsed what looked like a proliferation of special cases into variations of a single mechanism.

The hygiene benefit

Having fewer encoding pathways has a security implication: there are fewer places for injection vulnerabilities to hide. When all parameter serialization flows through the same expander, you audit one code path. When deepObject is a flag rather than a separate system, the security properties of the expander apply to deepObject automatically.

But collapsing pathways forced us into decisions the spec doesn’t address. What does allowReserved mean when combined with deepObject? The spec defines each in isolation. It doesn’t say what happens when they interact.

We chose conservative interpretations. When the spec was silent, we picked the option that preserved encoding safety—percent-encode rather than pass through, reject ambiguous inputs rather than guess. Only when real-world specs from our test corpus required looser behavior did we relax constraints, and only after verifying the relaxation couldn’t break hygiene.
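One way to picture a conservative reading of allowReserved is sketched below. The policy shown is purely an assumption for illustration, not the rule set we shipped: encode_value and STRUCTURAL are hypothetical names, and the choice of which characters stay encoded is ours to label, not the spec's.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims plus sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="

# Assumed policy: characters that delimit query structure, start a fragment,
# or are commonly decoded as a space stay encoded even with allowReserved.
STRUCTURAL = "&=#+"


def encode_value(value, allow_reserved=False):
    """Percent-encode a parameter value, defaulting to the strictest form."""
    if not allow_reserved:
        return quote(value, safe="")
    safe = "".join(ch for ch in RESERVED if ch not in STRUCTURAL)
    return quote(value, safe=safe)
```

Under this policy, encode_value("a/b&c=d") returns a%2Fb%26c%3Dd, while encode_value("a/b&c=d", allow_reserved=True) returns a/b%26c%3Dd: the slash passes through, but the value still cannot introduce a new query pair.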

This was painstaking. Every edge case required analysis: what does the spec say, what do real APIs expect, what could an attacker exploit? The balance between compatibility and safety isn’t obvious. We erred toward safety, loosening only with evidence and confidence.

The pluggable encoding system

Parameter encoding handles the URL. Body encoding handles everything else.

The pattern we established for parameters—configurable primitives instead of special cases—extends to content encoding. But bodies have more variation: JSON, form-urlencoded, multipart, binary, text, and whatever custom formats APIs might require.

We built a pluggable system. Content encoders are selected by media type. Each encoder handles its format: JSON encoders serialize objects, form encoders handle URL encoding, multipart encoders manage boundaries and parts, binary encoders handle raw bytes.

The selection is priority-ordered. When an operation specifies multiple content types (common for content negotiation), we try encoders in order until one matches. APIs can prefer JSON but accept form-encoded; we’ll use JSON if available, fall back to form if not.

New formats don’t require core changes. Add an encoder, register it for a media type, done. The system extends without modification.
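A registry of this kind can be sketched briefly. EncoderRegistry and its methods are hypothetical names, and the sketch assumes that priority follows the order in which the operation lists its content types; the real selection logic may weigh more than that.

```python
import json
from urllib.parse import urlencode


class EncoderRegistry:
    """Maps media types to encoder callables (value -> bytes)."""

    def __init__(self):
        self._encoders = {}

    def register(self, media_type, encoder):
        self._encoders[media_type] = encoder

    def select(self, media_types):
        """Return (media_type, encoder) for the first type we can encode."""
        for media_type in media_types:
            encoder = self._encoders.get(media_type)
            if encoder is not None:
                return media_type, encoder
        raise LookupError(f"no encoder for any of {list(media_types)}")


registry = EncoderRegistry()
registry.register("application/json", lambda v: json.dumps(v).encode("utf-8"))
registry.register(
    "application/x-www-form-urlencoded",
    lambda v: urlencode(v).encode("ascii"),
)
```

Supporting a new format is then a single register call; select never needs to change.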

Composition all the way down

The real test of an abstraction is whether it composes. Ours does.

Multipart messages contain parts. Each part has its own content type. A part might be JSON, or text, or binary—or another multipart message. The multipart encoder doesn’t special-case these. It asks the encoding system for an encoder matching each part’s content type, then delegates.

This means nested structures work automatically. Multipart containing JSON containing base64-encoded binary? The multipart encoder delegates to the JSON encoder, which handles the object structure, and the base64 encoding happens through schema-level contentEncoding. Each layer handles its concern.
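The delegation can be illustrated with a toy multipart encoder that looks up an encoder for each part instead of branching on part types. All names here (ENCODERS, encode, encode_multipart) are hypothetical, part names are assumed to need no quoting, and schema-level contentEncoding is left out to keep the sketch short.

```python
import json
import uuid

ENCODERS = {}  # media type -> callable(value) -> (content_type, bytes)


def encode(media_type, value):
    """Dispatch to whichever encoder is registered for the media type."""
    return ENCODERS[media_type](value)


def encode_json(value):
    return "application/json", json.dumps(value).encode("utf-8")


def encode_text(value):
    return "text/plain; charset=utf-8", str(value).encode("utf-8")


def encode_multipart(parts):
    """Encode a list of (name, media_type, value) tuples as form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, media_type, value in parts:
        # Delegate: the multipart encoder knows nothing about part formats.
        content_type, body = encode(media_type, value)
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        chunks.append(head.encode("utf-8") + body + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


ENCODERS.update({
    "application/json": encode_json,
    "text/plain": encode_text,
    "multipart/form-data": encode_multipart,
})
```

Because encode_multipart is itself registered, a part whose media type is multipart is handled by the same lookup, which is the composition property described above.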

The composability comes from the dual-purpose design we described in interface synthesis. The same encoder that generated the schema for a part also encodes values for that part. Delegation preserves the schema-encoding correspondence through arbitrary nesting.

Content negotiation at both ends

APIs often support multiple request formats. They almost always support multiple response formats. Content negotiation—telling the server what you can send and what you can accept—happens at both ends.

On the request side, we examine the operation’s content types and select the best one we can encode. If the operation accepts both application/json and application/x-www-form-urlencoded, and the LLM generated a nested object, we choose JSON (form encoding struggles with nesting). The selection is automatic, based on what the data requires and what encoders are available.

On the response side, we set Accept headers based on what we can decode. When the response arrives, we select a decoder based on the Content-Type header. Same pattern as encoding: pluggable decoders, media-type selection, priority ordering.
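Both halves of negotiation can be sketched together. The function names and the ENCODABLE and DECODERS tables are hypothetical, and the nesting check is a deliberately crude stand-in for "what the data requires".

```python
import json

# Assumed set of formats this client can produce and consume.
ENCODABLE = ("application/json", "application/x-www-form-urlencoded")
DECODERS = {
    "application/json": lambda body: json.loads(body.decode("utf-8")),
    "text/plain": lambda body: body.decode("utf-8"),
}


def choose_request_type(offered, value):
    """Pick the first offered type we can encode that suits the data."""
    nested = isinstance(value, dict) and any(
        isinstance(v, (dict, list)) for v in value.values()
    )
    for media_type in offered:
        if media_type not in ENCODABLE:
            continue
        if nested and media_type == "application/x-www-form-urlencoded":
            continue  # form encoding has no standard nesting
        return media_type
    raise LookupError("no encodable content type offered")


def accept_header():
    """Advertise exactly the media types we can decode, in priority order."""
    return ", ".join(DECODERS)


def decode_response(content_type, body):
    """Select a decoder from the Content-Type header, ignoring parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in DECODERS:
        raise LookupError(f"no decoder for {media_type}")
    return DECODERS[media_type](body)
```

The request side and the response side use the same shape: a table keyed by media type and a priority-ordered scan over it.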

The symmetry isn’t accidental. Request encoding and response decoding are mirror operations. The same abstractions that handle one handle the other.

No special cases

After months of work, we arrived at something clean. Any OpenAPI operation executes through the same path:

Template expansion: Server URL, path, query string, headers, cookies—all use the configured URI template expander
Body encoding: Selected by content type, delegates through the encoder hierarchy
Credential application: Security schemes configure where credentials go (header, query, cookie) without the encoder knowing about authentication
Response decoding: Selected by content type, symmetric with encoding

The spec drives everything. Parameter styles configure the template expander. Content types select encoders. Security schemes direct credential placement. No hardcoded knowledge of specific APIs or formats.
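The shape of that single path can be sketched as one function that reads only from the operation description. This is a heavily simplified illustration: build_request is a hypothetical name, default styles are inlined instead of going through a configurable expander, only JSON bodies and apiKey schemes are handled, the security schemes are assumed to be pre-resolved into a list, and response decoding is omitted.

```python
import json
from urllib.parse import quote, urlencode


def build_request(server, path, method, operation, args, schemes=(), secret=None):
    """Derive method, URL, headers, and body from spec declarations alone."""
    headers, query = {}, []

    # 1. Template expansion (simple style for path, form style for query).
    for param in operation.get("parameters", []):
        name = param["name"]
        if name not in args:
            continue
        value = str(args[name])
        if param["in"] == "path":
            path = path.replace("{" + name + "}", quote(value, safe=""))
        elif param["in"] == "query":
            query.append((name, value))
        elif param["in"] == "header":
            headers[name] = value

    # 2. Body encoding, selected by the operation's declared content types.
    body = None
    content = operation.get("requestBody", {}).get("content", {})
    if "body" in args and "application/json" in content:
        headers["Content-Type"] = "application/json"
        body = json.dumps(args["body"]).encode("utf-8")

    # 3. Credential application; the encoders above never see the secret.
    for scheme in schemes:
        if scheme["type"] == "apiKey" and scheme["in"] == "header":
            headers[scheme["name"]] = secret
        elif scheme["type"] == "apiKey" and scheme["in"] == "query":
            query.append((scheme["name"], secret))

    url = server + path + ("?" + urlencode(query) if query else "")
    return method.upper(), url, headers, body
```

Nothing in the function names a particular API; changing the operation dictionary changes the request it builds.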

This is what spec-driven execution means. The OpenAPI specification isn’t documentation—it’s the program. We interpret it at runtime, deriving all behavior from its declarations. Change the spec, behavior changes. No regeneration, no deployment, no code.

The result

We can execute any OpenAPI operation. Not “most operations” or “common patterns”—any operation. Matrix-style path parameters with exploded arrays. Multipart bodies with per-property content types. Cookie authentication with form-encoded bodies. Combinations we’ve never seen work because the primitives compose.

This took months. Understanding OpenAPI’s encoding model, finding the abstractions that collapse it, building the machinery that interprets specs at runtime. The result looks simple—three meta-tools, any API—but the simplicity is earned.

The execution layer is where interface synthesis meets reality. Semantic data goes in, protocol-compliant requests come out. Every combination, every format, every style. That’s the bridge.
