Changelog

OpenAPI Compiler Infrastructure

With schema infrastructure solid, we moved up a layer to OpenAPI. Not parsing in the trivial sense—full compiler infrastructure for semantic analysis, traversal, transformation, and tree-shaking of API specifications.

The Typeclass Pattern, Again

December’s schema compiler used a keyword-centric architecture: each keyword knows how to parse, validate, traverse, and transform itself. OpenAPI has a different structure—a fixed object hierarchy rather than extensible keywords—but the same pattern applies.

Each OpenAPI object type implements a common behavioral interface. Operations know how to handle themselves. Parameters know how to handle themselves. Responses, security schemes, media types—all of them. Add a new capability like tree-shaking or type generation, and every object type participates automatically. No central dispatch to modify. No switch statements growing with each new concern.
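
A minimal sketch of that shape, assuming a hypothetical ObjectBehavior interface and registry (illustrative names, not the toolcog-openapi API). Each object type describes its own children, and a capability such as reference collection is written once against the interface:

```typescript
type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

interface Child {
  kind: string;
  node: JsonObject;
}

// Hypothetical behavioral interface: every object type reports its own children.
interface ObjectBehavior {
  children(node: JsonObject): Child[];
}

const registry: Record<string, ObjectBehavior> = {
  operation: {
    children(node) {
      const out: Child[] = [];
      const parameters = node["parameters"];
      if (Array.isArray(parameters)) {
        for (const p of parameters) {
          if (isObject(p)) out.push({ kind: "parameter", node: p });
        }
      }
      const responses = node["responses"];
      if (isObject(responses)) {
        for (const code of Object.keys(responses)) {
          const r = responses[code];
          if (isObject(r)) out.push({ kind: "response", node: r });
        }
      }
      return out;
    },
  },
  parameter: { children: () => [] },
  response: { children: () => [] },
};

// One capability, written once. It never switches on the object type.
function collectRefs(kind: string, node: JsonObject, refs: string[] = []): string[] {
  const ref = node["$ref"];
  if (typeof ref === "string") {
    refs.push(ref);
    return refs;
  }
  const behavior = registry[kind];
  if (behavior === undefined) return refs;
  for (const child of behavior.children(node)) {
    collectRefs(child.kind, child.node, refs);
  }
  return refs;
}
```

Registering a new object type means adding one registry entry; collectRefs itself stays untouched.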

Two Layers, Clean Delegation

OpenAPI embeds JSON Schema but isn’t JSON Schema. An operation contains parameters. Parameters contain schemas. When processing hits a schema, the OpenAPI layer delegates to the schema compiler’s keyword-based machinery, then resumes OpenAPI-aware processing afterward.

This boundary matters. A properties key inside a schema means subschemas to traverse. The same key in example data means nothing—just data to preserve. Format boundaries mark where interpretation rules change. The compiler switches semantic context at these boundaries automatically.
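
To illustrate the boundary, here is a small sketch with hypothetical function names. Only the value under schema is walked with schema rules, while example is treated as opaque payload, so a properties key inside it is never interpreted:

```typescript
type JsonObject = { [key: string]: unknown };

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Schema rules: "properties" holds subschemas, so recurse into each one.
function schemaPropertyNames(schema: JsonObject, out: string[] = []): string[] {
  const properties = schema["properties"];
  if (isObject(properties)) {
    for (const name of Object.keys(properties)) {
      out.push(name);
      const subschema = properties[name];
      if (isObject(subschema)) schemaPropertyNames(subschema, out);
    }
  }
  return out;
}

// OpenAPI rules: a media type object crosses into schema rules only at "schema".
// "example" is deliberately never visited; its keys are data, not keywords.
function mediaTypePropertyNames(mediaType: JsonObject): string[] {
  const schema = mediaType["schema"];
  return isObject(schema) ? schemaPropertyNames(schema) : [];
}
```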

Tree-Shaking Across Layers

Given an operation, extract only what it needs. Follow parameter references. Follow schema references within those parameters. Follow security scheme references. The tree shaker doesn’t care which layer owns which reference—it follows them all and organizes output by OpenAPI component type.

An 8MB specification becomes ~5KB per extracted operation. Only the schemas, parameters, responses, and security schemes that operation actually uses. This is what enables serving operation-specific specs to AI agents without overwhelming context windows.

Version Abstraction

The compiler handles OpenAPI 3.0 and 3.1 uniformly. Version differences—schema dialect, nullable semantics, structural variations—normalize during parsing. Same infrastructure, same capabilities, regardless of which version the input uses.

Schema Validation

With parsing and reference resolution complete, we built out the validation layer—including the tricky Draft 2020-12 features that most implementations skip.

Schema Tree-Shaking

Unused definitions are automatically eliminated when extracting subschemas. An OpenAPI spec with hundreds of schemas produces operation-specific extracts containing only the types that operation actually uses. The tree-shaker follows $ref, $dynamicRef, allOf, anyOf, oneOf, items, properties, and all other schema-containing keywords.

Annotation Tracking

Complete JSON Schema validation with detailed error reporting. Annotation tracking collects all keyword annotations for determining which properties and items were evaluated—required for unevaluatedProperties and unevaluatedItems semantics in Draft 2020-12.
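
A deliberately simplified sketch of the idea, under stated assumptions: it handles only properties and allOf, and it skips the rule that annotations count only from subschemas that validate successfully. The type and function names are hypothetical:

```typescript
interface MiniSchema {
  properties?: Record<string, MiniSchema>;
  allOf?: MiniSchema[];
  unevaluatedProperties?: boolean;
}

// Collect the instance keys that some "properties" keyword has evaluated,
// looking through in-place applicators (only allOf in this sketch).
function evaluatedKeys(schema: MiniSchema, instance: Record<string, unknown>): Set<string> {
  const seen = new Set<string>();
  for (const name of Object.keys(schema.properties ?? {})) {
    if (Object.prototype.hasOwnProperty.call(instance, name)) seen.add(name);
  }
  for (const sub of schema.allOf ?? []) {
    evaluatedKeys(sub, instance).forEach((key) => seen.add(key));
  }
  return seen;
}

// Keys that "unevaluatedProperties: false" would reject.
function unevaluatedKeys(schema: MiniSchema, instance: Record<string, unknown>): string[] {
  const seen = evaluatedKeys(schema, instance);
  return Object.keys(instance).filter((key) => !seen.has(key));
}
```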

Lenient and Strict Modes

Lenient mode (default) skips invalid keywords instead of throwing. Strict mode throws on unknown or invalid keywords. Both modes include frame-based error context for debugging.
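
The difference between the two modes can be sketched as follows. The keyword set, the Mode type, and parseKeywords are hypothetical, and the frame-based error context is omitted:

```typescript
type Mode = "lenient" | "strict";

// Illustrative subset of recognized keywords.
const KNOWN_KEYWORDS = new Set(["type", "properties", "required", "items"]);

function parseKeywords(schema: Record<string, unknown>, mode: Mode = "lenient"): string[] {
  const accepted: string[] = [];
  for (const key of Object.keys(schema)) {
    if (KNOWN_KEYWORDS.has(key)) {
      accepted.push(key);
    } else if (mode === "strict") {
      throw new Error(`Unknown keyword "${key}"`);
    }
    // Lenient mode: fall through and skip the keyword.
  }
  return accepted;
}
```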

Schema Compiler

JSON Schema has evolved through five major dialects over a decade, each with different keywords, reference semantics, and edge cases. OpenAPI adds its own variations. We built a schema compiler that handles all of them.

Multi-Dialect Parsing

The compiler handles five schema dialects transparently: JSON Schema Draft 2020-12, Draft 07, Draft 05, OpenAPI 3.1, and OpenAPI 3.0. Each has its own keyword semantics and reference resolution rules. Schemas written in any dialect work correctly without manual conversion.

Vocabulary Modularity

Draft 2020-12 organizes keywords into vocabularies: Core, Applicator, Validation, Unevaluated, Format, Content, and Metadata. Dialects compose vocabularies rather than duplicating keywords. Draft 05 and Draft 07 share the same Core and Validation keywords because the semantics haven’t changed—only the combinations differ.

OpenAPI 3.0 reuses Draft 05’s vocabularies but adds nullable with distinct semantics (a modifier on type, not an additional type value). OpenAPI 3.1 shifts to Draft 2020-12’s vocabularies. The same $ref implementation works across all five dialects because vocabularies are the unit of reuse.

Keyword-Centric Architecture

Each keyword—type, properties, allOf, etc.—is a first-class behavior unit controlling its own parsing, validation, traversal, and transformation. Traditional implementations split these concerns across multiple systems. Adding a keyword means modifying each system. This architecture inverts that: add a custom keyword and it automatically participates in validation and tree-shaking without touching core logic.

Keywords declare dependencies on other keywords, enabling correct evaluation order through topological sort. Virtual keywords create logical barriers without requiring physical presence.
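
As a sketch of dependency-driven ordering, a depth-first topological sort is enough. The function name and the shape of the dependency table are assumptions. Dependencies absent from the schema are still visited so that ordering constraints pass through them, but only keywords that are present appear in the result:

```typescript
function evaluationOrder(deps: Map<string, string[]>, present: string[]): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (keyword: string): void => {
    const current = state.get(keyword);
    if (current === "done") return;
    if (current === "visiting") throw new Error(`Dependency cycle at "${keyword}"`);
    state.set(keyword, "visiting");
    for (const dep of deps.get(keyword) ?? []) visit(dep);
    state.set(keyword, "done");
    order.push(keyword); // Emitted only after all of its dependencies.
  };

  for (const keyword of present) visit(keyword);
  return order.filter((keyword) => present.indexOf(keyword) !== -1);
}
```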

$ref and $dynamicRef

JSON References resolve automatically during parsing, handling both local and remote references. Draft 2020-12’s $dynamicRef and $dynamicAnchor enable extensible recursive schemas that static references cannot express.

The recursive schema problem: a static $ref can point back at its own schema, but its target is fixed when the schema is written. Extend a tree schema with stricter rules and the child nodes still validate against the original, unextended tree. Dynamic references resolve at runtime against the outermost matching dynamic anchor in the current dynamic scope, enabling polymorphic recursion where the extension’s constraints propagate through the entire structure.
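
For illustration, here is the well-known tree and strict-tree pair, modeled on the example in the Draft 2020-12 specification and written as TypeScript constants. The example.com identifiers are placeholders:

```typescript
// A generic recursive tree: children recurse through a dynamic reference.
const tree = {
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/tree",
  "$dynamicAnchor": "node",
  "type": "object",
  "properties": {
    "data": true,
    "children": {
      "type": "array",
      "items": { "$dynamicRef": "#node" },
    },
  },
};

// An extension that forbids extra properties. Because it re-declares the
// "node" dynamic anchor, the $dynamicRef inside tree resolves to strictTree
// whenever validation starts here, so the restriction applies at every depth.
const strictTree = {
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/strict-tree",
  "$dynamicAnchor": "node",
  "$ref": "tree",
  "unevaluatedProperties": false,
};
```

With a static "$ref": "#" in place of the dynamic reference, nested children would be checked against tree only, and a misspelled property two levels down would pass.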

Format Validators

Complete suite: date-time, date, time, duration; email and idn-email; hostname and idn-hostname; ipv4 and ipv6; uri, uri-reference, iri, iri-reference; uri-template; json-pointer and relative-json-pointer; regex. Three modes: skip, validate known, or strict.

OpenAPI 3.0 Nullable

OpenAPI 3.0’s nullable: true has fundamentally different semantics from JSON Schema constraint intersections. The type keyword must be a string, not an array. Validation passes for null when both type is defined and nullable is true. Normalization happens once at parse time.
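
One plausible parse-time normalization, shown as a sketch. It assumes the target form is a Draft 2020-12 style type array, which the text above does not specify, and it ignores interactions with enum. The function name is hypothetical:

```typescript
type SchemaObject = { [key: string]: unknown };

function normalizeNullable(schema: SchemaObject): SchemaObject {
  const out: SchemaObject = {};
  for (const key of Object.keys(schema)) {
    if (key !== "nullable") out[key] = schema[key];
  }
  // nullable only takes effect when type is defined (a string in OpenAPI 3.0).
  const type = out["type"];
  if (schema["nullable"] === true && typeof type === "string") {
    out["type"] = [type, "null"];
  }
  return out;
}
```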

Context Trees

Traditional compilers parse source into an AST, then traverse the AST. But when your source is already structured data—JSON or YAML—building a separate AST just amplifies memory. We developed context trees: AST-level functionality without building an AST.

JSON as Intermediate Representation

Metadata attaches to JSON values via WeakMaps rather than wrapper objects. Traversal, transformation, and tree-shaking work on the original JSON directly. Shadow context supplies parent tracking, reference resolution, and semantic analysis alongside the untouched document.

This is what makes toolcog-schema and toolcog-openapi compilers in the true sense—semantic analysis, optimization passes, tree-shaking—but operating directly on the source format rather than a separate IR.

Edge-Efficient Processing

Traditional AST approaches amplify memory several fold. Edge runtimes with constrained memory can’t process large specs this way. Context trees keep overhead proportional to traversal depth, not document size—enabling multi-megabyte API specifications to be processed where they couldn’t be before.

Parent and Ancestry Tracking

Parent-child relationships tracked through WeakMap associations rather than adding properties to JSON nodes. Preserves the original document structure while enabling upward traversal—essential for resolving JSON References where you need to know where you came from.
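
A minimal sketch of the technique, with hypothetical names. For brevity it indexes the whole document eagerly, whereas the design described above records context during traversal. The parent links live in a WeakMap, and the JSON itself gains no extra properties:

```typescript
interface ParentLink {
  parent: object;
  key: string;
}

const parents = new WeakMap<object, ParentLink>();

// Record a parent link for every object or array reachable from node.
function indexParents(node: unknown): void {
  if (typeof node !== "object" || node === null) return;
  for (const key of Object.keys(node)) {
    const child = (node as Record<string, unknown>)[key];
    if (typeof child === "object" && child !== null) {
      parents.set(child, { parent: node, key });
      indexParents(child);
    }
  }
}

// Walk upward through the links to rebuild the JSON Pointer of a node.
function pointerTo(node: object): string {
  const tokens: string[] = [];
  let link = parents.get(node);
  while (link !== undefined) {
    tokens.unshift(link.key.replace(/~/g, "~0").replace(/\//g, "~1"));
    link = parents.get(link.parent);
  }
  return tokens.map((token) => "/" + token).join("");
}
```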

Reference Resolution

Multiple resolution strategies: JSON Pointer fragments for local navigation, named anchors for direct access, URI resolution for cross-document references. Async support for external resources. Batched resolution aggregates errors instead of failing on first.
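
The local-navigation strategy and the batched error handling can be sketched as below. This covers only RFC 6901 JSON Pointer evaluation over in-memory data; named anchors, cross-document URIs, and async loading are out of scope, and both function names are hypothetical:

```typescript
function resolvePointer(doc: unknown, pointer: string): unknown {
  if (pointer === "") return doc;
  if (pointer.charAt(0) !== "/") throw new Error(`Invalid JSON Pointer: ${pointer}`);
  let current: unknown = doc;
  for (const raw of pointer.slice(1).split("/")) {
    // Unescape ~1 before ~0, as RFC 6901 requires.
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (
      typeof current !== "object" ||
      current === null ||
      !Object.prototype.hasOwnProperty.call(current, token)
    ) {
      throw new Error(`Unresolvable JSON Pointer: ${pointer}`);
    }
    current = (current as Record<string, unknown>)[token];
  }
  return current;
}

// Batched resolution: keep going after a failure and report every error.
function resolveAll(doc: unknown, pointers: string[]): { values: Map<string, unknown>; errors: Error[] } {
  const values = new Map<string, unknown>();
  const errors: Error[] = [];
  for (const pointer of pointers) {
    try {
      values.set(pointer, resolvePointer(doc, pointer));
    } catch (e) {
      errors.push(e instanceof Error ? e : new Error(String(e)));
    }
  }
  return { values, errors };
}
```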

Tree-Shaking

Extract minimal, self-contained subgraphs from larger documents. Reference traversal discovers all reachable definitions, relocates them to a configurable output structure, and rewrites all references to point to the new locations. Shared targets appear once with all references pointing to the same object. This enables extracting a single API operation with only its schema dependencies from a massive OpenAPI spec.
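
The reachability half of this can be sketched in a few lines. The sketch makes simplifying assumptions that the real design does not: every reference has the form #/components/schemas/Name, definitions keep their original locations so no rewriting is needed, and the walk is context-unaware, so a $ref key inside example data would be followed too. Names are hypothetical:

```typescript
type JsonObject = { [key: string]: unknown };

const SCHEMA_PREFIX = "#/components/schemas/";

// Gather every "$ref" string found anywhere beneath node.
function collectRefs(node: unknown, out: string[]): void {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, out);
    return;
  }
  if (typeof node !== "object" || node === null) return;
  const obj = node as JsonObject;
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (key === "$ref" && typeof value === "string") out.push(value);
    else collectRefs(value, out);
  }
}

// Keep only the schemas reachable from root, following references transitively.
function shakeSchemas(schemas: JsonObject, root: unknown): JsonObject {
  const kept: JsonObject = {};
  const queue: string[] = [];
  collectRefs(root, queue);
  while (queue.length > 0) {
    const ref = queue.pop() as string;
    if (ref.indexOf(SCHEMA_PREFIX) !== 0) continue;
    const name = ref.slice(SCHEMA_PREFIX.length);
    const seen = Object.prototype.hasOwnProperty.call(kept, name);
    const exists = Object.prototype.hasOwnProperty.call(schemas, name);
    if (seen || !exists) continue; // Already kept (also stops cycles) or dangling.
    kept[name] = schemas[name];
    collectRefs(schemas[name], queue);
  }
  return kept;
}
```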

Format Boundaries

Resources mark where one format embeds another—JSON Schemas within OpenAPI specs. The traverser switches semantic rules at these boundaries, processing the same document with different interpretation rules in different subtrees.

Cycle Detection

Automatic detection of circular references in the traversal frame stack. Prevents infinite recursion when following references while preserving cycle structure in output when needed.
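
A sketch of stack-based detection over a simplified reference graph, where each name maps to the names it references. The graph shape and function name are assumptions. A reference is circular exactly when its target is already on the current stack:

```typescript
function findCycle(refs: Record<string, string[]>, start: string): string[] | null {
  const stack: string[] = [];
  const finished = new Set<string>();

  const visit = (name: string): string[] | null => {
    const at = stack.indexOf(name);
    if (at !== -1) return stack.slice(at).concat(name); // Target is an ancestor frame.
    if (finished.has(name)) return null; // Fully explored earlier, known acyclic.
    stack.push(name);
    for (const target of refs[name] ?? []) {
      const cycle = visit(target);
      if (cycle !== null) return cycle;
    }
    stack.pop();
    finished.add(name);
    return null;
  };

  return visit(start);
}
```

Returning the cycle path rather than a boolean lets a caller decide whether to stop or to emit the cycle as a reference in the output.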

URI Infrastructure

The foundation starts with URLs. Every API call constructs a URL from user input, and that’s where injection attacks live. We built RFC-compliant URI infrastructure that makes injection structurally impossible.

RFC 3986 URI Parsing

Complete implementation of URI parsing with RFC 3987 IRI support for internationalized identifiers. The parser handles the full grammar: scheme, authority, userinfo, host (IPv4, IPv6, IPvFuture), port, path, query, and fragment—with proper percent-encoding normalization. Strict validation throws on invalid input. No silent failures, no malformed-but-accepted URIs.
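
As a starting point, the component split can be sketched with the regular expression from RFC 3986 Appendix B. This only separates components; it performs none of the strict validation described above (that regex matches any string), and splitUri is a hypothetical name:

```typescript
interface UriParts {
  scheme: string | undefined;
  authority: string | undefined;
  path: string;
  query: string | undefined;
  fragment: string | undefined;
}

// RFC 3986 Appendix B: groups 2, 4, 5, 7, 9 are scheme, authority, path, query, fragment.
const URI_REFERENCE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitUri(input: string): UriParts {
  const match = URI_REFERENCE.exec(input);
  if (match === null) throw new Error(`Not a URI reference: ${input}`);
  return {
    scheme: match[2],
    authority: match[4],
    path: match[5] ?? "",
    query: match[7],
    fragment: match[9],
  };
}
```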

RFC 6570 URI Templates

All eight expression types across the four specification levels: simple string expansion plus the seven operators for reserved, fragment, label, path segments, path parameters, form-style query, and form-style query continuation. Templates pre-compile for efficient repeated expansion with different variable values.

Hygienic URL Construction

URI Templates provide structural safety. User input cannot escape its designated position: a path parameter stays in the path, and a query value stays percent-encoded in the query string. The template grammar makes injection impossible by construction rather than by sanitization. No string concatenation with user input, no attack surface from URL manipulation.
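
A sketch of why this holds for the simplest case, RFC 6570 Level 1 string expansion, which percent-encodes everything outside the unreserved set. Only {name} expressions are handled here, with no operators or modifiers, and both function names are hypothetical:

```typescript
// encodeURIComponent leaves ! ' ( ) * untouched; RFC 6570 simple expansion
// allows only unreserved characters, so encode those five as well.
function encodeUnreservedOnly(value: string): string {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

function expandSimple(template: string, vars: Record<string, string>): string {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (_match: string, name: string) => {
    const value = vars[name];
    // An undefined variable expands to the empty string.
    return value === undefined ? "" : encodeUnreservedOnly(value);
  });
}
```

A value such as ../admin?x=1 expands to a single opaque path segment, because the slash, question mark, and equals sign are all percent-encoded before they reach the URL.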

Network Address Validation

Comprehensive validation for IPv6 (all compression forms, mixed IPv4-in-IPv6, exact hextet count), IPv4 (0-255 range, no leading zeros), and port numbers (0-65535 with overflow detection). Ambiguous or malformed addresses are rejected.
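
The IPv4 and port rules are compact enough to sketch directly; IPv6 is omitted here. Function names are hypothetical, and the port check compares digit counts before converting so that very long inputs never reach numeric conversion:

```typescript
// Dotted-quad IPv4: exactly four parts, each 0-255, no leading zeros.
function isValidIPv4(input: string): boolean {
  const parts = input.split(".");
  if (parts.length !== 4) return false;
  return parts.every((part) => /^(0|[1-9][0-9]{0,2})$/.test(part) && Number(part) <= 255);
}

// Port: decimal digits only, value 0-65535. Returns null when invalid.
function parsePort(input: string): number | null {
  if (!/^[0-9]+$/.test(input)) return null;
  const digits = input.replace(/^0+(?=[0-9])/, ""); // Drop leading zeros, keep one digit.
  if (digits.length > 5) return null; // More than five digits always overflows.
  const value = Number(digits);
  return value <= 65535 ? value : null;
}
```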

Reference Resolution

RFC 3986 §5.2 reference resolution with proper path merging, dot segment removal, and fragment handling. Correctly collapses ../ sequences and handles all relative reference types, a prerequisite for resolving JSON References across documents.
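
The two path steps can be sketched as follows, after RFC 3986 §5.2.3 (merge) and §5.2.4 (remove_dot_segments). The merge helper assumes a non-empty base path and skips the special case of an authority with an empty path. Function names are hypothetical:

```typescript
// Merge: replace everything after the last "/" of the base path with the reference path.
function mergePaths(basePath: string, referencePath: string): string {
  return basePath.slice(0, basePath.lastIndexOf("/") + 1) + referencePath;
}

// Remove "." and ".." segments, following the input/output buffer algorithm.
function removeDotSegments(path: string): string {
  let input = path;
  const output: string[] = [];
  while (input.length > 0) {
    if (input.startsWith("../")) input = input.slice(3);
    else if (input.startsWith("./")) input = input.slice(2);
    else if (input.startsWith("/./")) input = input.slice(2);
    else if (input === "/.") input = "/";
    else if (input.startsWith("/../")) {
      input = input.slice(3);
      output.pop(); // Drop the last complete segment, if any.
    } else if (input === "/..") {
      input = "/";
      output.pop();
    } else if (input === "." || input === "..") input = "";
    else {
      // Move the first segment (with its leading "/", if present) to the output.
      const next = input.indexOf("/", 1);
      const end = next === -1 ? input.length : next;
      output.push(input.slice(0, end));
      input = input.slice(end);
    }
  }
  return output.join("");
}
```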