
Keywords All the Way Down

JSON Schema processors typically implement keywords as cases in a switch statement. We made each keyword a first-class citizen. Here's why that matters.

4 min read

JSON Schema defines keywords. type constrains the value type. properties defines object structure. items describes array elements. $ref references other schemas.

Most implementations handle keywords procedurally. A validation function switches on the keyword name, handling each case. A traversal function does the same. Type generation, tree-shaking, transformation: each system reimplements keyword handling.

This creates fragmentation. Each concern has its own keyword dispatch. Add a new keyword and you modify five different places. Dialect differences require conditional logic scattered throughout.

We took a different approach.

Keywords as objects

Each keyword is an object that knows how to handle itself across all concerns:
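A minimal sketch of the shape this takes. The interface and member names here are illustrative, not our actual API:

```typescript
// Illustrative: a keyword as a first-class object that handles every concern.
type Schema = Record<string, unknown>;

interface Keyword {
  name: string;
  // Check an instance against this keyword's value in the schema.
  validate(value: unknown, instance: unknown): boolean;
  // Report which parts of the keyword's value are subschemas to traverse.
  subschemas(value: unknown): Schema[];
}

const minLength: Keyword = {
  name: "minLength",
  // Per JSON Schema semantics, non-strings are ignored by minLength.
  validate: (value, instance) =>
    typeof instance !== "string" || instance.length >= (value as number),
  subschemas: () => [],
};

const properties: Keyword = {
  name: "properties",
  // Per-property validation happens by recursing into subschemas.
  validate: () => true,
  subschemas: (value) => Object.values(value as Record<string, Schema>),
};
```

The processor never switches on keyword names; it asks each keyword object for the behavior it needs.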

A keyword is a complete unit of behavior. Add a keyword, and it automatically participates in validation, tree-shaking—everything that processes schemas.

Why this matters for dialects

JSON Schema dialects differ in which keywords exist and how they behave. Draft 04 has definitions. Draft 2020-12 renamed it to $defs. OpenAPI 3.0 adds nullable with semantics that don’t exist in pure JSON Schema.

With keyword objects, dialect support becomes configuration. Each dialect specifies its vocabulary—which keywords are active and their behaviors. Same schema processor, different keyword sets.

Dialect configuration enables nullable for OpenAPI 3.0 and disables it for pure JSON Schema. No conditionals scattered through the codebase.
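In sketch form, a dialect reduces to a set of active keywords. The names here are hypothetical:

```typescript
// Hypothetical: a dialect is configuration, i.e. a set of active keywords.
type Vocabulary = Set<string>;

const draft202012: Vocabulary = new Set(["type", "properties", "items", "$defs"]);
const openapi30: Vocabulary = new Set(["type", "properties", "items", "nullable"]);

// One processor, parameterized by dialect: only keywords in the active
// vocabulary participate; everything else is ignored.
function activeKeywords(
  schema: Record<string, unknown>,
  dialect: Vocabulary,
): string[] {
  return Object.keys(schema).filter((k) => dialect.has(k));
}
```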

When a keyword breaks the model

JSON Schema’s conceptual model treats schemas as intersections of independent constraints. Each keyword acts independently—type restricts the value type, minLength restricts length, pattern restricts format. Valid data satisfies all constraints simultaneously.

OpenAPI 3.0’s nullable breaks this model. It isn’t a constraint. It’s a modifier that changes what the entire schema means.

Consider { type: "string", minLength: 1, nullable: true }. This doesn’t mean “a string with minLength 1 that is also nullable.” It means “(a string with minLength 1) OR null.” The nullable keyword logically hoists the entire schema into a union with null.

That hoisting moves all the other keywords into one branch of a union, with { type: "null" } as the other branch: our example becomes { type: "string", minLength: 1 } | { type: "null" }. The nullable keyword disappears, replaced by union structure.

For simple schemas you can represent this with type: ["string", "null"]. But composition makes the transformation non-trivial:

{
  "allOf": [
    { "$ref": "#/components/schemas/BaseUser" },
    { "$ref": "#/components/schemas/AdminPermissions" }
  ],
  "nullable": true
}

This means “(BaseUser AND AdminPermissions) OR null.” Unions distribute over intersections. And nullable can be included by reference, compounding the complexity. Eliminating nullable for systems that don’t support it requires complex transformation—as does the reverse, converting type: [..., "null"] to nullable for systems that only understand the OpenAPI 3.0 syntax.
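The core of the rewrite can be sketched as follows. `eliminateNullable` is a hypothetical helper, and `anyOf` stands in for the union structure; real handling of `$ref` and nested composition is considerably more involved:

```typescript
// Sketch of eliminating OpenAPI 3.0's `nullable` by hoisting the rest of the
// schema into one branch of a union.
type Schema = Record<string, unknown>;

function eliminateNullable(schema: Schema): Schema {
  if (schema.nullable !== true) return schema;
  const rest: Schema = { ...schema };
  delete rest.nullable;
  // Simple case: a lone `type` keyword can absorb null directly.
  if (typeof rest.type === "string" && Object.keys(rest).length === 1) {
    return { type: [rest.type, "null"] };
  }
  // General case: (everything else) OR null.
  return { anyOf: [rest, { type: "null" }] };
}
```

Note that this is a structural rewrite of the whole schema, not a per-keyword check, which is exactly why it cannot live inside a keyword object.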

This is where the keyword model hits its limits. Keywords work because they’re local and independent. nullable is neither. Handling it requires a separate transformation pass that rewrites schema structure—work that can’t be localized to a single keyword. Some dialect differences are irreducibly complex.

Dependency ordering

Keywords have interdependencies. properties only makes sense when type is object. items only applies to arrays. Validation must evaluate keywords in the right order.

We model dependencies explicitly. Each keyword declares what it depends on. A topological sort produces correct evaluation order. Virtual keywords handle cases where explicit dependencies would create cycles.
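A sketch of the idea, with illustrative dependency edges (the table and function names are hypothetical):

```typescript
// Each keyword declares its dependencies; a depth-first topological sort
// yields an evaluation order in which dependencies always come first.
const dependsOn: Record<string, string[]> = {
  type: [],
  properties: ["type"],
  items: ["type"],
  additionalProperties: ["properties"],
};

function evaluationOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const visited = new Set<string>();
  const visit = (keyword: string): void => {
    if (visited.has(keyword)) return;
    visited.add(keyword);
    for (const dep of deps[keyword] ?? []) visit(dep);
    order.push(keyword); // all dependencies are already in `order`
  };
  for (const keyword of Object.keys(deps)) visit(keyword);
  return order;
}
```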

This isn’t just implementation convenience. It makes the keyword system composable. Custom keywords can declare their dependencies and slot into the evaluation order correctly.

Tree-shaking as traversal

When generating types for a single API operation, we don’t want the entire spec—just the schemas that operation uses. This is tree-shaking: follow references, keep what’s reachable, discard the rest.

With keyword objects, tree-shaking is just traversal with a different visitor. Each keyword knows which of its values are subschemas to follow. properties knows its values are schemas. items knows its value is a schema. $ref knows to follow the reference. The traverser visits reachable schemas, guided by what keywords declare about themselves. The result is a minimal specification.

Add a new keyword with schema references, define its traversal, and tree-shaking automatically handles it.
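The traversal can be sketched like this. The `subschemasOf` table is illustrative, and `$ref` resolution is omitted:

```typescript
// Tree-shaking as traversal: each keyword declares which of its values are
// subschemas; the traverser collects everything reachable from the root.
type Schema = { [k: string]: unknown };

const subschemasOf: Record<string, (value: unknown) => Schema[]> = {
  properties: (value) => Object.values(value as Record<string, Schema>),
  items: (value) => [value as Schema],
};

function reachable(root: Schema): Set<Schema> {
  const seen = new Set<Schema>();
  const visit = (schema: Schema): void => {
    if (seen.has(schema)) return;
    seen.add(schema);
    for (const [keyword, value] of Object.entries(schema)) {
      // Keywords without schema-valued entries contribute nothing.
      for (const sub of subschemasOf[keyword]?.(value) ?? []) visit(sub);
    }
  };
  visit(root);
  return seen;
}
```

Registering a traversal rule for a new keyword is all it takes for tree-shaking to pick it up.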

The inversion

This is the typeclass pattern, applied to schema processing. Each keyword implements a set of behaviors: parsing, validation, traversal, transformation. The schema processor doesn’t know what keywords exist. It just asks each keyword to do its job.

Validation is actually the least important behavior. Static analysis and transformation are the real drivers. Tree-shaking schemas, normalizing across dialects, generating annotations for downstream tools—these require understanding schema structure, not just checking data. Validation comes along for the ride.

JSON Schema’s spec defines dynamic vocabulary composition—dialects that declare which vocabularies they use, processors that respect those declarations. We implement this fully. Custom annotation keywords propagate through to output, making the schema engine useful for static analysis purposes the spec authors anticipated but few implementations support.

Add a new keyword, and it participates in everything. Add a new dialect, and it composes existing keywords. The core doesn’t grow with each addition. Combinatorial complexity avoided.