Running a Compiler on a CDN

What happens when you run compiler infrastructure on a platform built for 50ms HTML rendering?

Cloudflare Workers were designed for a specific workload: intercept an HTTP request, do minimal processing, return a response. The target is 50ms. The memory limit is 128MB. The runtime assumes you're not doing anything too computationally expensive and, more to the point, nothing too memory-intensive.

We’re running compiler infrastructure on this platform.

The mismatch

Our OpenAPI compiler processes entire OpenAPI specifications. Tree-shaking extracts individual operations. Type generation synthesizes TypeScript interfaces. Reference resolution traverses dependency graphs. These are not 50ms operations.
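To make that concrete, here's a minimal sketch of what tree-shaking a single operation involves: take the operation, follow every $ref transitively, and copy only the reachable schemas into a new, self-contained document. The names here (treeShakeOperation, OpenAPIDoc) are illustrative rather than our actual API, and the sketch only handles local #/components/schemas/ references.

```typescript
// Minimal, illustrative shape of an OpenAPI document: just enough structure
// for the sketch below. Real specs carry far more (parameters, responses,
// security schemes) that all feed the reference graph.
type OpenAPIDoc = {
  paths: Record<string, Record<string, unknown>>;
  components?: { schemas?: Record<string, unknown> };
};

// Recursively collect every $ref string reachable from a node.
function collectRefs(node: unknown, refs: Set<string>): void {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, refs);
  } else if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "$ref" && typeof value === "string") refs.add(value);
      else collectRefs(value, refs);
    }
  }
}

function treeShakeOperation(doc: OpenAPIDoc, path: string, method: string): OpenAPIDoc {
  const operation = doc.paths[path]?.[method];
  if (!operation) throw new Error(`No operation: ${method} ${path}`);

  // Follow references transitively: a schema can reference other schemas.
  const reachable = new Set<string>();
  const queue: unknown[] = [operation];
  while (queue.length > 0) {
    const refs = new Set<string>();
    collectRefs(queue.pop(), refs);
    for (const ref of refs) {
      const name = ref.replace("#/components/schemas/", "");
      const schema = doc.components?.schemas?.[name];
      if (schema && !reachable.has(name)) {
        reachable.add(name);
        queue.push(schema);
      }
    }
  }

  // Emit a self-contained document with just this operation and its schemas.
  const schemas = Object.fromEntries(
    [...reachable].map((name) => [name, doc.components!.schemas![name]])
  );
  return { paths: { [path]: { [method]: operation } }, components: { schemas } };
}
```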

When we first deployed, things worked—mostly. Small specs processed fine. Medium specs worked. Large specs failed intermittently.

The failures were strange. The same operation, with identical inputs, would sometimes succeed and sometimes fail. Timeouts hit requests that should have been fast. Memory errors didn't correlate with spec size.

We were running into the reality of shared infrastructure.

128MB isn’t 128MB

The memory limit sounds generous. 128MB is plenty for request processing.

Except you’re sharing an isolate with Cloudflare’s runtime, its caches, its buffers, and potentially other requests. The 128MB is a quota for your code, but the actual available memory depends on what else is happening in the isolate.

Process a large spec? Your memory usage spikes. But so does Cloudflare’s internal bookkeeping. The combination can exceed the limit even when your code alone wouldn’t.

We discovered this empirically. A workload that processed 2,000 operations per second in a local 128MB container failed 9 times out of 10 in a Worker. Same code, same memory quota, different environment.

Chaos Monkey from day zero

The intermittent failures were a gift in disguise. They forced us to confront scalability from the start.

In a traditional deployment, you’d optimize until things worked, then pray the load doesn’t spike. The comfortable environment hides the fragility. The system works until it doesn’t, and the failure mode is catastrophic.

Workers gave us Chaos Monkey for free. Every deployment had a fraction of requests failing. We couldn’t ship anything that wasn’t robust to partial failure.

This changed how we built things.

De-optimizing the compiler

Everything I knew about compiler efficiency was wrong for this environment.

Normal compiler optimization: cache parsed results to avoid re-parsing. Workers reality: caches consume memory that the GC won’t reclaim before you OOM. We ripped out caching.

Normal optimization: process related operations together to share context. Workers reality: processing in batches means batch-sized memory spikes. We process one operation at a time.

Normal optimization: build in-memory data structures for fast traversal. Workers reality: large data structures trigger memory limits. We rebuild context from the source on each access.
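Put together, the de-optimized inner loop looks something like the sketch below, assuming a KV-style store that holds the raw spec and injected helpers for each compilation step. All of the names are illustrative: no cache, one operation in flight at a time, and the document re-parsed from source on every access.

```typescript
// Illustrative sketch of the de-optimized inner loop. There is deliberately
// no cache: every call re-fetches and re-parses the source, so peak memory
// stays close to the working set of a single operation.
interface SpecStore {
  get(key: string): Promise<string | null>;
}

// Hypothetical compilation helpers, injected so the sketch stays self-contained.
interface Compiler {
  treeShake(doc: unknown, path: string, method: string): unknown;
  generateTypes(shaken: unknown): string;
}

async function compileOperation(
  store: SpecStore,
  compiler: Compiler,
  specKey: string,
  path: string,
  method: string
): Promise<string> {
  // Re-parse from source on every access instead of keeping a parsed
  // document (or any derived structure) alive across requests.
  const source = await store.get(specKey);
  if (source === null) throw new Error(`Spec not found: ${specKey}`);
  const doc = JSON.parse(source);

  // One operation at a time: no batching, so no batch-sized memory spike.
  const shaken = compiler.treeShake(doc, path, method);
  return compiler.generateTypes(shaken);
}
```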

The result is comically inefficient by traditional metrics. We parse the same JSON multiple times. We traverse the same paths repeatedly. We reconstruct context that a sensible system would cache.

But it works. Memory usage stays bounded. Requests complete reliably. The inefficiency is the point.

Baby steps in the pipeline

We restructured the pipeline into independently retryable stages. Each stage does minimal work and persists its output.

Parse the spec. Store the parsed result. Tree-shake one operation. Store the tree-shaken spec. Generate types for that operation. Store the types. Index the operation. Store the index entries.

Any stage can fail. Any stage can be retried. Progress is preserved across failures. A request that times out picks up where it left off.

This is embarrassingly coarse-grained. A proper batch system would checkpoint at finer granularity. But coarse-grained works, and working beats elegant.
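As a rough sketch, assuming a KV-style store and illustrative stage names (not our production schema), the checkpointed pipeline looks something like this:

```typescript
// Staged pipeline with persisted checkpoints. Each stage checks for stored
// output before doing any work, so a retried request skips everything that
// already succeeded and resumes at the first stage that hasn't run.
interface StageStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface Stage {
  name: string;                        // e.g. "parse", "treeshake", "types"
  run(input: string): Promise<string>; // serialized output of this stage
}

async function runPipeline(
  store: StageStore,
  jobId: string,
  input: string,
  stages: Stage[]
): Promise<string> {
  let current = input;
  for (const stage of stages) {
    const key = `${jobId}:${stage.name}`;

    // Resume: if an earlier attempt already persisted this stage's output,
    // reuse it and move on.
    const persisted = await store.get(key);
    if (persisted !== null) {
      current = persisted;
      continue;
    }

    // Do the minimal unit of work, then persist before continuing. If the
    // request times out or runs out of memory after this point, the next
    // attempt picks up exactly here.
    current = await stage.run(current);
    await store.put(key, current);
  }
  return current;
}
```

Wherever the checkpoints actually live, the shape is the same: each stage's output has to outlive the request that produced it, so a retry can skip straight to the first stage that hasn't run.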

The tradeoff

We lost single-thread efficiency. A beefy server would process our entire spec catalog faster than we can process it serially in Workers.

But we gained massive parallelism. Cloudflare runs our code in 300+ locations. Each location handles requests independently. A large indexing job distributes across the network automatically. No coordination, no bottlenecks, no single point of failure.

The aggregate throughput is higher even though individual operations are slower. The reliability is higher because failures are isolated. The cost is lower because we’re renting compute, not running servers.

What this taught us

Platform constraints aren’t just constraints—they’re architectural forcing functions.

If Workers had given us a comfortable 8GB and unlimited CPU, we’d have built a comfortable system that scaled vertically until it didn’t. The constraints forced horizontal scalability from the start.

The memory limits forced stateless request handling. The timeout limits forced coarse-grained checkpointing. The shared environment forced failure tolerance.

We didn’t build a system and then adapt it for Workers. We built a system shaped by Workers’ constraints. The result is fundamentally different—and more robust—than what we’d have built in a traditional environment.

Sometimes the worst platform for the job teaches you the most about how to do the job right.