There’s a graph that haunts every AI tool-use system. On the x-axis: number of tools. On the y-axis: reliability. The line slopes down.
Add more tools, model accuracy drops. It’s been measured, replicated, discussed. The conventional wisdom is clear: keep your tool count low. Curate carefully. Less is more.
This is correct—for static tool registration. It’s completely wrong for dynamic discovery.
The fixed-function assumption
Every major AI system uses the same pattern: define tools upfront, register them at connection time, front-load descriptions into context. The model sees all tools simultaneously and chooses among them.
This works fine for ten tools. Twenty. Maybe fifty if you’re careful about descriptions.
It breaks at scale. Give a model 500 tools and it starts confusing them. Similar descriptions blur together. Rare tools get ignored. The selection mechanism wasn’t designed for this.
So everyone limits tool count. An agent gets a handful of carefully chosen capabilities. If you need more, build another agent. The architecture assumes tools are scarce, selection is hard, and less is better.
What happens at 100,000
Our system indexes over 100,000 operations across hundreds of APIs. Not registered upfront—discovered on demand.
An agent searching for “create a GitHub issue” doesn’t see 100,000 tools. It sees the few operations that matched its search. The selection already happened. The model just confirms.
The reliability curve doesn’t slope down because the model never faces the selection problem at scale. Dynamic discovery is pre-selection. By the time the model sees candidate tools, the hard work is done.
Three tools, any operation
The meta-tool pattern collapses the problem. An agent has three tools: find_api, learn_api, call_api. That’s it.
It searches for capabilities when it needs them. Learns interfaces just in time. Executes operations and discards the knowledge afterward.
From the model’s perspective, it always has three tools. From the user’s perspective, it can access any operation we’ve indexed. The tradeoff evaporates.
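The loop is easy to sketch. Below is a toy, self-contained version of the three-meta-tool pattern: the function names come from the article, but the in-memory index, operation ids, and signatures are invented for illustration, and real search and execution would be far richer.

```typescript
// Toy sketch of the find/learn/call loop over an in-memory index.
// Operation ids, descriptions, and signatures here are illustrative only.

interface Operation {
  id: string;
  description: string;
  // The synthesized interface the agent reads before calling (a stub here).
  signature: string;
}

const index: Operation[] = [
  {
    id: "github.issues.create",
    description: "create a GitHub issue in a repository",
    signature: "(repo: string, title: string, body?: string) => Issue",
  },
  {
    id: "slack.chat.postMessage",
    description: "send a message to a Slack channel",
    signature: "(channel: string, text: string) => Ack",
  },
];

// find_api: keyword overlap stands in for real semantic search.
function findApi(query: string): Operation[] {
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 2);
  return index.filter(op =>
    terms.some(t => op.description.toLowerCase().includes(t)));
}

// learn_api: return the synthesized interface for one operation.
function learnApi(id: string): string | undefined {
  return index.find(op => op.id === id)?.signature;
}

// call_api: real execution would dispatch through the spec; stubbed here.
function callApi(id: string, args: Record<string, unknown>): string {
  return `executed ${id} with ${JSON.stringify(args)}`;
}

// The agent's loop: search, learn, execute, discard.
const candidates = findApi("create a GitHub issue");
const iface = learnApi(candidates[0].id);
const result = callApi(candidates[0].id, { repo: "acme/app", title: "Bug" });
console.log(candidates[0].id, iface, result);
```

Note that the model's context only ever contains these three function shapes plus the handful of candidates returned by a search, which is exactly why the tool count stays constant.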
Access control shifts
When tools are static, access control is per-tool. You curate which tools each agent can use. Permissions live in the tool registration.
When discovery is dynamic, access control shifts to the index. An agent can only find operations that exist in its catalog scope. The catalog defines the boundary—everything inside is discoverable, nothing outside exists.
This is simpler. Instead of managing permissions per-operation, you manage them per-catalog. Create a “payments” catalog with Stripe and PayPal. Create a “support” catalog with Zendesk and Intercom. An agent connected to one catalog can’t discover operations from another.
The granularity is coarse, but the management model is tractable: thousands of operations, managed through a handful of catalog boundaries.
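A minimal sketch of that boundary, using the payments and support examples from above (the catalog shape and operation ids are hypothetical):

```typescript
// Catalog-scoped discovery: an agent bound to one catalog can only search
// operations inside it. Out-of-scope operations simply don't exist for it.

type Catalog = { name: string; operations: Set<string> };

const payments: Catalog = {
  name: "payments",
  operations: new Set(["stripe.charges.create", "paypal.orders.capture"]),
};

const support: Catalog = {
  name: "support",
  operations: new Set(["zendesk.tickets.create", "intercom.conversations.reply"]),
};

// Discovery is filtered by the agent's catalog before any ranking happens.
function discover(catalog: Catalog, query: string): string[] {
  return [...catalog.operations].filter(id => id.includes(query));
}

console.log(discover(payments, "stripe"));   // visible: inside the boundary
console.log(discover(payments, "zendesk"));  // empty: outside the boundary
```

The access decision is made once, at catalog membership, rather than on every call.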
The composition property
Catalogs compose. A “DevOps” catalog can source from GitHub, AWS, and Datadog catalogs. The operations merge into a unified search space.
This lets you build hierarchies. Team-specific catalogs that inherit from organization catalogs. Environment-specific scopes that filter production vs. staging. Custom overlays that add internal APIs to public service collections.
The composition happens at the catalog layer, invisible to the agent. It still has three tools. It still searches, learns, executes. The organizational structure exists in the infrastructure, not the prompt.
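Composition reduces to a merge over a catalog graph. A sketch, assuming a simple recursive shape (the "DevOps" example and all operation ids are illustrative):

```typescript
// Catalog composition: a catalog owns some operations and sources others
// from child catalogs; search runs over the merged set.

type CatalogNode = { name: string; own: string[]; sources: CatalogNode[] };

// Merge a catalog's own operations with everything its sources expose.
function allOperations(c: CatalogNode): string[] {
  const merged = new Set<string>(c.own);
  for (const src of c.sources) {
    for (const op of allOperations(src)) merged.add(op);
  }
  return [...merged];
}

const github: CatalogNode = { name: "github", own: ["github.issues.create"], sources: [] };
const aws: CatalogNode = { name: "aws", own: ["aws.ec2.runInstances"], sources: [] };
const datadog: CatalogNode = { name: "datadog", own: ["datadog.monitors.create"], sources: [] };

// "DevOps" inherits from three sources and overlays an internal API.
const devops: CatalogNode = {
  name: "devops",
  own: ["internal.deploy.trigger"],
  sources: [github, aws, datadog],
};

console.log(allOperations(devops).length); // 4
```

The same recursion gives you team catalogs inheriting from org catalogs: the hierarchy is just deeper sourcing.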
Breaking the malaise
Fixed-function tool use has created a kind of malaise. Agents are limited to small, carefully curated capability sets. Building an agent means choosing which dozen tools it gets. Extending it means rebuilding the tool registration.
Dynamic discovery dismantles this tradeoff. An agent can access any operation in its catalog scope without anyone explicitly configuring access. New APIs show up automatically. Old ones fade away. The capability set is fluid.
The agent doesn’t need to know what tools exist. It needs to know what it wants to accomplish. The system translates intent to capability at runtime.
The architecture that enables this
This isn’t magic. It requires:
Semantic indexing that matches intent to operations. When an agent searches “send a notification,” the index returns Twilio SMS, SendGrid email, Slack messages—ranked by semantic similarity.
Interface synthesis that generates usable types on demand. The agent sees a clean TypeScript interface, not a raw OpenAPI spec.
Spec-driven execution that handles the mechanical details. Parameter encoding, authentication, error handling—all invisible.
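The first piece, semantic indexing, can be sketched with toy embeddings and cosine similarity. The three-dimensional vectors below are invented stand-ins for a real embedding model's output, and the operation ids are illustrative:

```typescript
// Toy semantic ranking: score operations by cosine similarity between an
// intent embedding and per-operation embeddings, return the top k.

type Indexed = { id: string; embedding: number[] };

const ops: Indexed[] = [
  { id: "twilio.messages.create", embedding: [0.9, 0.1, 0.0] },
  { id: "sendgrid.mail.send",     embedding: [0.7, 0.3, 0.1] },
  { id: "stripe.charges.create",  embedding: [0.0, 0.1, 0.9] },
];

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Rank all indexed operations against the intent, keep the best k.
function rank(intent: number[], k: number): string[] {
  return [...ops]
    .sort((x, y) => cosine(intent, y.embedding) - cosine(intent, x.embedding))
    .slice(0, k)
    .map(o => o.id);
}

// An intent like "send a notification" lands near the messaging operations
// and far from the payments one.
console.log(rank([0.85, 0.15, 0.05], 2));
```

The model only ever sees the top-k result of this ranking, which is the pre-selection the reliability argument rests on.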
Each piece is necessary. Together they make tool count irrelevant.
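The second piece, interface synthesis, admits a similar sketch. The spec shape below is a deliberate simplification, nowhere near a full OpenAPI document, but it shows the transformation from machine spec to the signature an agent reads:

```typescript
// Toy interface synthesis: turn a minimal operation spec into the
// TypeScript-style signature shown to the agent.

type ParamSpec = { name: string; type: "string" | "number" | "boolean"; required: boolean };
type OpSpec = { id: string; params: ParamSpec[]; returns: string };

function synthesize(spec: OpSpec): string {
  const params = spec.params
    .map(p => `${p.name}${p.required ? "" : "?"}: ${p.type}`)
    .join(", ");
  return `function ${spec.id.replace(/\./g, "_")}(${params}): ${spec.returns}`;
}

const createIssue: OpSpec = {
  id: "github.issues.create",
  params: [
    { name: "repo", type: "string", required: true },
    { name: "title", type: "string", required: true },
    { name: "body", type: "string", required: false },
  ],
  returns: "Issue",
};

console.log(synthesize(createIssue));
// function github_issues_create(repo: string, title: string, body?: string): Issue
```

Spec-driven execution is the inverse direction: the same spec that generated the signature drives parameter encoding, authentication, and error handling at call time, none of which the agent sees.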
What this means
The conventional wisdom about tool count comes from a particular architecture—static registration into context. That architecture has a fundamental scaling limit.
Dynamic discovery is a different architecture with different properties. The selection problem shifts from model to infrastructure. The scaling limit shifts from context to index.
We’ve indexed 100,000 operations. We could index a million. The agent would still have three tools, and the reliability curve would still be flat.
That’s what scaling looks like.
