When Anthropic launched Claude Cowork, the reaction was immediate: this feels like AGI.
The reaction was correct. When you give a frontier model the ability to act—not just think—something qualitatively different emerges. Cowork can organize your files, create spreadsheets with working formulas, produce polished presentations from scattered notes. Real work, actually getting done, autonomously.
But Cowork optimizes for a specific mode: coworking with AI on your personal computer. Desktop files, document creation, visual tasks. That’s valuable. It’s also a glimpse of something larger.
What happens when you extend that capability to every API?
The action gap
AGI is usually framed as a question of intelligence: can a system reason as well as humans across all domains? But intelligence alone isn’t the threshold that matters. What matters is whether a system can accomplish arbitrary goals in the world.
A physicist who can solve differential equations but can’t communicate, can’t operate a computer, and can’t interact with anything is a theorem prover with no interface to reality. General intelligence requires the ability to act across the full range of situations humans navigate.
Current frontier models have crossed the reasoning threshold. They engage substantively with nearly any intellectual task—mathematics, medicine, law, code, strategy, creativity. What they can’t do, without help, is act.
Ask a model to explain how to configure Stripe recurring billing. It will give you a thorough, accurate explanation. Ask it to actually configure recurring billing. It can’t. There’s no path from its reasoning to Stripe’s API.
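To make the gap concrete: “actually configuring” recurring billing in Stripe means three dependent API calls, to `/v1/products`, `/v1/prices`, and `/v1/subscriptions`. A reasoning model can draft these payloads perfectly; without a tool bridge, it has no way to send them. The sketch below builds the request bodies without sending anything; the placeholder IDs stand in for values returned by earlier calls.

```python
# Sketch of Stripe's product -> price -> subscription flow.
# Payloads only; a live agent would POST these to api.stripe.com
# with a secret key, threading IDs from one response into the next.

def recurring_billing_payloads(product_name, amount_cents,
                               currency="usd", interval="month"):
    """Build request bodies for Stripe recurring billing."""
    product = {"name": product_name}            # POST /v1/products
    price = {                                   # POST /v1/prices
        "product": "{PRODUCT_ID}",              # from the product response
        "unit_amount": amount_cents,
        "currency": currency,
        "recurring": {"interval": interval},
    }
    subscription = {                            # POST /v1/subscriptions
        "customer": "{CUSTOMER_ID}",
        "items": [{"price": "{PRICE_ID}"}],     # from the price response
    }
    return product, price, subscription

product, price, subscription = recurring_billing_payloads("Pro Plan", 2900)
```

Drafting this structure is reasoning. Executing it is action. The model has always been able to do the first.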
This gap—between understanding and doing—is the actual barrier. Tools bridge it.
APIs as the action layer
Consider what happens through APIs:
- Money moves. Stripe, Plaid, banking APIs. Payments, transfers, reconciliation.
- People communicate. Email, Slack, SMS, push notifications. Every message delivery.
- Work gets tracked. Jira, Linear, Asana, GitHub Issues. Every task, every project.
- Infrastructure operates. AWS, GCP, Azure, Cloudflare. Servers, databases, deployments.
- Customers get managed. Salesforce, HubSpot, Intercom. Every relationship, every interaction.
This isn’t a partial list. It’s representative. The digital world runs on APIs. Almost every action a knowledge worker takes—sending an email, scheduling a meeting, creating a document, deploying code, charging a customer—ultimately becomes an API call.
If an AI can call any API, it can do anything a human can do in the digital realm.
Seeing is believing
In a recent demo, I gave Claude an absurd prompt:
“Create a product plan in Asana, manage development in GitHub, set up billing with Stripe, create a pitch deck in Google Slides, build a website with Webflow, manage deployments with Netlify and Cloudflare, configure Google Analytics, and run AdSense campaigns. Oh, and DJ my Spotify while we work.”
Ten services. Most require authentication. Some have thousands of operations. The prompt was designed to be impossible.
Claude’s response, after connecting to mcp.toolcog.com:
“This is remarkable. I have access to essentially everything you mentioned.”
And then it just did it. Asana project with 16 tasks across 4 phases. GitHub repository with milestones, labels, and 12 detailed issues. Ten-slide pitch deck in Google Slides. Ten services, dozens of operations, OAuth flows handled mid-conversation.
What makes this work isn’t just that these services are available—it’s that the full API of each service is available. Not a curated subset. The complete interface.
Another demo: “Audit the security of my Cloudflare account.” Claude methodically investigates every nook and cranny—DNS records, firewall rules, access policies, API tokens, zone settings—using vast swathes of the 2,118 operations in the Cloudflare API. Thorough in a way that’s only possible with access to everything.
The compound effect
Single API calls are useful. But the emergent capability comes from composition—chaining operations across services, correlating data, coordinating actions, adapting to results.
“Find customers who signed up last month but haven’t purchased, cross-reference with support tickets to see if any had issues, and draft personalized follow-up emails addressing their concerns.”
That’s Stripe + a support system + email, coordinated by reasoning about relationships between data. No single API provides this. The capability emerges from composition.
These aren’t hypotheticals. They’re workflows knowledge workers execute daily—manually, slowly, with context switching and copy-pasting between systems. An AI with API access executes them directly.
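The follow-up-email workflow above reduces to a join across three services. This sketch runs the same logic over stubbed responses; in a live run, each list would come from an API call, and the service names and record fields here are invented for illustration.

```python
from datetime import date, timedelta

# Stubbed responses standing in for three API calls:
# billing signups, purchase records, and support tickets.
signups = [
    {"id": "cus_1", "email": "a@example.com",
     "signed_up": date.today() - timedelta(days=20)},
    {"id": "cus_2", "email": "b@example.com",
     "signed_up": date.today() - timedelta(days=90)},
]
purchases = set()  # customer IDs with at least one charge
tickets = {"cus_1": ["Checkout button unresponsive"]}

# The cross-service join: recent signups, minus purchasers,
# correlated with support history to personalize the draft.
cutoff = date.today() - timedelta(days=30)
drafts = []
for c in signups:
    if c["signed_up"] >= cutoff and c["id"] not in purchases:
        issues = tickets.get(c["id"], [])
        body = (f"We noticed you hit a snag: {issues[0]}"
                if issues else "Checking in!")
        drafts.append({"to": c["email"], "body": body})
```

No single service can compute `drafts`. The value comes from the correlation step in the middle, which is exactly what the reasoning model supplies.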
Why this differs from computer use
Cowork can navigate browser UIs through computer use—screenshots, coordinate estimation, simulated clicks. That works for truly visual tasks: legacy applications without APIs, workflows that require seeing what’s on screen.
But most knowledge work isn’t fundamentally visual. It’s a graphical wrapper around APIs. When AI navigates that UI, it’s translating twice: intent → pixels → API calls. When AI calls the API directly, it’s translating once: intent → API calls. Every additional translation adds latency, cost, and failure modes.
More fundamentally: computer use requires a desktop. That’s fine for personal productivity. But agents don’t have desktops. An autonomous agent coordinating work across services, operating on behalf of users who aren’t at their computers—there’s no screen to screenshot.
You can spin up virtual desktops in the cloud. But that’s higher latency, higher cost, and less reliable than direct API access. Screenshots get misinterpreted. Coordinates drift. UIs change. Once a model decides what to do, API access executes that intent reliably. Computer use adds another layer of uncertainty.
Governance at scale
The obvious concern: if AI can call any API, who controls what it does?
This is where architecture matters. Toolcog implements what we call a narrow waist—a single interface through which all agent-API interactions flow. Every search, every interface request, every API call passes through three meta-tools: find_api, learn_api, call_api.
- Observability. Every action is logged. You see what capabilities agents are discovering, which operations they’re learning, what calls they’re making.
- Access control. Catalogs define which operations agents can discover. An agent connected to a “support” catalog sees only support operations. The boundary is enforced at discovery.
- Kill switches. Revoke a credential and the agent can’t make calls with it. Remove an operation from a catalog and agents can’t find it. Control is immediate.
- Credential isolation. Zero-knowledge encryption means the platform cannot access your credentials. Keys derive from your session token; without it, decryption cannot begin.
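The narrow-waist pattern can be sketched in a few lines. This is a toy model, not Toolcog’s implementation: the catalog format, the scope field, and the log shape are invented for illustration. What it shows is the structural point — discovery is the enforcement boundary, and every interaction leaves an audit trail because everything funnels through the same three functions.

```python
# Toy narrow waist: all agent-API interaction flows through
# three meta-tools, so logging and access control live in one place.

CATALOG = {  # operations a catalog exposes, tagged by scope
    "zendesk.list_tickets": {"doc": "List open support tickets",
                             "scope": "support"},
    "stripe.refund": {"doc": "Refund a charge", "scope": "billing"},
}
AUDIT_LOG = []

def find_api(query, scope):
    """Discovery enforces the boundary: out-of-scope ops are invisible."""
    AUDIT_LOG.append(("find", query, scope))
    return [op for op, meta in CATALOG.items()
            if meta["scope"] == scope and query in meta["doc"].lower()]

def learn_api(op):
    """Return the operation's interface description."""
    AUDIT_LOG.append(("learn", op))
    return CATALOG[op]["doc"]

def call_api(op, args, scope):
    """Execute an operation; re-check scope at call time."""
    AUDIT_LOG.append(("call", op, args))
    if CATALOG[op]["scope"] != scope:
        raise PermissionError(f"{op} is outside the '{scope}' catalog")
    return {"ok": True}

ops = find_api("tickets", scope="support")
```

An agent scoped to “support” discovers only `zendesk.list_tickets`; an attempt to call `stripe.refund` fails at the waist, and both attempts appear in the log.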
The threshold
Here’s the claim: a frontier model with universal API tools crosses a meaningful threshold toward AGI.
Not because it suddenly becomes more intelligent—the reasoning capability was already there. But because tools dissolve the barrier between reasoning and action. The model can understand what you want to accomplish, figure out how to accomplish it, and actually do it. Across any domain accessible through APIs. Which is most domains.
The remaining gaps are real. Physical action—APIs can’t move atoms directly, though they control robots that can. Novel interfaces—truly new systems without APIs require human mediation, though most systems acquire APIs. Judgment limits—models make mistakes, though so do humans.
These gaps matter. But they’re narrower than the intelligence gap that occupied AGI discussion for decades. The intelligence is here. Ubiquitous tools unlock the latent ability for action that models already possess.
Cowork showed what happens when models can act on a desktop. The same capability, extended to every API, crosses the same threshold for everything else.
The invitation
The bridge is built. Anyone can walk across it.
Add mcp.toolcog.com to an MCP client. Ask AI to do something you’d normally do yourself. Watch what happens.
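For clients that launch MCP servers from a JSON config file, an entry along these lines is one plausible way to connect (the `mcpServers` key and the `mcp-remote` stdio adapter are assumptions about your particular client; consult its documentation for native remote-server support):

```json
{
  "mcpServers": {
    "toolcog": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.toolcog.com"]
    }
  }
}
```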
Not as an experiment. As work. Real tasks you actually need done. See whether the gap between understanding and doing has closed enough to change how you operate.
That’s the only test that matters. Not whether it meets some philosopher’s definition of AGI. Whether it changes what you can accomplish.
Try it. Then decide what to call it.
Sources:
- Introducing Cowork — Anthropic
- First impressions of Claude Cowork — Simon Willison
- Toolcog demo
