The claims that lit the fire
On March 11, 2026, at the Ask 2026 conference, Perplexity CTO Denis Yarats announced that the company was internally moving away from Anthropic's MCP toward APIs and CLIs. He gave two reasons: (1) MCP tool definitions consume 40-50% of available context before the agent does any work, and (2) authentication flows create friction when connecting to multiple services. The clip spread on Threads and X within hours, and "MCP is dead" / "the standard has died" reactions piled up.
Around the same window, Merge CTO Gil Feig independently estimated that "tool metadata accounts for 40-50% of available context in typical deployments." Developer reports surfaced too: one engineer measured roughly 7,000 tokens already consumed before the first prompt with just four MCP servers connected; a colleague on a heavier configuration was burning ~66,000 tokens — nearly a third of Claude Sonnet's 200k context window — before saying anything.
The viral discourse compresses to three claims. We check each against measured data.
Claim ①: "Tool definitions eat 40-50%"
True for typical multi-server setups. Not true for single-server, curated configurations.
What is actually consuming tokens? Several independent measurements converge on similar numbers.
| MCP server / configuration | Tool count | Definition tokens | % of 200k window |
|---|---|---|---|
| GitHub official MCP | 94 | ~17,600 | ~8.8% |
| Atlassian official MCP (Jira + Confluence) | — | ~10,000 | ~5.0% |
| Multiple large servers connected | — | 30,000+ | ~15%+ |
| 4-server lightweight config (typical) | — | ~7,000 | ~3.5% |
| Heavy config (reported extreme) | — | ~66,000 | ~33% |
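The percentage column follows directly from the token counts. A minimal sketch of that arithmetic, assuming the 200k window used in the table; the dictionary keys and function names are ours, chosen for illustration only:

```python
# Approximate definition-token figures quoted in the table above.
CONTEXT_WINDOW = 200_000

DEFINITION_TOKENS = {
    "GitHub official MCP": 17_600,
    "Atlassian official MCP": 10_000,
    "4-server lightweight config": 7_000,
    "Heavy config (reported extreme)": 66_000,
}


def window_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Return the percentage of the context window used by tool definitions."""
    return 100.0 * tokens / window


def combined_share(names: list[str], window: int = CONTEXT_WINDOW) -> float:
    """Share of the window when several listed configurations are connected together."""
    return window_share(sum(DEFINITION_TOKENS[name] for name in names), window)


for name, tokens in DEFINITION_TOKENS.items():
    print(f"{name}: {window_share(tokens):.1f}% of a {CONTEXT_WINDOW:,}-token window")

both = combined_share(["GitHub official MCP", "Atlassian official MCP"])
print(f"GitHub + Atlassian: {both:.1f}%")
```

Combining the GitHub and Atlassian figures gives 27,600 tokens, or 13.8% of a 200k window, before any third server is added. The share scales inversely with the window size, so the same definitions weigh more on a smaller window.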
"40-50%" should be read as a range for a class of configurations, not an absolute property of MCP. Stack several large servers and you arrive there easily — GitHub alone is 94 tools and ~17.6k tokens, so combining it with Atlassian and Slack will cross 30k. Curate down to one or two servers — or use a server with a compact mode like KanseiLink's compact: true — and you sit in the low thousands.
"MCP eats 40-50%" is a "if you connect everything just in case" truth. It is not a fundamental property of the protocol; it is a property of the operating choice. The same server can consume an order of magnitude fewer tokens depending on what you select and how.
Claim ②: "MCP is dead"
One company's choice is not an industry retreat. The standard is being patched, not buried.
The "MCP is dead / standard has died" reaction extrapolates a single company's internal call into an industry trend. The actual situation:
- Adoption keeps spreading: Since Anthropic open-sourced MCP in November 2024, it has been adopted by OpenAI, Google DeepMind, Microsoft, Cursor, Atlassian, ServiceNow, Salesforce, Linear, GitHub, and a long tail of enterprise platforms. In April 2026 it was donated to the Linux-Foundation-hosted AAIF (Agentic AI Foundation), moving toward vendor-neutral standardization.
- The fix is being designed in the open: The official MCP repo is working on SEP-1576 (Mitigating Token Bloat in MCP: Reducing Schema Redundancy and Optimizing Tool Selection), standardizing both schema-redundancy reduction and dynamic tool selection.
- Compression tools already exist: Atlassian ships `mcp-compressor`, with reported 70-97% tool-description compression without changing how the agent calls tools. Anthropic's MCP guidance includes code mode, which has been measured to cut context overhead by up to 98.7%.
- The defection is one company, one use case: Perplexity's choice reflects a specific product context (search) where API + CLI was a more economical path. That is a workload-level optimization, not a verdict on the protocol.
The accurate framing is not "the protocol is dying" but "the 'connect everything' default has shown its costs, and the ecosystem is responding".
Claim ③: "Too many tools breaks accuracy"
This is a cognitive-load issue independent of context-window size. Compression alone does not solve it.
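Beneath the token-cost discussion, there is a deeper problem. Multiple measurements observe that tool selection accuracy collapses when the menu of tools gets long.
The cognitive cost of menu bloat: reports observe tool-selection accuracy falling from ~43% to under 14% on bloated tool sets, roughly a 3x degradation.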
That is, you can compress tool definitions to save tokens and still pick the wrong tool. Even if 90% of your 200k window is empty, a long menu breaks the LLM's selection heuristics. This is not a context-window problem.
The direction is clear: load only the tools you need, dynamically. That is exactly what SEP-1576 addresses.
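To make "load only the tools you need" concrete, here is a minimal sketch of one possible selection step: rank a catalog by word overlap with the task and expose only the top matches. This is not the mechanism proposed in SEP-1576; the `Tool` class, the `select_tools` function, and the catalog entries are all hypothetical, and a real system would likely use something stronger than keyword overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def select_tools(task: str, catalog: list[Tool], limit: int = 3) -> list[Tool]:
    """Rank tools by word overlap with the task and keep the top `limit` matches."""
    task_words = set(task.lower().split())
    scored = []
    for tool in catalog:
        text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
        overlap = len(task_words & set(text.split()))
        if overlap > 0:  # tools with no overlap are never exposed to the agent
            scored.append((overlap, tool))
    # Highest overlap first; ties keep catalog order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


# Hypothetical catalog, invented for illustration.
CATALOG = [
    Tool("create_issue", "create a new issue in a repository"),
    Tool("list_pull_requests", "list open pull requests in a repository"),
    Tool("send_message", "send a chat message to a channel"),
    Tool("search_pages", "search wiki pages by keyword"),
]

selected = select_tools("create an issue for the login bug", CATALOG, limit=2)
print([tool.name for tool in selected])
```

With this catalog only `create_issue` shares words with the task, so the agent would see one tool definition instead of four. The point of the sketch is the shape of the step (filter before exposing), not the ranking method.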
We know how to fix it — four levers that work
The right question is not "use MCP or drop MCP" but "how do we use MCP economically." Four levers have measured effects:
- ① Anthropic code mode: Call tools via generated code rather than direct invocation. Reported up to 98.7% context-overhead reduction. See KanseiLink's prior verification of Cloudflare's code-mode numbers for more.
- ② mcp-compressor family: Atlassian's `mcp-compressor` reduces tool description overhead by 70-97% without changing call semantics. Retrofits to existing servers.
- ③ Server curation: Connect only the one or two servers you actually need. The single biggest gain is usually just removing servers that are along for the ride.
- ④ Compact modes: Use minimal-field response modes such as KanseiLink's `search_services({compact: true})`. A growing number of MCP servers ship a similar option.
The debate is shifting from "MCP vs API" to "MCP used wisely vs MCP used naively." Three layers each have work to do: the protocol (SEP-1576), the servers (compact modes, dynamic loading), and the operators (curation). Together they can shrink the 40-50% down to single-digit percentages.
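As a rough illustration of the compact-mode lever, the sketch below shows a list-type tool that returns either full records or a minimal-field view. It is not KanseiLink's implementation: the `list_services` function, the record fields, and the choice of which fields to keep are all assumptions made for this example, and JSON character length is used only as a crude proxy for token cost.

```python
import json

# Hypothetical service records; every field name here is invented for illustration.
SERVICES = [
    {
        "id": "svc-1",
        "name": "invoicing",
        "description": "Create and send invoices to customers.",
        "docs_url": "https://example.com/docs/invoicing",
        "auth": {"type": "oauth2", "scopes": ["read", "write"]},
    },
    {
        "id": "svc-2",
        "name": "payroll",
        "description": "Run monthly payroll and export payslips.",
        "docs_url": "https://example.com/docs/payroll",
        "auth": {"type": "api_key", "scopes": ["read"]},
    },
]

COMPACT_FIELDS = ("id", "name")  # assumption: the minimum an agent needs to pick a service


def list_services(compact: bool = False) -> list[dict]:
    """Return full records, or only COMPACT_FIELDS per record when compact is True."""
    if not compact:
        return SERVICES
    return [{key: service[key] for key in COMPACT_FIELDS} for service in SERVICES]


full_size = len(json.dumps(list_services()))
compact_size = len(json.dumps(list_services(compact=True)))
print(f"full: {full_size} chars, compact: {compact_size} chars")
```

The agent can call the compact form first and request full details only for the one record it decides to use, which keeps the common path cheap.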
What this means for Japanese SaaS
For Japanese SaaS vendors designing or maintaining MCP servers, the takeaways from the Perplexity moment are concrete:
- Cut tool count: Do not expose "one tool per API endpoint." Aggregate to the granularity an agent actually needs. GitHub's 94-tool surface is the cautionary example.
- Tighten tool descriptions: Strip markdown noise, repetition, and decorative formatting. Every word lives in the agent's context.
- Offer compact modes: Provide full/compact options on list-type tools. KanseiLink itself does this: `compact: true` drops fields, and we have measured 91-96% token savings versus web-fetching docs.
- Prepare for dynamic loading: Track SEP-1576 and design your server so subsets of tools can be loaded by use-case context, not the entire catalog at boot.
Before claiming "MCP-ready," measure how your server looks economically from the agent's side. In the AEO (Agent Engine Optimization) era, that measurement is the new vendor responsibility.
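One way to start that measurement is a quick audit of your own tool definitions. The sketch below is a rough estimate only: it assumes about four characters per token, which is a crude heuristic rather than any model's real tokenizer, and the two tool definitions are hypothetical examples in the name / description / inputSchema shape. For figures you intend to publish, count with the tokenizer of the model you target.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not a real tokenizer


def estimate_tokens(tool: dict) -> int:
    """Estimate tokens for one tool definition from its compact JSON length."""
    serialized = json.dumps(tool, separators=(",", ":"))
    return -(-len(serialized) // CHARS_PER_TOKEN)  # ceiling division


def audit(tools: list[dict], window: int = 200_000) -> dict:
    """Summarize estimated per-tool cost and the share of the context window."""
    per_tool = {tool["name"]: estimate_tokens(tool) for tool in tools}
    total = sum(per_tool.values())
    return {"per_tool": per_tool, "total": total, "window_pct": 100.0 * total / window}


# Hypothetical tool definitions, invented for illustration.
TOOLS = [
    {
        "name": "search_invoices",
        "description": "Search invoices by customer or date range.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "get_invoice",
        "description": "Fetch one invoice by its identifier.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
]

summary = audit(TOOLS)
# Print the most expensive definitions first so they can be trimmed first.
for name, tokens in sorted(summary["per_tool"].items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ~{tokens} tokens")
print(f"total: ~{summary['total']} tokens ({summary['window_pct']:.3f}% of window)")
```

Sorting by estimated cost surfaces the definitions worth tightening first, which pairs naturally with the "cut tool count" and "tighten tool descriptions" items above.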
FAQ
Is "MCP tool definitions consume 40-50% of context" really true?
⚠️ True for typical multi-server setups, not for curated single-server use. GitHub official MCP is ~17,600 tokens for 94 tools; Atlassian (Jira + Confluence) is ~10,000; combining large servers easily exceeds 30,000. Heavy configurations reach ~66,000 tokens (~33% of a 200k window) before any work. Drop to one or two servers and you sit in the low thousands.
Is "MCP is dead" / "the standard has died" really true?
❌ Inaccurate. Perplexity's defection is one company's decision. Since November 2024, MCP has been adopted by OpenAI, Google DeepMind, Microsoft, Cursor, Atlassian, ServiceNow, Salesforce, Linear, GitHub, and many others; in April 2026 it was donated to Linux-Foundation-hosted AAIF. The MCP repo itself is fixing the bloat problem via SEP-1576.
Does tool-definition bloat actually hurt agent performance?
✅ Yes — independently of the token-cost issue. Reports observe tool-selection accuracy collapsing from ~43% to under 14% on bloated sets (3x degradation, wrong tool 7 of 8 times). Compression alone does not solve this; dynamic tool loading does.
How do you reduce MCP tool-definition token bloat?
Four levers: (1) Anthropic code mode — up to 98.7% reduction. (2) Atlassian mcp-compressor — 70-97% tool-description compression. (3) Server curation — keep only the one or two servers you actually need. (4) Compact modes — KanseiLink and other servers offer compact: true to return minimal fields.
So how should companies actually use MCP?
The right question is "how to use MCP economically," not "use or drop." (1) Connect only needed servers; remove the rest. (2) For large catalogs, evaluate mcp-compressor or code mode. (3) Vendors should reduce tool count and support dynamic loading. Perplexity's choice is reasonable for its workload — but it is not evidence that "MCP is dead."
Claims and figures in this article are based on primary and secondary sources available as of 2026-05-18. Principal sources: Perplexity CTO Denis Yarats's remarks at Ask 2026 (2026-03-11, reported via Threads and multiple outlets); Merge CTO Gil Feig's "40-50%" tool-metadata estimate; the GitHub official MCP server's 94 tools / ~17,600-token measurement; the Atlassian official MCP (Jira + Confluence) ~10,000-token measurement; the ~66,000-token heavy-config developer report; the 43%→14% tool-selection accuracy degradation observation; Anthropic's code-mode "up to 98.7% reduction" report; Atlassian mcp-compressor's 70-97% reduction claim; and the official MCP repository's SEP-1576 issue. All measurements are snapshots and shift with server versions and connection configurations. "40-50%" is an estimated range for typical multi-server setups, not an absolute property of MCP. Our verdict on the "MCP is dead" claim is an analytical judgment based on publicly observable signals (adoption, SEP progress, compressor availability) as of May 2026 and is not a forward-looking guarantee about market dynamics.