Contents

  1. What the viral claims actually say
  2. Claim ① "CLI is up to 32x cheaper"
  3. Claim ② "MCP is only 72% reliable"
  4. Claim ③ "Rip out MCP, migrate fully to CLI"
  5. KanseiLink live data — verified MCP isn't 72%
  6. Implications for Japanese SaaS
  7. FAQ

What the viral claims actually say

In 2026 the "CLI vs MCP" debate boiled over in the AI agent community. One spark was a live benchmark from Scalekit that compared gh CLI with GitHub Copilot MCP (43 tools) on identical tasks, using the same model (Claude Sonnet 4), across 75 runs. CLI won decisively on both reliability and cost.

YC CEO Garry Tan piled on ("MCP eats too much context, auth is broken; I built a CLI replacement in 30 minutes"). Combined with Perplexity CTO Denis Yarats' remark that "we're deprioritizing MCP internally" (which we verified in a separate article), the mood on social media became "MCP isn't needed anymore." The claims fall into three groups; we verify each by tracing it back to its source.

Metric | CLI (gh) | MCP (GitHub Copilot MCP)
--- | --- | ---
Reliability (75-run bench) | 100% | 72% (7 of 25 failed)
Tokens, simple query | 1,365 | 44,026 (~32x)
Cost at 10k monthly ops | ~$3.20 | ~$55.20
Main failure cause | n/a | TCP timeouts (remote server connection)

Claim ① "CLI is up to 32x cheaper"

Verdict: ✅ Largely true (but configuration-dependent)

CLI wins on token efficiency. But "32x" is not a fixed ratio: it stems from bloated tool definitions, which can be trimmed.

In Scalekit's measurement, the simple query "what language is this repo written in?" consumed 1,365 tokens for CLI and 44,026 for MCP — about 32x. At 10,000 monthly operations that's roughly $3.20 vs $55.20. CLI is indeed cheaper.
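As a quick arithmetic check on the quoted figures, the sketch below recomputes both ratios from the numbers in this article. The token ratio comes out at about 32x, while the cost ratio at 10,000 operations works out to roughly 17x. The article does not break down how the cost estimate was built, so the two ratios are best read as separate measurements rather than interchangeable ones.

```python
# Figures as quoted from Scalekit's benchmark in this article.
CLI_TOKENS = 1_365        # tokens for the simple query via gh CLI
MCP_TOKENS = 44_026       # tokens for the same query via GitHub Copilot MCP
CLI_COST_10K = 3.20       # approx. USD per 10,000 operations, CLI
MCP_COST_10K = 55.20      # approx. USD per 10,000 operations, MCP
OPS = 10_000


def ratio(smaller: float, larger: float) -> float:
    """How many times larger `larger` is than `smaller`."""
    return larger / smaller


token_ratio = ratio(CLI_TOKENS, MCP_TOKENS)
cost_ratio = ratio(CLI_COST_10K, MCP_COST_10K)
cli_per_op = CLI_COST_10K / OPS   # USD per single operation
mcp_per_op = MCP_COST_10K / OPS

if __name__ == "__main__":
    print(f"token ratio: {token_ratio:.1f}x")
    print(f"cost ratio:  {cost_ratio:.1f}x")
    print(f"cost per op: ${cli_per_op:.5f} (CLI) vs ${mcp_per_op:.5f} (MCP)")
```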

But most of that gap comes from tool definitions consuming context. GitHub Copilot MCP exposes 43 tools, and their definitions fill the context before the prompt is even sent. As verified in our token bloat analysis, this overhead can be cut by an order of magnitude through server selection, tool curation, compact mode, and code mode (having the agent call tools from code rather than loading every definition up front). Anthropic's code-mode guidance reports up to a 98.7% reduction in overhead.
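Tool curation can be sketched as estimating how much context the serialized definitions cost and sending only an allowlisted subset. This is a minimal illustration, not any MCP SDK's API: the `ToolDef` shape, the tool names, and the 4-characters-per-token heuristic are all assumptions, and a real measurement should use the model provider's tokenizer.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolDef:
    """Hypothetical shape of one tool definition sent to the model."""
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English/JSON.
    return max(1, len(text) // 4)


def definition_cost(tools: list[ToolDef]) -> int:
    """Approximate context tokens consumed by serializing every tool definition."""
    total = 0
    for t in tools:
        payload = json.dumps(
            {"name": t.name, "description": t.description, "input_schema": t.input_schema}
        )
        total += approx_tokens(payload)
    return total


def curate(tools: list[ToolDef], allowlist: set[str]) -> list[ToolDef]:
    """Keep only the tools the agent actually needs for the task at hand."""
    return [t for t in tools if t.name in allowlist]


if __name__ == "__main__":
    schema = {"type": "object", "properties": {"repo": {"type": "string"}}}
    # 43 hypothetical tools with verbose descriptions, mimicking an all-in server.
    all_tools = [ToolDef(f"tool_{i}", "Does one GitHub operation. " * 20, schema) for i in range(43)]
    curated = curate(all_tools, {"tool_0", "tool_1"})
    print("all tools:", definition_cost(all_tools), "tokens (approx.)")
    print("curated:  ", definition_cost(curated), "tokens (approx.)")
```

The same accounting applies to a CLI: pasting a long help text into the prompt is also a fixed context cost that curation would shrink.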

How to read it

"CLI is cheaper" is true, but "MCP is structurally 32x more expensive" is a misreading. The main driver is uncurated tool definitions, not a fixed protocol gap. Connect all 43 tools and it's expensive; trim them and the gap shrinks. The same applies to CLI: feed an agent a giant CLI help text and you hit the same problem.

Claim ② "MCP is only 72% reliable"

Verdict: ⚠️ Conditional — it measured one server's transport failures

72% isn't a flaw in the MCP protocol; it came from TCP timeouts to GitHub Copilot MCP. Other verified MCP servers reach 90% or more.

This is the most misunderstood point. In Scalekit's bench, MCP failed 7 of 25 runs to land at 72%. But the breakdown shows most of those failures were TCP timeouts connecting to the GitHub Copilot MCP server. So 72% measures not "the reliability of the MCP protocol" but "a specific remote MCP server plus the stability of its transport."

Reliability is decided by implementation, not protocol. The same MCP, served by a host with stable transport and robust error handling, produces entirely different numbers. KanseiLink's live data shows exactly that.
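One concrete piece of "robust error handling" is retrying transient transport failures with backoff. The sketch below is a generic client-side wrapper, not code from GitHub Copilot MCP, Scalekit's harness, or any MCP SDK; which exceptions count as transient is an assumption. The second function shows the arithmetic behind retries if each attempt succeeded independently with probability p. That independence is a strong assumption, since real timeouts are often correlated, so it illustrates the direction of the effect rather than predicting a number for any server.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Errors treated as transient transport failures (an assumption for this sketch).
TRANSIENT_ERRORS: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError)


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.2) -> T:
    """Call fn, retrying transient transport errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last transport error
            # Back off 1x, 2x, 4x ... base_delay, plus jitter so clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable: the loop always returns or raises")


def success_with_retries(p: float, attempts: int) -> float:
    """Overall success rate if each attempt succeeds independently with probability p."""
    return 1 - (1 - p) ** attempts
```

Under that independence assumption, `success_with_retries(0.72, 2)` is 0.9216, i.e. about 92%. Non-transient errors (a bad argument, an auth failure) propagate immediately instead of being retried.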

"72%" vs KanseiLink live verified MCP

Scalekit bench, GitHub Copilot MCP (TCP timeouts): 72%
Slack MCP (KanseiLink, 113 reports): 91%
freee MCP (KanseiLink, 212 reports): 90%

Slack MCP at 91%, freee MCP at 90%, Backlog MCP at 90%: all far above 72%, and all based on agents' live outcome reports. MCP isn't structurally capped at 72%. Well-built MCP servers reach 90% or more, while poorly built servers or unstable transport drag the number down to 72%. That's all there is to it.

Claim ③ "Rip out MCP, migrate fully to CLI"

Verdict: ❌ Inaccurate — "the wrong fight"

Production agents use both CLI and MCP. The binary is a false premise.

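The conclusion "rip out MCP for CLI" has been rebutted by many practitioners as "the wrong fight." Production agents such as Claude Code, Cursor, and Gemini CLI use both: CLI wins for known commands, while MCP wins for enterprise integrations that need centralized OAuth, RBAC, and audit.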

Perplexity's move away from MCP was likewise a judgment for one company and one use case (API + CLI was more economical for a search product), not evidence that "MCP is structurally finished."
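The hybrid pattern practitioners describe (CLI for known commands, MCP for enterprise integrations that need centralized OAuth, RBAC, and audit) can be sketched as a simple per-task router. The `Task` fields and the fallback choice are hypothetical; this is not how Claude Code, Cursor, or Gemini CLI decide internally.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CLI = "cli"
    MCP = "mcp"


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor an agent host might build before acting."""
    known_command: bool        # e.g. a well-documented gh invocation
    needs_central_oauth: bool  # credentials managed centrally, not in a local shell
    needs_rbac: bool           # per-role permission checks
    needs_audit: bool          # calls must be logged for audit


def choose_route(task: Task) -> Route:
    # Enterprise integrations needing centralized OAuth, RBAC, or audit go through MCP.
    if task.needs_central_oauth or task.needs_rbac or task.needs_audit:
        return Route.MCP
    # Known commands are cheapest as direct CLI calls.
    if task.known_command:
        return Route.CLI
    # Fallback (assumption): prefer schema-described MCP tools for unfamiliar tasks.
    return Route.MCP
```

The point of the sketch is that both routes coexist in one agent; the choice is made per task, not per platform.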

KanseiLink live data — verified MCP isn't 72%

KanseiLink aggregates success rates from agents' live outcome reports across 225+ Japanese SaaS products and marks services at ≥80% as "verified (🟢)." The data shows how poorly the "72%" cited in the CLI-vs-MCP debate generalizes.
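The aggregation step can be sketched as grouping outcome reports by service, then computing a success rate, an average latency, and the ≥80% "verified" flag. This is a minimal illustration under assumed field names, not KanseiLink's implementation or its get_insights interface. Only the 80% cutoff comes from this article; the other tiers (connectable, info_only) depend on criteria not described here, so the sketch does not assign them.

```python
from dataclasses import dataclass
from statistics import mean

VERIFIED_THRESHOLD = 0.80  # the article's "verified (🟢)" cutoff


@dataclass(frozen=True)
class OutcomeReport:
    """Hypothetical shape of one agent's live outcome report."""
    service: str
    success: bool
    latency_ms: float


def summarize(reports: list[OutcomeReport]) -> dict[str, dict]:
    """Group reports by service and compute success rate, average latency, and verified flag."""
    by_service: dict[str, list[OutcomeReport]] = {}
    for r in reports:
        by_service.setdefault(r.service, []).append(r)

    summary = {}
    for service, rs in by_service.items():
        rate = sum(r.success for r in rs) / len(rs)  # True counts as 1
        summary[service] = {
            "reports": len(rs),
            "success_rate": rate,
            "avg_latency_ms": mean(r.latency_ms for r in rs),
            "verified": rate >= VERIFIED_THRESHOLD,
        }
    return summary
```

With this cutoff, a service at 79% (kintone MCP in the table below) falls just short of "verified," while services at 90% or 91% clear it.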

Service / interface | Success rate | Avg. latency | Tier
--- | --- | --- | ---
Slack MCP | 91% | 163 ms | verified 🟢
freee MCP | 90% | 216 ms | verified 🟢
Backlog MCP | 90% | 128 ms | verified 🟢
kintone MCP | 79% | 199 ms | connectable 🟡
(ref) Scalekit's GitHub Copilot MCP | 72% | n/a | unstable transport
(ref) SmartHR (direct API, no MCP) | 39% | 337 ms | info_only ⚪

Note the last row. SmartHR has no MCP server and is accessed via direct API (close to CLI-style access), yet its success rate is 39%. The simple framing "CLI/API = high reliability, MCP = low reliability" does not hold. What determines reliability is not the interface format but implementation quality, transport stability, and discoverability.

The crux of this verification

"CLI 100% vs MCP 72%" is a snapshot of one server and one transport configuration, not an evaluation of the MCP protocol. Measured on the same footing (KanseiLink live data), verified MCP reaches 90% or more while a direct API with no MCP sits at 39%: the ranking inverts. The format debate (CLI vs MCP) should be reframed as a quality debate (is it well built?).

Implications for Japanese SaaS

The lesson for Japanese SaaS vendors isn't to panic-pivot over "CLI or MCP." What agents ultimately choose isn't a format but the interface that's cheap, reliable, and findable. The work comes down to three things: token efficiency (avoid bloated tool definitions), transport stability, and discoverability (AEO).

Is your interface on the "72%" side or the "90%" side?

KanseiLink visualizes live agent success rates, latency, and discoverability for 225+ Japanese SaaS. See how agents actually rate you — beyond the CLI/MCP format debate.


FAQ

Is the "CLI 100% vs MCP 72% reliability" benchmark accurate?

Yes for Scalekit's 75-run bench (gh CLI vs GitHub Copilot MCP, Claude Sonnet 4). But most MCP failures were TCP timeouts to GitHub Copilot MCP, not a protocol flaw. In KanseiLink's data, verified MCP (Slack 91%, freee 90%, Backlog 90%) far exceeds 72%.

Is CLI really 4–32x cheaper than MCP?

✅ Largely true. A simple query was 1,365 (CLI) vs 44,026 (MCP) tokens — ~32x. But the main driver is bloated tool definitions, reducible via curation and code mode. It's configuration-dependent, not a fixed gap.

Is "rip out MCP, migrate fully to CLI" correct?

❌ Inaccurate. Production agents (Claude Code, Cursor, Gemini CLI) use both. CLI wins for known commands; MCP wins for enterprise integrations needing centralized OAuth, RBAC, and audit. The binary is a false premise.

Why was GitHub Copilot MCP 72% while other MCPs exceed 90%?

Reliability is set by implementation and transport, not the protocol. The 72% stemmed from TCP timeouts. Verified MCP servers (≥80% success) with stable transport and robust error handling land around 90%.

How should Japanese SaaS vendors respond?

Don't be whipsawed by the format debate. Polish (1) token efficiency (avoid bloated tool definitions), (2) transport stability, (3) discoverability (AEO). Agents choose the interface that's cheap, reliable, and findable — not a format.

Data disclosure & sources

External claim sources: Scalekit's benchmark (75 runs, gh CLI vs GitHub Copilot MCP with 43 tools, Claude Sonnet 4; CLI 100% / MCP 72%, 7 of 25 failed, mainly TCP timeouts; simple query 1,365 vs 44,026 tokens ≈ 32x; ~$3.20 vs ~$55.20 at 10k monthly ops); public statements by Garry Tan (YC CEO) and Perplexity CTO Denis Yarats; various MCP-vs-CLI comparisons (Scalekit / Firecrawl / Smithery "MCP vs CLI is the wrong fight" / DEV / IBM "MCP is not dead"). The up-to-98.7% code-mode reduction is from Anthropic's official guidance. KanseiLink figures (Slack MCP 91%/163ms, freee MCP 90%/216ms, Backlog MCP 90%/128ms, kintone MCP 79%/199ms, SmartHR 39%/337ms) are aggregated via get_insights from live outcome reports (snapshots as of each service's last_updated in April 2026) and vary with agent activity. The "quality over format" conclusion is an analytical interpretation of observed data and does not guarantee any product's superiority. Each vendor benchmark depends on configuration, model, and task; reproduce in your own environment.