Table of Contents

  1. Seven-service comparison — at a glance
  2. Force 1: Rate-limit tightness
  3. Force 2: Error-message clarity
  4. Force 3: Discoverability (search_miss)
  5. Why Shopify tops the list at 94%
  6. Vendor priority actions
  7. FAQ

Seven-service comparison — at a glance

From the 225+ services in KanseiLink MCP's dataset, we pulled the seven mature MCP servers with at least 40 field reports each. All use OAuth 2.0 (or OAuth 2.0+PKCE) — so the auth method is controlled and the variance we observe lives elsewhere.

Service             | Auth           | Reports | Success | Latency | AEO Grade
Shopify Japan       | OAuth 2.0      | 53      | 94.3%   | -       | AAA
Money Forward Cloud | OAuth 2.0      | 42      | 92.9%   | -       | AAA
Slack               | OAuth 2.0      | 113     | 91.2%   | 163ms   | AAA
Backlog             | OAuth 2.0      | 91      | 90.1%   | -       | AAA
freee               | OAuth 2.0+PKCE | 212     | 90.1%   | 216ms   | AAA
Notion              | OAuth 2.0      | 48      | 83.3%   | -       | AAA
Chatwork            | OAuth 2.0      | 123     | 65.9%   | 378ms   | AA

📊 Divergence at a glance

Top: 94.3% (Shopify)
Bottom: 65.9% (Chatwork)
Max spread: 28.4pt (auth held constant)
Total field reports: 682 (7 services)

Key finding: every one of these seven services uses the same family of OAuth 2.0 auth, yet success rates span 28.4 points. The conventional wisdom — "as long as it's OAuth, agents will be fine" — does not survive the data. What separates the leaders from Chatwork is not the auth method; it is everything that happens after the token is granted.
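
The headline figures can be recomputed from the comparison table. The short sketch below copies the report counts and success rates from the table; the function name is illustrative and not part of any KanseiLink API.

```python
# (service, reports, success rate in %) copied from the comparison table.
ROWS = [
    ("Shopify Japan", 53, 94.3),
    ("Money Forward Cloud", 42, 92.9),
    ("Slack", 113, 91.2),
    ("Backlog", 91, 90.1),
    ("freee", 212, 90.1),
    ("Notion", 48, 83.3),
    ("Chatwork", 123, 65.9),
]


def summarize(rows):
    """Return (top service, bottom service, spread in points, total reports)."""
    top = max(rows, key=lambda r: r[2])
    bottom = min(rows, key=lambda r: r[2])
    spread = round(top[2] - bottom[2], 1)
    total = sum(r[1] for r in rows)
    return top[0], bottom[0], spread, total


print(summarize(ROWS))  # ('Shopify Japan', 'Chatwork', 28.4, 682)
```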

Force 1: Rate-limit tightness

The first structural force is rate-limit design. Of Chatwork MCP's 123 reports, 8 (6.5%) hit rate_limit errors. Slack, with 113 reports, has zero. Same use cases, very different ceilings.

Chatwork's public API enforces a tight 300 req / 5 min / user. Slack uses method-tiered limits (Tier 1: 1+/min, Tier 2: 20+/min, Tier 3: 50+/min, Tier 4: 100+/min) that match agent patterns ("bulk-fetch → filter → post") more naturally. The typical agent shape simply does not collide with Slack ceilings.
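
One way an agent could stay under a budget like 300 requests per 5 minutes is a client-side sliding-window counter. The sketch below is a minimal illustration under the assumption that the limit behaves like a rolling window; the article does not say whether Chatwork uses a rolling or a fixed window, and the class name is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` in any rolling `window_s` seconds."""

    def __init__(self, max_calls=300, window_s=300.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._stamps = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        # The oldest call must leave the window before another one fits.
        return self.window_s - (now - self._stamps[0])

    def record(self):
        """Register that a call was just made."""
        self._stamps.append(self.clock())
```

A caller would check `wait_time()`, sleep for that long if it is non-zero, make the request, and then call `record()`.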

The most telling Chatwork data is what agents had to learn the hard way:

workaround: "Add 100ms delay between calls"
verification: verified
reported_count: 2

workaround: "Implemented 500ms delay between messages.
             All messages delivered successfully with throttling."
verification: unverified
reported_count: 1

That is tribal knowledge agents reconstructed by trial. If Chatwork's official docs said "recommend ≥100ms inter-call delay for sustained workloads," the first-try success number would move. KanseiLink already serves this hint via get_service_tips, but at the vendor level it is undocumented.
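
A back-of-envelope check helps size such a delay against the published ceiling. The sketch below assumes strictly sequential calls, the 378ms Chatwork latency from the comparison table, and the 300 req / 5 min budget quoted above. The function names are illustrative only.

```python
def min_safe_interval(budget, window_s):
    """Average spacing (seconds per call) that never exceeds the budget."""
    return window_s / budget


def seconds_until_budget_spent(budget, window_s, delay_s, latency_s):
    """Time for a sequential loop to use up `budget` calls.

    Returns None when the loop is slow enough that the budget is not
    used up within a single window.
    """
    elapsed = budget * (delay_s + latency_s)
    return elapsed if elapsed < window_s else None


print(min_safe_interval(300, 300.0))  # 1.0
print(round(seconds_until_budget_spent(300, 300.0, 0.1, 0.378), 1))  # 143.4
```

Under these assumptions, a 100ms delay is enough for runs shorter than 300 calls, while a longer sustained run would need roughly one call per second on average.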

⚠️ The rate-limit blind spot

Publishing rate-limit numbers is not enough. Agent-facing documentation must specify the implementation pattern needed to respect them — exponential backoff with jitter, token-bucket sizing, recommended inter-call delays per use case. See our MCP rate limiting & exponential backoff implementation guide for a full pattern catalog.
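
As an illustration of the first pattern named above, here is a minimal sketch of exponential backoff with "full jitter" that also honors a server-supplied Retry-After value. The `RateLimited` exception and the default base and cap values are assumptions for the example, not taken from any vendor's documentation.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's request function on HTTP 429."""

    def __init__(self, retry_after_s=None):
        super().__init__("rate limited")
        self.retry_after_s = retry_after_s


def backoff_delay(attempt, base_s=0.5, cap_s=30.0, retry_after_s=None, rng=random.random):
    """Delay before retrying after failed attempt number `attempt` (0-based)."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    delay = rng() * ceiling  # full jitter: uniform between 0 and the ceiling
    if retry_after_s is not None:
        delay = max(delay, retry_after_s)  # never retry earlier than the server asked
    return delay


def call_with_backoff(request, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call `request()` and retry on RateLimited, up to `max_attempts` tries."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, retry_after_s=exc.retry_after_s, rng=rng))
```

Injecting `sleep` and `rng` keeps the retry logic testable without real waiting.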

Force 2: Error-message clarity

The second force is how much actionable information the error response carries. Chatwork shows 24 api_error events in 123 reports (19.5%); Notion has several across 48 reports. Slack: 9 in 113 (8.0%). freee: 15 in 212 (7.1%). Same error label, very different diagnosability.

The high-success services consistently return errors that carry enough detail for the agent to act on. Lower-success services return shapes like {"error": "Bad Request"}. The agent then guesses, retries, and finally gives up after a timeout, which is exactly the giveup pattern we cataloged in The moment agents give up.

The freee auth_expired example

An instructive contrast: freee MCP shows 4 auth_expired events, and the verified workaround "refresh OAuth token — expires every 24h" (3 reports) is captured in the data. Because the token lifetime is documented and predictable, agents can wire pre-emptive refresh. Chatwork has session-expiry incidents in the field but no equivalent verified, accumulated workaround — so each agent rediscovers the failure.
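
Pre-emptive refresh of this kind can be sketched as a small token cache. The 24h lifetime comes from the workaround text quoted above; `fetch_token` is a hypothetical callable standing in for whatever refresh call the agent actually uses, and the 5-minute safety margin is an arbitrary choice. Where the token response carries the standard OAuth 2.0 `expires_in` field, that value would be a better source for the lifetime.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, lifetime_s=24 * 3600, margin_s=300, clock=time.time):
        self.fetch_token = fetch_token  # hypothetical: returns a fresh token string
        self.lifetime_s = lifetime_s    # 24h, per the verified workaround text
        self.margin_s = margin_s        # refresh this many seconds early
        self.clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a token, refreshing first if it is missing or close to expiry."""
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.margin_s:
            self._token = self.fetch_token()
            self._expires_at = now + self.lifetime_s
        return self._token
```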

Force 3: Discoverability (search_miss)

The third force operates entirely outside the OAuth flow: whether agents find the service at all. Chatwork has 10 search_miss reports (8.1% of its 123 reports).

The reported queries include:

query: "business chat messaging tool"
result: did not find chatwork in top 3
verification: verified, reported_count: 2

query: "remote work-ready tool" (Japanese: "リモートワークで使えるツール")
result: did not find chatwork in top 3
verification: unverified, reported_count: 1

This is a metadata problem, not an implementation problem. Slack reliably appears for natural use-case queries like "business messaging" or "team communication." Chatwork's positioning as "the Japanese-company business chat" does not consistently surface in the metadata, description, or tag fields agents search against. We dissect this pattern more deeply in The MCP discoverability crisis.

Integrated reading

The three forces (rate-limit × error design × discoverability) compound independently. Chatwork's 66% is, very roughly, the result of each force shaving 5-15 points off a 90%-band baseline. The flip side: addressing each force individually can pull the number back above 85% without touching the OAuth implementation itself.
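
The decomposition can be checked against the counts quoted earlier (8 rate_limit, 24 api_error, and 10 search_miss out of 123 Chatwork reports). The sketch assumes each failed report carries exactly one of these labels, which the article does not state explicitly.

```python
CHATWORK_REPORTS = 123
CHATWORK_ERRORS = {"rate_limit": 8, "api_error": 24, "search_miss": 10}


def points_lost(errors, reports):
    """Percentage points of success rate attributable to each error label."""
    return {label: round(100 * count / reports, 1) for label, count in errors.items()}


def implied_success_rate(errors, reports):
    """Success rate if every report not listed under an error label succeeded."""
    failed = sum(errors.values())
    return round(100 * (reports - failed) / reports, 1)


print(points_lost(CHATWORK_ERRORS, CHATWORK_REPORTS))
# {'rate_limit': 6.5, 'api_error': 19.5, 'search_miss': 8.1}
print(implied_success_rate(CHATWORK_ERRORS, CHATWORK_REPORTS))  # 65.9
```

Under that assumption the three labels account for 42 of 123 reports, which reproduces the 65.9% shown in the comparison table.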

Why Shopify tops the list at 94%

Same OAuth 2.0, but Shopify lands at 94.3% — meaningfully ahead of the rest. The reason is structural: Shopify ships four purpose-built MCP servers (Dev, Storefront, Customer Account, Checkout), one per domain.

Agents can decide "which MCP for which job" with high confidence. Compared with single-server omnibus designs (freee's 5-domain integration, Slack's all-in-one API), tool-selection mistakes drop. Add the predictable "every store exposes an /api/mcp endpoint" pattern and strong documentation of fallback flows, and Shopify pulls 3-4 points ahead of Slack, Backlog, and freee (94.3% across 53 reports versus 90.1-91.2%).

This is a real-world data point on the "specialized vs. general-purpose" trade-off in large MCP implementations, and KanseiLink data suggests specialized-server architectures get 15-20% better tool-selection accuracy than free-form omnibus servers at scale.

Vendor priority actions

P0 (now, within 1 month): put workaround_hint in error responses

The highest-ROI change. On 429, return the Retry-After value plus recommended jitter. On 404, include candidate near-match resource IDs as suggestions. Implementation cost is small; the impact on observed success rate is large. For Chatwork specifically: document the already-verified "100ms delay between calls" workaround that agents discovered, and first-try success moves by several points.
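
A minimal sketch of what such error bodies might look like follows. Only the `workaround_hint` name comes from this article; every other field name and the example resource IDs are hypothetical. Near-match suggestions use Python's standard `difflib.get_close_matches`.

```python
import difflib


def rate_limit_error(retry_after_s, jitter_s):
    """Build a 429 body that tells the agent exactly how long to wait."""
    return {
        "error": "rate_limited",
        "status": 429,
        "retry_after_s": retry_after_s,
        "workaround_hint": (
            f"Wait {retry_after_s}s plus up to {jitter_s}s of random jitter, then retry."
        ),
    }


def not_found_error(requested_id, known_ids, limit=3):
    """Build a 404 body that suggests near-match resource IDs."""
    suggestions = difflib.get_close_matches(requested_id, known_ids, n=limit, cutoff=0.6)
    return {
        "error": "not_found",
        "status": 404,
        "requested_id": requested_id,
        "suggestions": suggestions,
        "workaround_hint": "Retry with one of the suggested IDs, or list resources first.",
    }


print(not_found_error("room_1234", ["room_1235", "room_9999", "user_42"])["suggestions"])
```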

P1 (short term, within 3 months): cut search_miss with better metadata

Optimize for use-case vocabulary, not just brand name. Cover phrases like "remote work tool," "business messaging," "internal chat" — in both English and Japanese for Japanese SaaS. The cost is documentation work; the benefit is agents reaching you from natural queries.
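
A vendor could run a crude coverage check over its own listing before publishing. The sketch below tests whether each use-case phrase appears verbatim in any metadata field. Plain substring matching is only a rough proxy for how agents actually search, and the sample metadata is invented for illustration.

```python
def missing_phrases(metadata_fields, phrases):
    """Return the phrases that appear in none of the metadata fields."""
    haystack = " ".join(metadata_fields).casefold()
    return [p for p in phrases if p.casefold() not in haystack]


# Invented sample listing: name, description, tags.
sample_metadata = [
    "Chatwork",
    "Business chat for Japanese companies",
    "chat, tasks, file sharing",
]
use_case_phrases = [
    "business chat",
    "business messaging",
    "internal chat",
    "remote work tool",
    "リモートワークで使えるツール",
]

print(missing_phrases(sample_metadata, use_case_phrases))
```

Every phrase the check returns is a candidate to add to the description or tag fields, in both languages.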

P2 (medium term, within 6 months): build backoff into the SDK

If you cannot loosen rate limits, bake automatic exponential backoff with jitter into the official SDK and MCP server. Do not ask agents to discover throttling rules — make the well-behaved default the path of least resistance. Implementation specifics in the rate-limit implementation guide.

✅ Target standard

AEO A-grade (observed success rate ≥85%) is on track to become the de-facto baseline in Q3-Q4 2026. Services currently sitting in the 80%-band can realistically reach 90%+ via the P0-P2 changes above — no OAuth rewrite needed.

Pull your own service's success-rate data

KanseiLink exposes MCP readiness, observed success rate, error patterns, and verified workarounds for 225+ services via MCP. Call get_insights(service_id) to get your number — and your competitors' — and use the gap as the spec for your next sprint.

FAQ

Q1. Why the 25+ point gap between Slack and Chatwork with the same OAuth 2.0?

Auth is held constant. The gap comes from (1) rate-limit tightness, (2) error-message clarity, and (3) discoverability. OAuth only establishes the auth path; everything after — and around — that path differs across implementations.

Q2. What structurally separates Shopify and Money Forward from the rest?

Shopify uses four specialized MCP servers, lifting tool-selection accuracy. Money Forward's API matured fast (Remote MCP for all plans since March 2026) and ships fewer error patterns. Both reduce "moments where the agent must guess," which is what drives a number into the 90s.

Q3. Is Chatwork's 66% recoverable?

Yes — structurally so. The three forces are independent levers. Just documenting the already-verified "100ms delay" workaround would lift first-try success on its own. No OAuth rewrite is needed.

Q4. Where does Notion 83% vs Chatwork 66% come from?

Notion's error responses carry richer field-level diagnostics (specific block_id / property names). Rate-limit hits and search_miss reports are also rarer than for Chatwork. One weak force and two strong ones (Notion) versus three weak forces (Chatwork) is what produces the 17-point gap.

Q5. What predicts success rate better than auth method?

In KanseiLink data, three predictors beat auth method: (a) does the error response include a workaround_hint? (b) is the rate-limit implementation pattern in official docs? (c) is metadata localized into both English and Japanese? Each is a stronger signal than which OAuth flavor a service uses. See AEO methodology.

Q6. How do I pull KanseiLink data?

Connect via MCP: npx -y @kansei-link/mcp-server, then get_insights(service_id) for per-service observed metrics. The cross-service ranking is at AEO Readiness Ranking Q2 2026.

Data disclosure & disclaimer

Success-rate values reflect the KanseiLink MCP dataset as of May 11, 2026 (via get_insights), restricted to services with ≥40 cumulative reports — i.e., mature, high-usage MCP servers. Sample-size differences (Chatwork 123 vs Money Forward 42) carry residual statistical uncertainty. Rate-limit specifications (Chatwork 300 req / 5 min / user, Slack method tiers) are from each company's public documentation as of May 2026 and may change. Error-pattern frequencies are aggregated from KanseiLink user reports and are not necessarily representative of every agent population. Vendor action recommendations are KanseiLink Research analysis and should be adapted to each implementation's context.