Why does agent experience diverge so much across services that all use OAuth 2.0?

Auth method is held constant; three structural forces drive the divergence. (1) Rate-limit tightness: Chatwork enforces a strict 300 req / 5 min / user; Slack uses method-tiered limits that leave more headroom. (2) Error-message clarity: Chatwork's api_error appears 24 times in 123 reports, often with low diagnosability, while Slack returns hints (channel_not_found, missing_scope) that agents can act on. (3) Discoverability: Chatwork has 10 search_miss reports — agents often cannot find it from natural use-case queries. The three forces combine to widen the gap in agent experience (measured success rates are still being accumulated by KanseiLink).

What are the top priorities for an MCP vendor to lift success rate?

Three tiers. P0: include workaround_hint in error responses. For 429, specify Retry-After value plus recommended jitter. Chatwork's KanseiLink data already shows 'Add 100ms delay between calls' as a verified workaround reported by agents — turning that into documented guidance lifts first-try success. P1: reduce search_miss by improving metadata, description, and tags so agents reach the service from natural use-case queries (e.g., 'remote work tool', 'business messaging') in both Japanese and English. P2: if rate limits cannot be loosened, build automatic exponential backoff with jitter into the official SDK / MCP server so agents do not have to discover throttling rules on their own.

What is concretely happening in Chatwork's field data?

Across 123 Chatwork MCP reports: api_error 24, search_miss 10, rate_limit 8. The reported rate-limit workarounds are '100ms delay between calls' (2 reports) and '500ms throttling' (1 report) — knowledge agents had to learn the hard way. Reported search_miss queries include 'business chat messaging tool' and Japanese phrases like 'remote work tool,' both of which fail to return Chatwork in the top 3. The combination of opaque errors, throttling traps, and weak discoverability is enough to drag down the agent experience even though the OAuth 2.0 path itself works fine (measured success rates are still being accumulated).

How should a Japanese SaaS vendor use this data?

Three steps. Step 1: pull your service's KanseiLink evaluation and locate your relative position via the AEO grade (measured success-rate data is still being accumulated). Step 2: if your grade is lagging, look at error-type distribution — high api_error → improve error responses; high search_miss → invest in discoverability/metadata; high rate_limit → ship SDK-level backoff. Step 3: benchmark your documentation against high-grade peers (Slack, Money Forward) on error-response shape and workaround-hint coverage. AEO A-grade is on track to become the de-facto standard in Q3-Q4 2026.

Same OAuth 2.0, Diverging Success Rates — Three Structural Forces Behind the MCP Divergence (KanseiLink 7-Service Comparison 2026)

Q: Why does Shopify earn the highest marks?

Shopify ships four purpose-built MCP servers (Dev, Storefront, Customer Account, Checkout) split by domain. Agents pick the right tool with high confidence because boundaries are explicit. Compared with single-server omnibus designs (freee's 5-domain integration, Slack's all-in-one API), tool-selection mistakes are rarer. Combined with the predictable '/api/mcp endpoint per store' pattern and strong fallback-flow documentation, Shopify stands out in agent experience (based on an early read of 53 reports).

Seven-service comparison — at a glance
Force 1: Rate-limit tightness
Force 2: Error-message clarity
Force 3: Discoverability (search_miss)
Why Shopify earns the highest marks
Vendor priority actions
FAQ

Seven-service comparison — at a glance

From the 225+ services in KanseiLink MCP's dataset, we pulled the seven mature MCP servers with at least 40 reports each. All use OAuth 2.0 (or OAuth 2.0+PKCE) — so the auth method is controlled and the variance we observe lives elsewhere. Measured success-rate data is still being accumulated by KanseiLink, so the table marks that column as "observing."

Service	Auth	Reports	Success	Latency	AEO Grade
Shopify Japan	OAuth 2.0	53	observing	—	AAA
Money Forward Cloud	OAuth 2.0	42	observing	—	AAA
Slack	OAuth 2.0	113	observing	163ms	AAA
Backlog	OAuth 2.0	91	observing	—	AAA
freee	OAuth 2.0+PKCE	212	observing	216ms	AAA
Notion	OAuth 2.0	48	observing	—	AAA
Chatwork	OAuth 2.0	123	observing	378ms	AA

📊 Comparison at a glance

Services compared: 7 (all OAuth 2.0 family)
Measured success rates: being accumulated
Structural forces: 3 (rate limits, errors, discoverability)
Total reports: 682 (7 services)

Key observation: every one of these seven services uses the same family of OAuth 2.0 auth, yet the pattern of reported errors and friction varies widely. The conventional wisdom — "as long as it's OAuth, agents will be fine" — does not survive scrutiny. What separates the leaders from Chatwork is not the auth method; it is everything that happens after the token is granted.

Force 1: Rate-limit tightness

The first structural force is rate-limit design. Of Chatwork MCP's 123 reports, 8 hit rate_limit errors. Slack, with 113 reports, has zero. Same use cases, very different ceilings.

Chatwork's public API enforces a tight 300 req / 5 min / user. Slack uses method-tiered limits (Tier 1: 1+/min, Tier 2: 20+/min, Tier 3: 50+/min, Tier 4: 100+/min) that match agent patterns ("bulk-fetch → filter → post") more naturally. The typical agent shape simply does not collide with Slack ceilings.

The most telling Chatwork data is what agents had to learn the hard way:

workaround: "Add 100ms delay between calls"
verification: verified
reported_count: 2

workaround: "Implemented 500ms delay between messages.
             All messages delivered successfully with throttling."
verification: unverified
reported_count: 1

That is tribal knowledge agents reconstructed by trial and error. If Chatwork's official docs said "recommend ≥100ms inter-call delay for sustained workloads," first-try success would likely improve. KanseiLink already serves this hint via get_service_tips, but it is not documented at the vendor level.
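A minimal sketch of the inter-call delay the agents converged on, assuming a client that calls `wait()` before every request. The `Pacer` class and its parameter names are hypothetical and not part of any Chatwork SDK; the 100 ms default mirrors the agent-reported workaround, not a vendor recommendation.

```python
import time


class Pacer:
    """Enforces a minimum gap between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval_s=0.1, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the pacing logic can be tested
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to keep min_interval_s between calls.

        Returns the number of seconds actually waited.
        """
        now = self._clock()
        waited = 0.0
        if self._last_call is not None:
            remaining = self.min_interval_s - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call = self._clock()
        return waited
```

Note that 300 requests per 5 minutes averages out to one request per second, so a fixed 100 ms gap smooths bursts but does not by itself guarantee that a long-running loop stays under the ceiling.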

⚠️ The rate-limit blind spot

Publishing rate-limit numbers is not enough. Agent-facing documentation must specify the implementation pattern needed to respect them — exponential backoff with jitter, token-bucket sizing, recommended inter-call delays per use case. See our MCP rate limiting & exponential backoff implementation guide for a full pattern catalog.
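As one illustration of what "token-bucket sizing" could look like on the client side, here is a small sketch. The `TokenBucket` class is hypothetical, and the capacity and refill values in the usage note are illustrative assumptions rather than numbers published by any vendor.

```python
import time


class TokenBucket:
    """Client-side token bucket: bursts up to `capacity`, refills at `refill_per_s`."""

    def __init__(self, capacity, refill_per_s, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_s = float(refill_per_s)
        self._clock = clock
        self._tokens = float(capacity)   # start full
        self._updated = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last update, capped at capacity.
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_s)
        self._updated = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        self._refill()
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False

    def seconds_until_available(self):
        """Seconds to wait before the next token is ready (0.0 if one is ready now)."""
        self._refill()
        if self._tokens >= 1.0:
            return 0.0
        return (1.0 - self._tokens) / self.refill_per_s
```

For a 300 req / 5 min ceiling, one possible sizing is `TokenBucket(capacity=30, refill_per_s=0.9)`: the bucket then permits at most 30 + 0.9 × 300 = 300 calls in any 300-second span. Setting the capacity to the full 300 with a 1/s refill could allow nearly double the limit in a single window.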

Force 2: Error-message clarity

The second force is how much actionable information the error response carries. Chatwork shows 24 api_error events in 123 reports (19.5%); Notion has several across 48 reports. Slack: 9 in 113 (8.0%). freee: 15 in 212 (7.1%). Same error label, very different diagnosability.
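The percentages above follow directly from the report counts. A short worked calculation, using only the figures quoted in this section (Notion is omitted because no exact count is given):

```python
# (api_error count, total reports) as quoted in the paragraph above.
counts = {
    "Chatwork": (24, 123),
    "Slack": (9, 113),
    "freee": (15, 212),
}


def error_share(errors, reports):
    """Percentage of reports carrying the api_error label, to one decimal place."""
    return round(100 * errors / reports, 1)


shares = {name: error_share(e, n) for name, (e, n) in counts.items()}
print(shares)
```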

The services with fewer error reports consistently return errors that:

Name the specific field at fault (channel_not_found: channel "C123" doesn't exist or isn't visible)
Point at the fix (missing_scope: needed=channels:read, granted=channels:write)
Mark retryability explicitly (error.retryable: true|false)

Services with more error reports return shapes like {"error": "Bad Request"}. The agent then guesses, retries, and finally gives up after a timeout, which is exactly the give-up pattern we cataloged in The moment agents give up.
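To make the difference concrete, here is a sketch of how an agent might branch on the two error shapes. The field names `retryable` and `workaround_hint` follow the wording in this article, but the overall response shape and the `next_action` function are assumptions for illustration, not any vendor's actual schema.

```python
def next_action(error):
    """Decide what an agent can do with an error body (hypothetical shape)."""
    if error.get("retryable") is True:
        return "retry"            # the server says a retry can succeed
    if error.get("workaround_hint"):
        return "apply_hint"       # the server says how to fix the request
    if error.get("retryable") is False:
        return "report"           # no point retrying; surface to the user
    # Neither a retryability flag nor a hint: the agent can only guess.
    return "guess"


clear = {
    "error": "missing_scope",
    "retryable": False,
    "workaround_hint": "Request the channels:read scope and re-authorize.",
}
opaque = {"error": "Bad Request"}

print(next_action(clear))   # apply_hint
print(next_action(opaque))  # guess
```

The opaque shape is the only one that forces the "guess" branch, which is where blind retries and eventual timeouts come from.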

The freee auth_expired example

An instructive contrast: freee MCP shows 4 auth_expired events, and the verified workaround "refresh OAuth token — expires every 24h" (3 reports) is captured in the data. Because the token lifetime is documented and predictable, agents can wire pre-emptive refresh. Chatwork has session-expiry incidents in the field but no equivalent verified, accumulated workaround — so each agent rediscovers the failure.
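A predictable lifetime is what makes pre-emptive refresh a one-line check. The sketch below assumes the 24-hour lifetime from the agent-reported workaround; the five-minute safety margin and the function name are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(hours=24)    # from the agent-reported workaround above
REFRESH_MARGIN = timedelta(minutes=5)   # assumption: margin chosen for illustration


def needs_refresh(issued_at, now, lifetime=TOKEN_LIFETIME, margin=REFRESH_MARGIN):
    """True once the token is within `margin` of its expected expiry."""
    return now >= issued_at + lifetime - margin


issued = datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)
print(needs_refresh(issued, issued + timedelta(hours=23)))               # False
print(needs_refresh(issued, issued + timedelta(hours=23, minutes=56)))   # True
```

An agent would run this check before each call and refresh first when it returns True, rather than waiting for an auth_expired error.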

Force 3: Discoverability (search_miss)

The third force operates entirely outside the OAuth flow: whether agents find the service at all. Chatwork has 10 search_miss reports (8.1% of its 123 reports).

The reported search_miss data:

query: "business chat messaging tool"
result: did not find chatwork in top 3
verification: verified, reported_count: 2

query: "remote work-ready tool" (Japanese: "リモートワークで使えるツール")
result: did not find chatwork in top 3
verification: unverified, reported_count: 1

This is a metadata problem, not an implementation problem. Slack reliably appears for natural use-case queries like "business messaging" or "team communication." Chatwork's positioning as "the Japanese-company business chat" does not consistently surface in the metadata, description, or tag fields agents search against. We dissect this pattern more deeply in The MCP discoverability crisis.
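The effect of use-case vocabulary can be shown with a toy keyword matcher. This is not how KanseiLink or any real registry ranks results (that is not described here); the catalog entry, field names, and scoring rule are all hypothetical, and a service called "ExampleChat" stands in for a real one.

```python
def score(query, entry):
    """Count distinct query words that appear as whole words in the entry's metadata."""
    text = " ".join([entry["name"], entry["description"], *entry["tags"]])
    words = set(text.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in words)


def top_k(query, catalog, k=3):
    """Names of the k best-scoring entries, ignoring entries with no match at all."""
    ranked = sorted(catalog, key=lambda e: score(query, e), reverse=True)
    return [e["name"] for e in ranked if score(query, e) > 0][:k]


# Brand-only metadata: findable by name, invisible to use-case queries.
brand_only = {
    "name": "ExampleChat",
    "description": "ExampleChat official connector",
    "tags": ["examplechat"],
}
# Same service with use-case vocabulary added to the tags.
enriched = dict(brand_only, tags=["business", "chat", "messaging", "remote", "work"])

query = "business chat messaging tool"
print(top_k(query, [brand_only]))  # []
print(top_k(query, [enriched]))    # ['ExampleChat']
```

Whitespace splitting does not tokenize Japanese, so a real implementation would need a proper tokenizer or embedding search for Japanese queries. The point here is only that nothing in the brand-only entry matches what the agent typed.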

Integrated reading

The three forces (rate limits, error design, discoverability) act independently, and their effects compound. Chatwork's struggles can be read as each force separately eroding the agent experience. The flip side: addressing each force individually can lift the experience step by step without touching the OAuth implementation itself.

Why Shopify earns the highest marks

Same OAuth 2.0, but Shopify stands meaningfully ahead of the rest in our evaluation. The reason is structural: Shopify ships four purpose-built MCP servers (Dev, Storefront, Customer Account, Checkout), one per domain.

Dev MCP — developer-facing docs and GraphQL Storefront reference
Storefront MCP — product search and cart operations
Customer Account MCP — customer profile, order history
Checkout MCP — payment flow

Agents can decide "which MCP for which job" with high confidence. Compared with single-server omnibus designs (freee's 5-domain integration, Slack's all-in-one API), tool-selection mistakes drop. Add the predictable "every store exposes /api/mcp endpoint" pattern and strong documentation of fallback flows, and Shopify pulls ahead of similar-quality competitors (based on an early read of 53 reports).

This is a real-world data point on the "specialized vs. general-purpose" trade-off in large MCP implementations, and KanseiLink's evaluation suggests specialized-server architectures leave agents with fewer tool-selection dilemmas than free-form omnibus servers at scale.

Vendor priority actions

P0 (now, within 1 month): put workaround_hint in error responses

The highest-ROI change. On 429, return the Retry-After value plus recommended jitter. On 404, include candidate near-match resource IDs as suggestions. Implementation cost is small, and the expected effect on first-try success is large (measured success rates are still being accumulated). For Chatwork specifically: documenting the already-verified "100ms delay between calls" workaround that agents discovered would spare each new agent from rediscovering it.
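A server-side sketch of both P0 responses, under stated assumptions: the response shapes, field names, and function names below are hypothetical, chosen to match the vocabulary used in this article. Only `difflib.get_close_matches` is a real standard-library call.

```python
import difflib


def rate_limited_error(retry_after_s, jitter_s):
    """429 body with an explicit wait and a jitter recommendation (hypothetical shape)."""
    return {
        "status": 429,
        "error": "rate_limited",
        "retryable": True,
        "retry_after_s": retry_after_s,
        "workaround_hint": (
            f"Wait {retry_after_s}s plus up to {jitter_s}s of random jitter before retrying."
        ),
    }


def not_found_error(requested_id, known_ids, max_suggestions=3):
    """404 body that lists near-match resource IDs as suggestions."""
    suggestions = difflib.get_close_matches(
        requested_id, known_ids, n=max_suggestions, cutoff=0.6
    )
    return {
        "status": 404,
        "error": "not_found",
        "retryable": False,
        "requested": requested_id,
        "suggestions": suggestions,
    }


print(rate_limited_error(2, 0.5)["workaround_hint"])
print(not_found_error("room-1234", ["room-1243", "team-42"])["suggestions"])
```

In a real HTTP response the wait would also be sent in the standard Retry-After header; the body field is there so an agent that only sees the parsed tool result still gets the number.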

P1 (short term, within 3 months): cut search_miss with better metadata

Optimize for use-case vocabulary, not just brand name. Cover phrases like "remote work tool," "business messaging," "internal chat" — in both English and Japanese for Japanese SaaS. The cost is documentation work; the benefit is agents reaching you from natural queries.

P2 (medium term, within 6 months): build backoff into the SDK

If you cannot loosen rate limits, bake automatic exponential backoff with jitter into the official SDK and MCP server. Do not ask agents to discover throttling rules — make the well-behaved default the path of least resistance. Implementation specifics in the rate-limit implementation guide.
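What "built into the SDK" might look like, as a minimal sketch: a wrapper that retries on a rate-limit error with exponential backoff and full jitter, added on top of any server-provided wait. `RateLimited`, `backoff_delay`, `call_with_backoff`, and the default base and cap values are hypothetical names and numbers, not taken from any existing SDK.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception an SDK would raise on HTTP 429."""

    def __init__(self, retry_after_s=None):
        super().__init__("rate limited")
        self.retry_after_s = retry_after_s


def backoff_delay(attempt, base_s=0.5, cap_s=30.0, retry_after_s=None, rng=random.random):
    """Exponential backoff with full jitter, added to the server's Retry-After if given."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    jitter = rng() * ceiling          # uniform in [0, ceiling)
    return (retry_after_s or 0.0) + jitter


def call_with_backoff(fn, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimited; re-raise after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, retry_after_s=exc.retry_after_s, rng=rng))
```

A caller would write `call_with_backoff(lambda: client.post_message(...))`, where `client.post_message` is a placeholder for whatever SDK method is being wrapped. Adding jitter to the server's wait, rather than taking the maximum of the two, keeps many agents that were throttled at the same moment from all retrying at the same instant.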

✅ Target standard

AEO A-grade is on track to become the de-facto baseline in Q3-Q4 2026. Services whose grade currently lags can realistically climb via the P0-P2 changes above — no OAuth rewrite needed.

FAQ

Q1. Why does agent experience diverge across services with the same OAuth 2.0?

Auth is held constant. The gap comes from (1) rate-limit tightness, (2) error-message clarity, and (3) discoverability. OAuth only establishes the auth path; everything after — and around — that path differs across implementations.

Q2. What structurally separates Shopify and Money Forward from the rest?

Shopify uses four specialized MCP servers, lifting tool-selection accuracy. Money Forward's API matured fast (Remote MCP for all plans since March 2026) and ships fewer error patterns. Both reduce "moments where the agent must guess," which is what earns high marks.

Q3. Are Chatwork's struggles recoverable?

Yes — structurally so. The three forces are independent levers. Just documenting the already-reported "100ms delay" workaround would lift first-try success on its own. No OAuth rewrite is needed.

Q4. Where does the Notion vs Chatwork gap come from?

Notion's error responses carry richer field-level diagnostics (specific block_id / property names). Rate-limit hits and search_miss reports are also rarer for Notion than for Chatwork. Notion is weak on one force but strong on the other two, whereas Chatwork is weak on all three, and that difference produces the gap in agent experience.

Q5. What predicts success rate better than auth method?

In KanseiLink data, three predictors beat auth method: (a) does the error response include a workaround_hint? (b) is the rate-limit implementation pattern in official docs? (c) is metadata localized into both English and Japanese? Each is a stronger signal than which OAuth flavor a service uses. See AEO methodology.

Q6. How do I pull KanseiLink data?

Connect via MCP: npx -y @kansei-link/mcp-server, then get_insights(service_id) for per-service evaluation data. The cross-service ranking is at AEO Readiness Ranking Q2 2026.
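For readers who want to see what the call looks like on the wire, here is a sketch of the JSON-RPC message an MCP client would send for this tool. The `tools/call` method and its `name` / `arguments` parameters come from the MCP specification; the argument key `service_id` follows the signature quoted above, and the value "chatwork" is an assumed ID format, not a confirmed one.

```python
import json


def get_insights_request(service_id, request_id=1):
    """Build an MCP `tools/call` JSON-RPC request for the get_insights tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_insights",
            # "service_id" mirrors get_insights(service_id); the ID format is an assumption.
            "arguments": {"service_id": service_id},
        },
    }


print(json.dumps(get_insights_request("chatwork"), indent=2))
```

In practice an MCP client library sends this for you after the initialize handshake; the sketch only shows the shape of the request.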

Data disclosure & disclaimer

Comparison data reflects KanseiLink MCP's evaluation and early data as of May 11, 2026 (via get_insights), restricted to services with ≥40 cumulative reports — i.e., mature, high-usage MCP servers. Measured agent success-rate data is still being accumulated by KanseiLink, and this article does not present measured success-rate figures. Sample-size differences (Chatwork 123 vs Money Forward 42) carry residual statistical uncertainty. Rate-limit specifications (Chatwork 300 req / 5 min / user, Slack method tiers) are from each company's public documentation as of May 2026 and may change. Error-pattern frequencies are aggregated from KanseiLink user reports and are not necessarily representative of every agent population. Vendor action recommendations are KanseiLink Research analysis and should be adapted to each implementation's context.


Related

What 225+ Services Reveal About MCP Success Patterns — KanseiLink Data Story

The 'MCP-Ready' Trap: verified vs. connectable Success Rate Gaps and the New Enterprise Selection Standard

Does API Latency Predict MCP Success Rate? KanseiLink Data Analysis of 225 Japanese SaaS Services

MCP Server Rate Limiting & Exponential Backoff Implementation Guide 2026
