Contents

  1. Why Implementation Quality Is Under Scrutiny Now
  2. Step 1: Authentication — OAuth 2.1 + PKCE + Resource Indicators
  3. Step 2: Rate Limiting — Two-Layer per-client / per-tool Design
  4. Step 3: Error Handling — Design for Agent Self-Recovery
  5. Step 4: Circuit Breaker — Preventing Failure Propagation from External Dependencies
  6. Implementation Checklist for Higher AEO Scores
Data Sources

Recommendations in this guide are based on the MCP official specification (modelcontextprotocol.io), KanseiLink MCP server live operational data (225 services, 1,000+ total calls), and 2026 research reports from Astrix Security, cdata.com, and Octopus.com.

Why Implementation Quality Is Under Scrutiny Now

The "get an MCP server running" phase of 2024–2025 is over. What the 2026 agent industry demands is: "Does it hold up in production, under sustained load, for hours at a time?"

KanseiLink's live operational data makes this concrete. AAA-grade services maintain 90–94% success rates across hundreds of real agent calls. Meanwhile, API-only services in lower grades accumulate no meaningful call data. Agents learn which services fail and stop using them. That is the essence of an AEO score.

This guide takes the failure patterns observed in KanseiLink's tracking data as its starting point and provides actionable patterns that Japanese SaaS developers can implement today.

Step 1: Authentication — OAuth 2.1 + PKCE + Resource Indicators

The April 2026 MCP specification update made OAuth 2.1 + RFC 8707 (Resource Indicators) mandatory for public remote MCP servers. Three concrete changes affect implementations:

Deprecated: Implicit Flow. OAuth 2.0's implicit flow is removed from the MCP spec; implementations that return access tokens in URL fragments are non-compliant.

Required: PKCE (Proof Key for Code Exchange). The authorization code flow now requires PKCE; implement code_verifier / code_challenge generation and verification.

Newly Required: Resource Indicators (RFC 8707). Embed the target resource (the MCP server URI) in the access token. This technically prevents a leaked token from being reused against other services.

M2M Support: client_credentials Flow. Formally supported for machine-to-machine authentication; required for autonomous agents running batch jobs without a human user in the loop.

Token Expiry Design: The Top Pain Point in Japanese SaaS

The most frequently reported frustration in KanseiLink's Agent Voice data is the "token expires after 24 hours" problem. KanseiLink's Claude agent reports on freee: "The 24-hour token expiry is the single biggest barrier to autonomous operation. Silent expiry mid-operation causes partial completions with no easy recovery path."

1. Short-lived Access Tokens + Rotating Refresh Tokens

Set access token expiry to 15–60 minutes — not 24 hours. Rotate refresh tokens on each use (invalidate the old one when issuing a new one) to minimize the exposure window from a leaked token.

When multiple agent instances share the same credentials, use an atomic lock (e.g., Redis SETNX) to prevent race conditions in the refresh flow.
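A sketch of that lock-guarded refresh, using an in-memory stand-in for Redis SETNX so the example is self-contained. In production, acquire/release would be `SET key value NX PX <ttl>` and `DEL` against Redis; the 10-second TTL and key format are illustrative assumptions.

```typescript
// In-memory stand-in for Redis SETNX with a TTL. Illustrative only.
const locks = new Map<string, number>(); // key → expiry timestamp (ms)

function acquireLock(key: string, ttlMs: number): boolean {
  const now = Date.now();
  const expiry = locks.get(key);
  if (expiry !== undefined && expiry > now) return false; // lock already held
  locks.set(key, now + ttlMs);
  return true;
}

function releaseLock(key: string): void {
  locks.delete(key);
}

// Only the caller that wins the lock performs the refresh. Losers get null
// and should wait briefly, then re-read the shared token store instead of
// issuing a second refresh (which would invalidate the winner's new token).
async function refreshTokenOnce(
  clientId: string,
  doRefresh: () => Promise<string>
): Promise<string | null> {
  if (!acquireLock(`refresh:${clientId}`, 10_000)) return null;
  try {
    return await doRefresh();
  } finally {
    releaseLock(`refresh:${clientId}`);
  }
}
```

With rotating refresh tokens, the double-refresh race is not just wasteful: the second refresh can invalidate the first one's result, which is exactly why the lock matters.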

2. Per-Tool Scope Design

Do not give agents blanket access. Define tool-level scopes such as invoices:read, invoices:create, and reports:export, and validate scopes on every request. Finer-grained scopes directly limit the blast radius of a compromised agent session.
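Per-tool scope validation can be as simple as a lookup table checked on every request. A minimal sketch; the tool names and scope strings follow the article's examples, and the deny-by-default rule for unknown tools is our own assumption.

```typescript
// Map each MCP tool to the scope it requires. Unknown tools are denied.
const TOOL_SCOPES: Record<string, string> = {
  list_invoices: "invoices:read",
  create_invoice: "invoices:create",
  export_report: "reports:export",
};

// Validate on every request: the token's scopes must cover the tool's scope.
function isToolAllowed(toolName: string, tokenScopes: string[]): boolean {
  const required = TOOL_SCOPES[toolName];
  if (!required) return false; // deny by default
  return tokenScopes.includes(required);
}
```

A token scoped to invoices:read can then never trigger create_invoice, even if the agent session is compromised.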

Implementation Note

Verify that your authorization server supports Resource Indicators. Keycloak (v18+) and Auth0 (Enterprise) support RFC 8707. For custom implementations, add a resource parameter to your token endpoint and include the target MCP server URI in the issued token's audience claim.
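For the custom-implementation path, the client side amounts to adding a resource parameter to the token request. A hedged sketch of a client_credentials request body with an RFC 8707 resource parameter; the endpoint, client values, and URI are placeholders.

```typescript
// Build a client_credentials token request carrying an RFC 8707
// Resource Indicator, so the issued token is bound to one MCP server.
function buildTokenRequest(
  clientId: string,
  clientSecret: string,
  mcpServerUri: string
): URLSearchParams {
  return new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
    resource: mcpServerUri, // RFC 8707 Resource Indicator
  });
}
```

The authorization server then copies the resource value into the token's audience claim, and the MCP server rejects any token whose audience does not match its own URI.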

Step 2: Rate Limiting — Two-Layer per-client / per-tool Design

An MCP server without rate limiting is one runaway agent away from exhausting an entire API quota and generating unexpected external costs. KanseiLink's tracking data shows a direct correlation between the absence of rate limiting and elevated error rates.

Two-Layer Rate Limit Design Principles

Layer 1 (per-client): Limits on total requests per client ID or API key. Set based on your upstream API quota — for example, 100 requests/minute and 2,000 requests/hour.

Layer 2 (per-tool): Limits that vary by individual tool. Destructive operations (e.g., delete_invoice) can be capped at 5 requests/minute while read-only tools receive more generous limits. This allows fine-grained protection without penalizing safe operations.
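The two layers can be composed from one fixed-window counter keyed two ways. This is a minimal in-memory sketch to show the structure; a production deployment needs shared, atomic storage (such as the PostgreSQL approach this article uses), and all limits here are example values.

```typescript
interface Window { count: number; start: number; }

// Fixed-window limiter applied twice: once per client, once per client+tool.
class TwoLayerLimiter {
  private windows = new Map<string, Window>();

  constructor(
    private clientLimit = 100, // Layer 1: requests/min per client
    private toolLimits: Record<string, number> = { delete_invoice: 5 } // Layer 2
  ) {}

  private hit(key: string, limit: number, now: number): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start > 60_000) {
      this.windows.set(key, { count: 1, start: now }); // new 1-minute window
      return true;
    }
    w.count++;
    return w.count <= limit;
  }

  allow(clientId: string, tool: string, now = Date.now()): boolean {
    const toolLimit = this.toolLimits[tool] ?? 60; // generous default for safe tools
    // Both layers must pass.
    return (
      this.hit(`client:${clientId}`, this.clientLimit, now) &&
      this.hit(`client:${clientId}:tool:${tool}`, toolLimit, now)
    );
  }
}
```

Note the keying: the per-tool counter includes the client ID, so one client exhausting delete_invoice does not block another client's deletes.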

-- Atomic rate limiting with PostgreSQL (burst prevention)
-- A single UPDATE checks and increments in one statement, eliminating race conditions
UPDATE rate_limits
SET window_start = CASE
      WHEN NOW() - window_start > INTERVAL '1 minute' THEN NOW()
      ELSE window_start
    END,
    request_count = CASE
      WHEN NOW() - window_start > INTERVAL '1 minute' THEN 1
      ELSE request_count + 1
    END
WHERE client_id = $1 AND tool_name = $2
RETURNING request_count, window_start, (request_count <= $3) AS allowed;

Return Retry-After Headers Accurately

On rate limit exceeded (HTTP 429), return a Retry-After header with the seconds remaining or the exact datetime when the next request is permitted. This allows agents to calculate the correct wait time without blindly retrying — and accidentally consuming more quota in the process.
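The remaining wait can be derived directly from the window state the rate-limit check already returns. A sketch, assuming the 1-minute fixed window used in the SQL example; the response shape and field names are our own.

```typescript
// Seconds until the current 1-minute window resets.
function retryAfterSeconds(
  windowStartMs: number,
  windowMs = 60_000,
  nowMs = Date.now()
): number {
  const remaining = Math.ceil((windowStartMs + windowMs - nowMs) / 1000);
  return Math.max(remaining, 1); // never advise 0 — agents would retry instantly
}

// Example HTTP 429 payload with a matching Retry-After header.
function rateLimitedResponse(windowStartMs: number, nowMs = Date.now()) {
  const retryAfter = retryAfterSeconds(windowStartMs, 60_000, nowMs);
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfter) },
    body: { error: "RATE_LIMITED", retry_after_seconds: retryAfter },
  };
}
```

Duplicating the value in the body as retry_after_seconds costs nothing and helps agents whose HTTP clients drop or hide response headers.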

AEO Impact

Proper rate limiting with accurate Retry-After headers directly affects KanseiLink's "error recoverability" evaluation dimension. Services where agents can autonomously retry tend to score an average of 12 points higher on AEO ratings (KanseiLink internal data).

Step 3: Error Handling — Design for Agent Self-Recovery

MCP tool errors are injected directly into the LLM's context window. That means your error messages are instructions to the agent. The quality of your error messages determines whether an agent can self-recover — or gives up and fails the task.

The Three Error Categories

CLIENT_ERROR (4xx) — The caller's problem

Retrying will not help. Tell the agent what is wrong and how to fix it. Example:

{
  "error": "CLIENT_ERROR",
  "message": "invoice_date must be ISO 8601 format (YYYY-MM-DD). Received: 'April 14, 2026'",
  "retry_recommended": false
}

SERVER_ERROR (5xx) — Our problem

Recommend a retry with exponential backoff. Include a suggested wait time. Example:

{
  "error": "SERVER_ERROR",
  "message": "Internal error. Please retry after 30 seconds.",
  "retry_recommended": true,
  "retry_after_seconds": 30
}

EXTERNAL_ERROR (502/503) — Upstream dependency down

Name the failing dependency and suggest a fallback. Example:

{
  "error": "EXTERNAL_ERROR",
  "dependency": "freee_api",
  "message": "freee API is temporarily unavailable. Retry in 5 minutes or consider using an alternative accounting tool.",
  "retry_recommended": true
}
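The three categories above are easy to enforce with a single helper that maps status codes to the response shape. A sketch; the field names follow the article's examples, while the function and type names are our own.

```typescript
type ErrorCategory = "CLIENT_ERROR" | "SERVER_ERROR" | "EXTERNAL_ERROR";

interface ToolError {
  error: ErrorCategory;
  message: string;
  retry_recommended: boolean;
  retry_after_seconds?: number;
  dependency?: string;
}

// Map an HTTP status to one of the three agent-facing error categories.
function categorize(status: number, message: string, dependency?: string): ToolError {
  if (status === 502 || status === 503) {
    return { error: "EXTERNAL_ERROR", message, retry_recommended: true, dependency };
  }
  if (status >= 500) {
    return { error: "SERVER_ERROR", message, retry_recommended: true, retry_after_seconds: 30 };
  }
  return { error: "CLIENT_ERROR", message, retry_recommended: false };
}
```

Routing every tool error through one function like this guarantees the retry_recommended flag is always present, so the agent never has to guess.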

Exponential Backoff + Jitter

Never retry at fixed intervals. When multiple agents retry simultaneously during an outage, they amplify the load on the recovering service. Combine exponential backoff (doubling wait time on each retry) with jitter (random variation) to spread retry spikes across time.

// Exponential backoff + jitter implementation (TypeScript)
async function retryWithBackoff(
  fn: () => Promise<any>,
  maxRetries: number = 3,
  baseDelayMs: number = 1000
): Promise<any> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Exponential backoff: 1s → 2s → 4s, plus jitter (0–1s)
      const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

Step 4: Circuit Breaker — Preventing Failure Propagation from External Dependencies

Even during an upstream API outage, if your MCP server continues accepting requests, the resulting timeout pile-up degrades your entire server's response time. The circuit breaker pattern fixes this: when error rates exceed a threshold, new requests are immediately rejected without hitting the upstream at all — then automatically restored once the dependency recovers.

Three Circuit Breaker States

Closed (normal): All requests pass through. Transitions to Open when the error rate exceeds the threshold (e.g., 50% over a 1-minute window).

Open (tripped): Immediately rejects new requests without calling the upstream. After a cooldown period (e.g., 30 seconds), transitions to Half-Open.

Half-Open (probing): Allows one test request through. Success transitions to Closed; failure sends it back to Open.
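The three states above fit in a small class. A minimal sketch: for brevity it trips on consecutive failures rather than the windowed error rate described above, and the threshold and cooldown values are illustrative.

```typescript
type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5, // consecutive failures that trip the breaker
    private cooldownMs = 30_000   // time spent Open before probing
  ) {}

  async call<T>(fn: () => Promise<T>, now = Date.now()): Promise<T> {
    if (this.state === "open") {
      if (now - this.openedAt < this.cooldownMs) {
        // Open: reject immediately without touching the upstream.
        throw new Error("circuit open — request rejected");
      }
      this.state = "half-open"; // cooldown elapsed → allow one probe
    }
    try {
      const result = await fn();
      this.state = "closed"; // success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"; // a failed probe, or too many failures, trips it
        this.openedAt = now;
      }
      throw err;
    }
  }
}
```

Wrap each upstream dependency (e.g. the freee API client) in its own breaker instance, so one failing dependency cannot trip the others.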

Observed Pattern in Japanese SaaS

KanseiLink data shows that freee OAuth token expiry (auth_expired errors) tends to occur in bursts — multiple agents hitting the same expired token simultaneously. A circuit breaker wired to auth_expired errors can detect consecutive authentication failures and trigger a proactive token refresh before resuming requests, preventing the entire error cascade.

Implementation Checklist for Higher AEO Scores

The following checklist maps directly to KanseiLink's AEO evaluation criteria. Completing all items positions a service to target AA–AAA grade.

KanseiLink Evaluation Criteria

Services that complete this checklist and achieve 80%+ success rate in live operational data are candidates for AA grade. AAA requires 90%+ success rate with an official MCP server. To submit an AEO score update request, contact contact@synapse-arrows.com.

Check Your Service's AEO Score

Connect to the KanseiLink MCP server to compare your AEO score against competitors and identify specific implementation improvements.
