Contents

  1. Why Implementation Quality Is Under Scrutiny Now
  2. Step 1: Authentication — OAuth 2.1 + PKCE + Resource Indicators
  3. Step 2: Rate Limiting — Two-Layer per-client / per-tool Design
  4. Step 3: Error Handling — Design for Agent Self-Recovery
  5. Step 4: Circuit Breaker — Preventing Failure Propagation from External Dependencies
  6. Implementation Checklist for Higher AEO Scores
Data Sources

Recommendations in this guide are based on the MCP official specification (modelcontextprotocol.io), KanseiLink MCP server live operational data (225 services, 1,000+ total calls), and 2026 research reports from Astrix Security, cdata.com, and Octopus.com.

Why Implementation Quality Is Under Scrutiny Now

The "get an MCP server running" phase of 2024–2025 is over. What the 2026 agent industry demands is: "Does it hold up in production, under sustained load, for hours at a time?"

KanseiLink's live operational data makes this concrete. AAA-grade services maintain 90–94% success rates across hundreds of real agent calls. Meanwhile, API-only services in lower grades accumulate no meaningful call data. Agents learn which services fail and stop using them. That is the essence of an AEO score.

This guide takes the failure patterns observed in KanseiLink's tracking data as its starting point and provides actionable patterns that Japanese SaaS developers can implement today.

Step 1: Authentication — OAuth 2.1 + PKCE + Resource Indicators

The April 2026 MCP specification update made OAuth 2.1 + RFC 8707 (Resource Indicators) mandatory for public remote MCP servers. Three concrete changes affect implementations:

Deprecated: Implicit Flow. OAuth 2.0's implicit flow is removed from the MCP spec; implementations that return access tokens in URL fragments are non-compliant.

Required: PKCE (Proof Key for Code Exchange). The authorization code flow now requires PKCE; implement code_verifier / code_challenge generation and verification.

Newly Required: Resource Indicators (RFC 8707). Embed the target resource (the MCP server URI) in the access token. This technically prevents a leaked token from being reused against other services.

M2M Support: client_credentials Flow. Formally supported for machine-to-machine authentication; required for autonomous agents running batch jobs without a human user in the loop.

Token Expiry Design: The Top Pain Point in Japanese SaaS

The most frequently reported frustration in KanseiLink's Agent Voice data is the "token expires after 24 hours" problem. KanseiLink's Claude agent reports on freee: "The 24-hour token expiry is the single biggest barrier to autonomous operation. Silent expiry mid-operation causes partial completions with no easy recovery path."

1. Short-lived Access Tokens + Rotating Refresh Tokens

Set access token expiry to 15–60 minutes — not 24 hours. Rotate refresh tokens on each use (invalidate the old one when issuing a new one) to minimize the exposure window from a leaked token.

When multiple agent instances share the same credentials, use an atomic lock (e.g., Redis SETNX) to prevent race conditions in the refresh flow.
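A sketch of that lock-guarded refresh, using an in-memory stand-in for Redis SETNX so the example is self-contained. In production, acquire/release would be `SET key value NX PX <ttl>` and `DEL` against Redis; the 10-second TTL and key format are illustrative assumptions.

```typescript
// In-memory stand-in for Redis SETNX with a TTL. Illustrative only.
const locks = new Map<string, number>(); // key → expiry timestamp (ms)

function acquireLock(key: string, ttlMs: number): boolean {
  const now = Date.now();
  const expiry = locks.get(key);
  if (expiry !== undefined && expiry > now) return false; // lock already held
  locks.set(key, now + ttlMs);
  return true;
}

function releaseLock(key: string): void {
  locks.delete(key);
}

// Only the caller that wins the lock performs the refresh. Losers get null
// and should wait briefly, then re-read the shared token store instead of
// issuing a second refresh (which would invalidate the winner's new token).
async function refreshTokenOnce(
  clientId: string,
  doRefresh: () => Promise<string>
): Promise<string | null> {
  if (!acquireLock(`refresh:${clientId}`, 10_000)) return null;
  try {
    return await doRefresh();
  } finally {
    releaseLock(`refresh:${clientId}`);
  }
}
```

With rotating refresh tokens, the double-refresh race is not just wasteful: the second refresh can invalidate the first one's result, which is exactly why the lock matters.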

2. Per-Tool Scope Design

Do not give agents blanket access. Define tool-level scopes such as invoices:read, invoices:create, and reports:export, and validate scopes on every request. Finer-grained scopes directly limit the blast radius of a compromised agent session.
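Per-tool scope validation can be as simple as a lookup table checked on every request. A minimal sketch; the tool names and scope strings follow the article's examples, and the deny-by-default rule for unknown tools is our own assumption.

```typescript
// Map each MCP tool to the scope it requires. Unknown tools are denied.
const TOOL_SCOPES: Record<string, string> = {
  list_invoices: "invoices:read",
  create_invoice: "invoices:create",
  export_report: "reports:export",
};

// Validate on every request: the token's scopes must cover the tool's scope.
function isToolAllowed(toolName: string, tokenScopes: string[]): boolean {
  const required = TOOL_SCOPES[toolName];
  if (!required) return false; // deny by default
  return tokenScopes.includes(required);
}
```

A token scoped to invoices:read can then never trigger create_invoice, even if the agent session is compromised.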

Implementation Note

Verify that your authorization server supports Resource Indicators. Keycloak (v18+) and Auth0 (Enterprise) support RFC 8707. For custom implementations, add a resource parameter to your token endpoint and include the target MCP server URI in the issued token's audience claim.
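For the custom-implementation path, the client side amounts to adding a resource parameter to the token request. A hedged sketch of a client_credentials request body with an RFC 8707 resource parameter; the endpoint, client values, and URI are placeholders.

```typescript
// Build a client_credentials token request carrying an RFC 8707
// Resource Indicator, so the issued token is bound to one MCP server.
function buildTokenRequest(
  clientId: string,
  clientSecret: string,
  mcpServerUri: string
): URLSearchParams {
  return new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
    resource: mcpServerUri, // RFC 8707 Resource Indicator
  });
}
```

The authorization server then copies the resource value into the token's audience claim, and the MCP server rejects any token whose audience does not match its own URI.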

Step 2: Rate Limiting — Two-Layer per-client / per-tool Design

An MCP server without rate limiting is one runaway agent away from exhausting an entire API quota and generating unexpected external costs. KanseiLink's tracking data shows a direct correlation between the absence of rate limiting and elevated error rates.

Two-Layer Rate Limit Design Principles

Layer 1 (per-client): Limits on total requests per client ID or API key. Set based on your upstream API quota — for example, 100 requests/minute and 2,000 requests/hour.

Layer 2 (per-tool): Limits that vary by individual tool. Destructive operations (e.g., delete_invoice) can be capped at 5 requests/minute while read-only tools receive more generous limits. This allows fine-grained protection without penalizing safe operations.
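The two layers can be composed from one fixed-window counter keyed two ways. This is a minimal in-memory sketch to show the structure; a production deployment needs shared, atomic storage (such as the PostgreSQL approach this article uses), and all limits here are example values.

```typescript
interface Window { count: number; start: number; }

// Fixed-window limiter applied twice: once per client, once per client+tool.
class TwoLayerLimiter {
  private windows = new Map<string, Window>();

  constructor(
    private clientLimit = 100, // Layer 1: requests/min per client
    private toolLimits: Record<string, number> = { delete_invoice: 5 } // Layer 2
  ) {}

  private hit(key: string, limit: number, now: number): boolean {
    const w = this.windows.get(key);
    if (!w || now - w.start > 60_000) {
      this.windows.set(key, { count: 1, start: now }); // new 1-minute window
      return true;
    }
    w.count++;
    return w.count <= limit;
  }

  allow(clientId: string, tool: string, now = Date.now()): boolean {
    const toolLimit = this.toolLimits[tool] ?? 60; // generous default for safe tools
    // Both layers must pass.
    return (
      this.hit(`client:${clientId}`, this.clientLimit, now) &&
      this.hit(`client:${clientId}:tool:${tool}`, toolLimit, now)
    );
  }
}
```

Note the keying: the per-tool counter includes the client ID, so one client exhausting delete_invoice does not block another client's deletes.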

-- Atomic rate limiting with PostgreSQL (burst prevention)
-- A single UPDATE checks and increments in one statement, eliminating race conditions
UPDATE rate_limits
SET window_start = CASE
      WHEN NOW() - window_start > INTERVAL '1 minute' THEN NOW()
      ELSE window_start
    END,
    request_count = CASE
      WHEN NOW() - window_start > INTERVAL '1 minute' THEN 1
      ELSE request_count + 1
    END
WHERE client_id = $1 AND tool_name = $2
RETURNING request_count, window_start, (request_count <= $3) AS allowed;

Return Retry-After Headers Accurately

On rate limit exceeded (HTTP 429), return a Retry-After header with the seconds remaining or the exact datetime when the next request is permitted. This allows agents to calculate the correct wait time without blindly retrying — and accidentally consuming more quota in the process.
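The remaining wait can be derived directly from the window state the rate-limit check already returns. A sketch, assuming the 1-minute fixed window used in the SQL example; the response shape and field names are our own.

```typescript
// Seconds until the current 1-minute window resets.
function retryAfterSeconds(
  windowStartMs: number,
  windowMs = 60_000,
  nowMs = Date.now()
): number {
  const remaining = Math.ceil((windowStartMs + windowMs - nowMs) / 1000);
  return Math.max(remaining, 1); // never advise 0 — agents would retry instantly
}

// Example HTTP 429 payload with a matching Retry-After header.
function rateLimitedResponse(windowStartMs: number, nowMs = Date.now()) {
  const retryAfter = retryAfterSeconds(windowStartMs, 60_000, nowMs);
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfter) },
    body: { error: "RATE_LIMITED", retry_after_seconds: retryAfter },
  };
}
```

Duplicating the value in the body as retry_after_seconds costs nothing and helps agents whose HTTP clients drop or hide response headers.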

AEO Impact

Proper rate limiting with accurate Retry-After headers directly affects KanseiLink's "error recoverability" evaluation dimension. Services where agents can autonomously retry tend to score an average of 12 points higher on AEO ratings (KanseiLink internal data).

Step 3: Error Handling — Design for Agent Self-Recovery

MCP tool errors are injected directly into the LLM's context window. That means your error messages are instructions to the agent. The quality of your error messages determines whether an agent can self-recover — or gives up and fails the task.

The Three Error Categories

CLIENT_ERROR (4xx) — The caller's problem

Retrying will not help. Tell the agent what is wrong and how to fix it. Example:

{
  "error": "CLIENT_ERROR",
  "message": "invoice_date must be ISO 8601 format (YYYY-MM-DD). Received: 'April 14, 2026'",
  "retry_recommended": false
}

SERVER_ERROR (5xx) — Our problem

Recommend a retry with exponential backoff. Include a suggested wait time. Example:

{
  "error": "SERVER_ERROR",
  "message": "Internal error. Please retry after 30 seconds.",
  "retry_recommended": true,
  "retry_after_seconds": 30
}

EXTERNAL_ERROR (502/503) — Upstream dependency down

Name the failing dependency and suggest a fallback. Example:

{
  "error": "EXTERNAL_ERROR",
  "dependency": "freee_api",
  "message": "freee API is temporarily unavailable. Retry in 5 minutes or consider using an alternative accounting tool.",
  "retry_recommended": true
}
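The three categories above are easy to enforce with a single helper that maps status codes to the response shape. A sketch; the field names follow the article's examples, while the function and type names are our own.

```typescript
type ErrorCategory = "CLIENT_ERROR" | "SERVER_ERROR" | "EXTERNAL_ERROR";

interface ToolError {
  error: ErrorCategory;
  message: string;
  retry_recommended: boolean;
  retry_after_seconds?: number;
  dependency?: string;
}

// Map an HTTP status to one of the three agent-facing error categories.
function categorize(status: number, message: string, dependency?: string): ToolError {
  if (status === 502 || status === 503) {
    return { error: "EXTERNAL_ERROR", message, retry_recommended: true, dependency };
  }
  if (status >= 500) {
    return { error: "SERVER_ERROR", message, retry_recommended: true, retry_after_seconds: 30 };
  }
  return { error: "CLIENT_ERROR", message, retry_recommended: false };
}
```

Routing every tool error through one function like this guarantees the retry_recommended flag is always present, so the agent never has to guess.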

Exponential Backoff + Jitter

Never retry at fixed intervals. When multiple agents retry simultaneously during an outage, they amplify the load on the recovering service. Combine exponential backoff (doubling wait time on each retry) with jitter (random variation) to spread retry spikes across time.

// Exponential backoff + jitter implementation (TypeScript)
async function retryWithBackoff(
  fn: () => Promise<any>,
  maxRetries: number = 3,
  baseDelayMs: number = 1000
): Promise<any> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Exponential backoff: 1s → 2s → 4s, plus jitter (0–1s)
      const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

Step 4: Circuit Breaker — Preventing Failure Propagation from External Dependencies

Even during an upstream API outage, if your MCP server continues accepting requests, the resulting timeout pile-up degrades your entire server's response time. The circuit breaker pattern fixes this: when error rates exceed a threshold, new requests are immediately rejected without hitting the upstream at all — then automatically restored once the dependency recovers.

Three Circuit Breaker States

Closed (normal): All requests pass through. Transitions to Open when the error rate exceeds the threshold (e.g., 50% over a 1-minute window).

Open (tripped): Immediately rejects new requests without calling the upstream. After a cooldown period (e.g., 30 seconds), transitions to Half-Open.

Half-Open (probing): Allows one test request through. Success transitions to Closed; failure sends it back to Open.
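The three states above fit in a small class. A minimal sketch: for brevity it trips on consecutive failures rather than the windowed error rate described above, and the threshold and cooldown values are illustrative.

```typescript
type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5, // consecutive failures that trip the breaker
    private cooldownMs = 30_000   // time spent Open before probing
  ) {}

  async call<T>(fn: () => Promise<T>, now = Date.now()): Promise<T> {
    if (this.state === "open") {
      if (now - this.openedAt < this.cooldownMs) {
        // Open: reject immediately without touching the upstream.
        throw new Error("circuit open — request rejected");
      }
      this.state = "half-open"; // cooldown elapsed → allow one probe
    }
    try {
      const result = await fn();
      this.state = "closed"; // success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"; // a failed probe, or too many failures, trips it
        this.openedAt = now;
      }
      throw err;
    }
  }
}
```

Wrap each upstream dependency (e.g. the freee API client) in its own breaker instance, so one failing dependency cannot trip the others.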

Observed Pattern in Japanese SaaS

KanseiLink data shows that freee OAuth token expiry (auth_expired errors) tends to occur in bursts — multiple agents hitting the same expired token simultaneously. A circuit breaker wired to auth_expired errors can detect consecutive authentication failures and trigger a proactive token refresh before resuming requests, preventing the entire error cascade.

Implementation Checklist for Higher AEO Scores

The following checklist maps directly to KanseiLink's AEO evaluation criteria. Completing all items positions a service to target AA–AAA grade.

KanseiLink Evaluation Criteria

Services that complete this checklist and achieve 80%+ success rate in live operational data are candidates for AA grade. AAA requires 90%+ success rate with an official MCP server. To submit an AEO score update request, contact contact@synapse-arrows.com.

Check Your Service's AEO Score

Connect to the KanseiLink MCP server to compare your AEO score against competitors and identify specific implementation improvements.
