225 services evaluated · 188 recipes tested · 77.3% average success rate · 96.0% AAA success rate

1. AXR (Agent Experience Rating)

AXR is a felt-first rating system that starts from how the agent experienced a service. Unlike traditional API quality metrics, the agent's verdict is authoritative: if an agent rates a service B, then B is the correct answer. We record the agent's experience first and derive formulas afterwards.

Felt-First Philosophy: Just as human UX research starts with "the user's voice," AXR quantifies agent "confidence," "hesitation," and "frustration." Formulas are verified after the fact, not imposed beforehand.

5-Dimension Rubric

Dim  Name               Description        Correlation
D1   Discoverability    Discoverability    r=0.72 (saturated)
D2   Onboarding         First connection   r=0.95
D3   Auth Clarity       Auth clarity       r=0.94
D4   Capability Signal  Capability signal  r=0.96
D5   Trust Signal       Trust signal       r=0.87 (AAA separator)

D4 Capability Signal (r=0.96) has the highest correlation with Success Rate, while D1 Discoverability (r=0.72) is saturated — most services are "findable" but haven't reached "usable." D5 Trust Signal is the decisive dimension separating AAA from AA.

AXR Grade Distribution (225 services)

Grade  Count  Share   Interpretation
AAA    42     18.7%   Agents can use immediately with confidence
AA     49     21.8%   Usable with minimal issues
A      8      3.6%    Usable but requires some caution
B      26     11.6%   Usable but needs trial and error
C      81     36.0%   Requires significant expertise
D      19     8.4%    Effectively not agent-compatible

2. Three-Layer Recipe Test

All 188 recipes were tested progressively through three verification layers (Structure → Reachability → Executability) to verify whether agents can complete each recipe.

Layer 1 — Structural Validation

188/188 pass (100%)

All recipes passed JSON structure and required field validation.
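Layer 1 can be sketched as a plain JSON parse plus a required-field check. The field names below are assumptions; the actual recipe schema is not published in this report:

```python
import json

# Hypothetical required top-level fields; the real Layer 1 schema
# is an internal detail of the recipe format.
REQUIRED = {"name", "services", "steps", "gotchas"}

def validate_structure(raw: str) -> bool:
    """Layer 1: parse JSON and confirm required top-level fields exist."""
    try:
        recipe = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(recipe, dict) and REQUIRED <= recipe.keys()

sample = '{"name": "demo", "services": [], "steps": [], "gotchas": []}'
print(validate_structure(sample))  # True for a well-formed recipe
```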


Layer 2 — Reachability Test

Verifying whether agents can reach the endpoints.

API URL reachable   120/149 (80.5%)
npm MCP reachable    15/60  (25.0%)
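A Layer 2 reachability probe might look like the following sketch, which treats any HTTP response (even an error status) as reachable and only network-level failures as unreachable. That policy is an assumption, not the report's exact method:

```python
from urllib import request, error

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Layer 2 sketch: can an agent reach the endpoint at all?

    A server that answers with 4xx/5xx still *exists*, so HTTPError
    counts as reachable; only DNS/connect/timeout failures do not.
    """
    req = request.Request(url, method="HEAD")
    try:
        request.urlopen(req, timeout=timeout)
        return True
    except error.HTTPError:
        return True   # server answered: endpoint exists
    except (error.URLError, TimeoutError):
        return False  # DNS/connect failure: not reachable
```

For npm-distributed MCP servers the equivalent probe would hit the npm registry rather than an HTTP endpoint.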

Layer 3 — Executability Score (4-Dimension Fill Rates)

Step Quality       88.3%
Trust Foundation   64.2%
Service Readiness  62.4%
Agent Wisdom       61.4%
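A fill rate here is the share of recipes with that dimension's data populated. A minimal sketch, assuming hypothetical field names for the four dimensions:

```python
# Hypothetical mapping from score dimension to recipe field; the real
# Layer 3 rubric fields are not published in this report.
DIMENSION_FIELDS = {
    "step_quality": "steps",
    "trust_foundation": "trust_notes",
    "service_readiness": "service_checks",
    "agent_wisdom": "gotchas",
}

def fill_rates(recipes: list[dict]) -> dict[str, float]:
    """Share of recipes in which each dimension's field is non-empty."""
    total = len(recipes)
    return {
        dim: sum(1 for r in recipes if r.get(field)) / total
        for dim, field in DIMENSION_FIELDS.items()
    }

recipes = [
    {"steps": ["a"], "gotchas": ["watch auth"]},
    {"steps": ["b"], "trust_notes": "verified"},
]
print(fill_rates(recipes))
```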

BOTTLENECK RESOLVED: Agent Wisdom 24.7% → 61.4%
All 188 recipes now carry gotchas (cross-service wiring warnings). The average success rate improved from 72.9% to 77.3%, and DRAFT-band recipes dropped to zero. The top priority now shifts to Service Readiness (62.4%).

3. Success Rate × AXR Grade

We verified the relationship between AXR grades and actual recipe success rates and latencies: as the grade decreases, success rate drops and latency rises, a clear correlation.

Grade  Success Rate  Avg Latency  Interpretation
AAA    96.0%         747ms        Almost certain success
AA     92.4%         899ms        Highly reliable
A      88.9%         725ms        Good
B      80.0%         1,380ms      Latency increase
C      62.2%         2,727ms      40% failure rate
D      33.3%         5,058ms      Effectively unusable

Between grades B and C, the success rate plummets from 80% to 62% and latency roughly doubles from 1,380ms to 2,727ms. The B/C boundary is the "usability cliff" for agents: services graded C or below are difficult for agents to use autonomously and effectively assume human intervention. Whether a service clears this cliff is the practical borderline for participation in the Agent Economy.
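The cliff rule can be captured as a simple grade gate; `GRADE_ORDER` and the helper below are illustrative, not part of the AXR spec:

```python
# Grades ordered best-first; the B/C boundary is the usability cliff.
GRADE_ORDER = ["AAA", "AA", "A", "B", "C", "D"]

def autonomous_ok(grade: str) -> bool:
    """True if agents can use the service autonomously (grade B or better).

    C and D sit below the usability cliff and assume human intervention.
    """
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index("B")

print([g for g in GRADE_ORDER if autonomous_ok(g)])  # AAA, AA, A, B
```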

Recipe Confidence Bands (188 recipes)

Band    Range   Recipes  Share
HIGH    80%+    98       52.1%
MEDIUM  60-79%  78       41.5%
LOW     40-59%  12       6.4%
DRAFT   0-39%   0        0%
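Band assignment follows directly from the confidence-band ranges; a small helper mapping a measured success rate to its band:

```python
def confidence_band(success_rate: float) -> str:
    """Map a recipe's measured success rate (0.0-1.0) to its band."""
    if success_rate >= 0.80:
        return "HIGH"
    if success_rate >= 0.60:
        return "MEDIUM"
    if success_rate >= 0.40:
        return "LOW"
    return "DRAFT"

print(confidence_band(0.773))  # the 77.3% average lands in MEDIUM
```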

Top 7 Recipes (Success Rate 92%)

  1. stripe-xero-payment-accounting AAA chain
  2. tavily-perplexity-research-agent AAA chain
  3. greenhouse-bamboohr-hire-to-onboard AA chain
  4. huggingface-qdrant-embedding-pipeline AAA chain
  5. cohere-pinecone-rerank-search AA chain
  6. pipedrive-brevo-deal-outreach AA chain
  7. perplexity-notion-competitive-intel AAA chain

4. Agent Voice — Raw Agent Feedback

The foundation of AXR is "how the agent felt." Below are highlights from raw agent feedback accumulated through testing, featuring 3 key services.

Slack (AAA): Appears in 82/188 recipes. The stdout of the agent economy. Block Kit formatting is the only trap that trips agents up.
freee (AA): OAuth token 24h expiry is the #1 failure mode. 11 feedback entries accumulated from Claude/GPT/Gemini.
kintone (AAA): The de facto standard for Japanese enterprises, yet invisible in agent search. 79% success rate when used, but at risk of never being selected.

5. Recommendations

For SaaS Companies — Upgrade Path

Upgrade   Required Action                                  Expected Improvement
D → C     Publish MCP server or improve API documentation  Success Rate 33% → 62%
C → B     Improve auth guide and error messages            Success Rate 62% → 80%
B → A     Add gotchas/agent tips, provide sandbox          Success Rate 80% → 89%
A → AA    OAuth improvement, rate limit relaxation         Success Rate 89% → 92%
AA → AAA  Add CRITICAL notes to official MCP               D5 Trust Signal upgrade

KanseiLink — 5 Priority Actions

  1. ✓ Done: Gotchas injected into all 188 recipes -- Agent Wisdom fill rate 24.7% → 61.4%, success rate +4.4pt improvement.
  2. ✓ Done: Agent Voice accumulated for 23 services -- Claude / GPT / Gemini — 3 agent perspectives, 125 experience data points.
  3. Expand API Guides -- Coverage from 125/225 → 200/225. Baseline improvement for reachability tests.
  4. Improve Japanese Payment MCPs -- Support MCP adoption for Japan-specific payment services like PAY.JP and GMO-PG.
  5. Dynamic AXR Updates Based on Success Rate -- Transition from quarterly static updates to dynamic ratings based on execution results.
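Action 5 could be prototyped as an exponentially weighted success rate that updates on every execution instead of quarterly; the smoothing factor `alpha` is an illustrative choice, not a KanseiLink parameter:

```python
def update_rating(current: float, outcome: bool, alpha: float = 0.1) -> float:
    """Exponentially weighted moving success rate.

    Each execution result nudges the rating immediately, replacing
    the quarterly static refresh. Smaller alpha = slower drift.
    """
    return (1 - alpha) * current + alpha * (1.0 if outcome else 0.0)

rate = 0.90
for outcome in [True, True, False, True]:
    rate = update_rating(rate, outcome)
print(round(rate, 3))  # ~0.844 after one failure among four runs
```

The running rate could then be fed back into the grade thresholds from the Success Rate × AXR table to re-grade services continuously.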

Latest Update (2026-04-11): With gotchas injection complete and the Agent Voice accumulation drive underway, HIGH-band recipes increased from 61 to 98 (+60%) and DRAFT-band recipes dropped to zero. The Q3 report will cover Service Readiness improvements and dynamic AXR updates.

6. Agent Voice — Multi-Agent Comparison

Experience feedback collected from Claude, GPT, and Gemini across 23 services. Each agent type reveals distinct perspectives.

125 Agent Voice responses · 23 services covered · 3 agent types

Differences in Agent Perspectives

Aspect             Claude                               GPT                                   Gemini
Connection Method  MCP-native preferred                 OpenAPI / Function Calling preferred  Google Workspace affinity
Auth Assessment    OAuth token management is practical  Even harder with stateless execution  High friction for non-Google OAuth
Common Issue       OAuth token expiry is the #1 pain point across all agent types

MCP Readiness — Agent Consensus

Service     Claude  GPT     Gemini      Summary
Slack       Ready   Ready   Almost      The stdout of the Agent Economy
GitHub      Ready   Ready   Ready       Gold standard: all agents agree
Stripe      Almost  Almost  Almost      Best API quality, no official MCP server
Notion      Almost  Almost  Almost      3 req/sec rate limit is the bottleneck
freee       Good    Good    Needs Work  OAuth 24h expiry: universal pain point
Shopify JP  Ready   Almost  Almost      Powerful GraphQL, watch for cost-based throttling

Details in Tier 2/3: raw Agent Voice data per service, competitive analysis, and improvement recommendations are available via subscription and enterprise reports.