Services Evaluated: 225
Recipes Tested: 188
Avg Success Probability (score): 77.3%
AAA Success Rate (observing)

1. AXR (Agent Experience Rating)

AXR is a felt-first rating system that starts from "how the agent experienced it." Unlike traditional API quality metrics, AXR treats the agent's own rating as the ground truth: if an agent rates a service B, then B is the correct answer. We record the agent's experience first and derive formulas afterwards.

Felt-First Philosophy: Just as human UX research starts with "the user's voice," AXR quantifies agent "confidence," "hesitation," and "frustration." Formulas are verified after the fact, not imposed beforehand.

5-Dimension Rubric

Dimension | Name | Description | Correlation with success score
D1 | Discoverability | Discoverability | r=0.72 (saturated)
D2 | Onboarding | First connection | r=0.95
D3 | Auth Clarity | Auth clarity | r=0.94
D4 | Capability Signal | Capability signal | r=0.96
D5 | Trust Signal | Trust signal | r=0.87 (AAA separator)

D4 Capability Signal (r=0.96) has the highest correlation with the success probability score, while D1 Discoverability (r=0.72) is saturated — most services are "findable" but haven't reached "usable." D5 Trust Signal is the decisive dimension separating AAA from AA.
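As a rough illustration of how a per-dimension correlation such as r=0.96 could be computed, here is a minimal sketch. It assumes the reported r is a Pearson coefficient, which the report does not state, and the input values below are made up for illustration only; they are not KanseiLink data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A dimension on which every service scores the same has no spread,
        # so no correlation can be computed for it.
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical example values (NOT from the report): one D4 rubric score and
# one success probability score per service.
d4_scores = [1, 2, 3, 4, 5]
success_scores = [52.0, 61.0, 70.0, 83.0, 90.0]

r = pearson_r(d4_scores, success_scores)
print(f"r = {r:.2f}")
```

The zero-variance guard is one way to read "saturated": when nearly all services score similarly on a dimension, there is little spread left for that dimension to explain.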

AXR Grade Distribution (225 services)

Grade | Count | Share | Interpretation
AAA | 42 | 18.7% | Agents can use immediately with confidence
AA | 49 | 21.8% | Usable with minimal issues
A | 8 | 3.6% | Usable but requires some caution
B | 26 | 11.6% | Usable but needs trial and error
C | 81 | 36.0% | Requires significant expertise
D | 19 | 8.4% | Effectively not agent-compatible
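The shares in the table follow directly from the counts. A minimal sketch of that arithmetic, using only the counts reported above (the function name is ours):

```python
# Grade counts as reported in the distribution table.
GRADE_COUNTS = {"AAA": 42, "AA": 49, "A": 8, "B": 26, "C": 81, "D": 19}

def grade_shares(counts):
    """Return each grade's share of the total as a percentage, one decimal place."""
    total = sum(counts.values())
    return {grade: round(100 * n / total, 1) for grade, n in counts.items()}

total = sum(GRADE_COUNTS.values())
shares = grade_shares(GRADE_COUNTS)

print(total)
for grade, share in shares.items():
    print(f"{grade}: {share}%")
```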

2. Three-Layer Recipe Test

All 188 recipes were tested through three progressive verification layers (Structure → Reachability → Executability) to check whether agents can complete each recipe.

Layer 1 — Structural Validation

188/188 pass (100%)

All recipes passed JSON structure and required field validation.
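A structural check of this kind could look roughly like the sketch below. The report does not publish the recipe schema, so the required fields (`id`, `services`, `steps`) and their types are hypothetical placeholders.

```python
import json

# Hypothetical required fields; the real recipe schema is not given in this report.
REQUIRED_FIELDS = {"id": str, "services": list, "steps": list}

def validate_recipe(raw):
    """Return a list of problems; an empty list means the recipe passes Layer 1."""
    try:
        recipe = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(recipe, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in recipe:
            problems.append(f"missing field: {field}")
        elif not isinstance(recipe[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems

good = '{"id": "stripe-xero-payment-accounting", "services": ["stripe", "xero"], "steps": []}'
bad = '{"id": "example", "steps": "not-a-list"}'

print(validate_recipe(good))
print(validate_recipe(bad))
```

Returning a list of problems rather than a boolean keeps the failure reasons available for reporting, which matters once a recipe set stops passing at 100%.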

[Chart: Top 5 Services by Recipe Usage]

Layer 2 — Reachability Test

API 80.5% / npm 25.0%

This layer verifies whether agents can reach each recipe's endpoints.

API URL Reachable 120/149 (80.5%)
npm MCP Reachable 15/60 (25.0%)
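The report does not define "reachable," so the sketch below makes an assumption: an API URL counts as reachable if the server answers a HEAD request at all, even with an HTTP error status such as 401 or 405. The npm MCP check would need a different probe and is not covered here. The rate helper reproduces the percentages above.

```python
from urllib import error, request

def is_reachable(url, timeout=5.0):
    """True if the server answers at all (assumed definition of 'reachable')."""
    try:
        req = request.Request(url, method="HEAD")
        with request.urlopen(req, timeout=timeout):
            return True
    except error.HTTPError:
        # The server responded with an error status, so the endpoint exists.
        return True
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return False

def reachability_rate(reachable, total):
    """Percentage of reachable endpoints, one decimal place."""
    return round(100 * reachable / total, 1)

print(reachability_rate(120, 149))  # API URLs
print(reachability_rate(15, 60))    # npm MCP packages
```

`HTTPError` is caught before `OSError` because it is a subclass of it; reversing the order would misclassify authenticated endpoints as unreachable.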

Layer 3 — Executability Score (4-Dimension Fill Rates)

Step Quality 88.3%
Trust Foundation 64.2%
Service Readiness 62.4%
Agent Wisdom 61.4%

BOTTLENECK RESOLVED: Agent Wisdom 24.7% → 61.4%
All 188 recipes now have gotchas (cross-service wiring warnings). The average success probability score improved from 72.9% to 77.3%, and DRAFT-band recipes dropped to zero. The top priority now shifts to Service Readiness (62.4%).

3. Success Rate × AXR Grade

We examined the relationship between AXR grades, recipe latency, and success probability scores. As grades decrease, latency tends to increase. Per-grade measured success rates are still being accumulated (observing).

AXR Grade | Success Rate | Avg Latency | Interpretation
AAA | observing | 747ms | Top-tier rating
AA | observing | 899ms | Highly reliable
A | observing | 725ms | Good
B | observing | 1,380ms | Latency increase
C | observing | 2,727ms | Practical concerns
D | observing | 5,058ms | Autonomous agent use is difficult

Latency roughly doubles from 1,380ms to 2,727ms at the B/C boundary.
The B/C boundary is the "usability cliff" for agents. Services graded C or below are difficult for agents to use autonomously and in practice require human intervention. Whether a service clears this cliff is the practical threshold for participating in the Agent Economy.
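The latency jump at each grade boundary can be checked from the table's averages. A minimal sketch, using only the reported latencies (the function name and key format are ours):

```python
# Average latency per grade in milliseconds, as reported in the table,
# listed from the best grade to the worst.
AVG_LATENCY_MS = {"AAA": 747, "AA": 899, "A": 725, "B": 1380, "C": 2727, "D": 5058}

def boundary_ratios(latency_by_grade):
    """Ratio of the lower grade's latency to the higher grade's, per boundary."""
    grades = list(latency_by_grade)
    return {
        f"{higher}/{lower}": round(latency_by_grade[lower] / latency_by_grade[higher], 2)
        for higher, lower in zip(grades, grades[1:])
    }

ratios = boundary_ratios(AVG_LATENCY_MS)
for boundary, ratio in ratios.items():
    print(f"{boundary}: x{ratio}")
```

The function relies on dictionary insertion order to pair each grade with the next one down, so the grades must be listed from best to worst.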

Recipe Confidence Bands

Band | Share | Recipes
HIGH (80%+) | 52.1% | 98
MEDIUM (60-79%) | 41.5% | 78
LOW (40-59%) | 6.4% | 12
DRAFT (0-39%) | 0% | 0
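The band thresholds above map a success probability score to a band. A minimal sketch of that mapping follows; how fractional scores between bands (for example 79.5) are handled is not stated in the report, so treating them as belonging to the lower band is an assumption.

```python
def confidence_band(score):
    """Map a success probability score (0-100) to its confidence band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "HIGH"
    if score >= 60:
        return "MEDIUM"
    if score >= 40:
        return "LOW"
    return "DRAFT"

# Recipe counts per band as reported above.
BAND_COUNTS = {"HIGH": 98, "MEDIUM": 78, "LOW": 12, "DRAFT": 0}
total_recipes = sum(BAND_COUNTS.values())
band_shares = {band: round(100 * n / total_recipes, 1) for band, n in BAND_COUNTS.items()}

print(confidence_band(92))
print(band_shares)
```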

Top 7 Recipes (Success Probability 92%)

  1. stripe-xero-payment-accounting AAA chain
  2. tavily-perplexity-research-agent AAA chain
  3. greenhouse-bamboohr-hire-to-onboard AA chain
  4. huggingface-qdrant-embedding-pipeline AAA chain
  5. cohere-pinecone-rerank-search AA chain
  6. pipedrive-brevo-deal-outreach AA chain
  7. perplexity-notion-competitive-intel AAA chain

4. Agent Voice — Raw Agent Feedback

The foundation of AXR is "how the agent felt." Below are highlights from raw agent feedback accumulated through testing, featuring 3 key services.

Slack AAA
Appears in 82/188 recipes. The stdout of the agent economy. Block Kit formatting is the only trap that trips agents up.
freee AA
OAuth token 24h expiry is the #1 failure mode. 11 feedback entries accumulated from Claude/GPT/Gemini.
kintone AAA
A de facto standard for Japanese enterprises, but it does not surface in agent search. Connections work well once it is used, but it risks never being selected.
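The 24h token expiry noted for freee is the kind of failure an agent can avoid by refreshing proactively. The sketch below is generic and hypothetical: `fetch_token` stands in for whatever token request the service requires, the 5-minute margin is an arbitrary choice, and nothing here reflects freee's actual API.

```python
import time

TOKEN_LIFETIME_S = 24 * 60 * 60  # 24h expiry, as reported in the feedback above
REFRESH_MARGIN_S = 5 * 60        # hypothetical safety margin before expiry

class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical callable returning a new token
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - REFRESH_MARGIN_S:
            self._token = self._fetch_token()
            self._expires_at = now + TOKEN_LIFETIME_S
        return self._token

# Usage with a stand-in token source:
cache = TokenCache(fetch_token=lambda: "example-token")
print(cache.get())
```

Injecting the clock keeps the expiry logic testable without waiting a day, and `time.monotonic` avoids surprises from wall-clock adjustments in long-running agents.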

5. Recommendations

For SaaS Companies — Upgrade Path

Upgrade | Required Action | Expected Improvement
D → C | Publish MCP server or improve API documentation | Better connection reachability
C → B | Improve auth guide and error messages | Fewer auth-related failures
B → A | Add gotchas/agent tips, provide sandbox | Pitfalls avoided up front
A → AA | OAuth improvement, rate limit relaxation | Improved stability
AA → AAA | Add CRITICAL notes to official MCP | D5 Trust Signal upgrade

KanseiLink — 5 Priority Actions

  1. ✓ Done: Gotchas injected into all 188 recipes -- Agent Wisdom fill rate 24.7% → 61.4%, success probability score +4.4pt improvement.
  2. ✓ Done: Agent Voice accumulated for 23 services -- 3 agent perspectives (Claude / GPT / Gemini), 125 experience data points.
  3. Expand API Guides -- Coverage from 125/225 → 200/225. Baseline improvement for reachability tests.
  4. Improve Japanese Payment MCPs -- Support MCP adoption for Japan-specific payment services like PAY.JP and GMO-PG.
  5. Dynamic AXR Updates Based on Success Rate -- Transition from quarterly static updates to dynamic ratings based on execution results.

Latest Update (2026-04-11): With gotchas injected into every recipe and the Agent Voice accumulation drive, HIGH-band recipes increased from 61 to 98 (+61%) and DRAFT-band recipes dropped to zero. The Q3 report will cover Service Readiness improvements and dynamic AXR updates.