AXR Rating + Recipe Execution Test — Agent Experience Measurement Report

Services Evaluated: 225

Recipes Tested: 188

Avg Success Probability (score): 77.3%

AAA Success Rate: observing (no value yet)

1. AXR (Agent Experience Rating)

AXR is a felt-first rating system that starts from "how the agent experienced it." Unlike traditional API quality metrics, if an agent rates it B, that's the correct answer — we record the agent's experience first, then derive formulas afterwards.

Felt-First Philosophy: Just as human UX research starts with "the user's voice," AXR quantifies agent "confidence," "hesitation," and "frustration." Formulas are verified after the fact, not imposed beforehand.

5-Dimension Rubric

Dimension	Name	Description	Correlation with success probability score
D1	Discoverability	Discoverability	r=0.72 (saturated)
D2	Onboarding	First connection	r=0.95
D3	Auth Clarity	Auth clarity	r=0.94
D4	Capability Signal	Capability signal	r=0.96
D5	Trust Signal	Trust signal	r=0.87 (AAA separator)

D4 Capability Signal (r=0.96) has the highest correlation with the success probability score, while D1 Discoverability (r=0.72) is saturated — most services are "findable" but haven't reached "usable." D5 Trust Signal is the decisive dimension separating AAA from AA.
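The report does not say how the r values were computed. As a minimal sketch, assuming they are ordinary Pearson correlations between a dimension's per-service scores and the success probability score, the calculation could look like the following. The function name and the sample inputs are hypothetical and are not data from this report.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical dimension scores and success scores for five services.
d4_scores = [1, 2, 3, 4, 5]
success_scores = [40, 55, 61, 78, 90]
r = pearson_r(d4_scores, success_scores)
```

A constant sequence raises an error here because the correlation is undefined when one variable has no variance. Low variance is also one way a "saturated" dimension such as D1 can end up with a weaker r.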

AXR Grade Distribution

AXR Grade Distribution (225 services)

Grade	Count	Share	Interpretation
AAA	42	18.7%	Agents can use immediately with confidence
AA	49	21.8%	Usable with minimal issues
A	8	3.6%	Usable but requires some caution
B	26	11.6%	Usable but needs trial and error
C	81	36.0%	Requires significant expertise
D	19	8.4%	Effectively not agent-compatible
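The Share column follows directly from the counts. A short sketch of that arithmetic, using the counts from the table above (the variable names are illustrative only):

```python
# Grade counts as listed in the distribution table.
grade_counts = {"AAA": 42, "AA": 49, "A": 8, "B": 26, "C": 81, "D": 19}

total_services = sum(grade_counts.values())

# Share of each grade as a percentage, rounded to one decimal place.
grade_shares = {
    grade: round(100 * count / total_services, 1)
    for grade, count in grade_counts.items()
}

for grade, share in grade_shares.items():
    print(f"{grade}: {grade_counts[grade]} services ({share}%)")
```

The counts sum to the 225 services evaluated, so the shares can be checked against the table row by row.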

2. Three-Layer Recipe Test

188 recipes were tested progressively through 3 verification layers: Structure → Reachability → Executability, verifying whether agents can complete each recipe.

Layer 1 — Structural Validation

188/188 pass (100%)

All recipes passed JSON structure and required field validation.
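A Layer 1 check of this kind could be sketched as below. The report does not list the required fields, so `REQUIRED_FIELDS` and its field names are assumptions made for illustration, not the actual recipe schema.

```python
import json

# Hypothetical required fields; the real recipe schema is not given in this report.
REQUIRED_FIELDS = ("id", "services", "steps")


def validate_recipe(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the recipe passes."""
    try:
        recipe = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(recipe, dict):
        return ["top-level value must be a JSON object"]
    # Report every missing field rather than stopping at the first one.
    return [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in recipe
    ]


problems = validate_recipe('{"id": "example-recipe", "services": ["slack"]}')
```

Returning a list of problems instead of a boolean makes it easier to report why a recipe failed structural validation.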

Top 5 Services by Recipe Usage:

Service	AXR Grade	Recipes
Slack	AAA	82
kintone	AAA	24
freee	AA	19
Chatwork	A	16
Notion	AAA	15

Layer 2 — Reachability Test

API 80.5% / npm 25.0%

Verifying whether agents can reach the endpoints.

API URL Reachable 120/149 (80.5%)

npm MCP Reachable 15/60 (25.0%)
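The report does not define what counts as "reachable." The sketch below assumes that any HTTP response, including an error status such as 401, means the endpoint was reached, and that only network-level failures count as unreachable. `http_probe` and `reachability_rate` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Assumed rule: any HTTP response means reachable; network errors do not."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return False  # DNS failure, refused connection, timeout, malformed URL


def reachability_rate(urls, probe=http_probe):
    """Return (reachable, total, percent) for a collection of URLs."""
    urls = list(urls)
    reachable = sum(1 for url in urls if probe(url))
    percent = 100 * reachable / len(urls) if urls else 0.0
    return reachable, len(urls), percent


# Offline usage with a stand-in probe, so no network access is needed.
fake_results = {"https://a.example": True, "https://b.example": False}
summary = reachability_rate(fake_results, probe=fake_results.get)
```

Passing the probe as a parameter keeps the rate calculation testable without network access. With the figures above, 120 of 149 API URLs gives 80.5% and 15 of 60 npm MCP packages gives 25.0%.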

Layer 3 — Executability Score (4-Dimension Fill Rates)

Dimension	Fill Rate
Step Quality	88.3%
Trust Foundation	64.2%
Service Readiness	62.4%
Agent Wisdom	61.4%

BOTTLENECK RESOLVED: Agent Wisdom 24.7% → 61.4%
All 188 recipes now have gotchas (cross-service wiring warnings). Avg success probability score improved from 72.9% to 77.3%, DRAFT-band recipes reduced to zero. Current top priority shifts to Service Readiness (62.4%).

3. Success Rate × AXR Grade

We examined the relationship between AXR grades, recipe latency, and success probability scores. As grades decrease, latency tends to increase. Per-grade measured success rates are still being accumulated (observing).

AXR Grade	Success Rate	Avg Latency	Interpretation
AAA	observing	747ms	Top-tier rating
AA	observing	899ms	Highly reliable
A	observing	725ms	Good
B	observing	1,380ms	Latency increase
C	observing	2,727ms	Practical concerns
D	observing	5,058ms	Autonomous agent use is difficult

Latency nearly doubles at the B/C boundary, from 1,380ms to 2,727ms (about 1.98x).
The B/C boundary is the "usability cliff" for agents. Services graded C or below are difficult for agents to use autonomously and assume human intervention. Whether a service crosses this cliff is the practical borderline for Agent Economy participation.
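To put the B/C jump in context, the step-to-step latency ratios can be computed from the table above. This is only arithmetic on the reported averages; the variable names are illustrative.

```python
# Average latency per grade in milliseconds, from the table above.
latency_ms = {"AAA": 747, "AA": 899, "A": 725, "B": 1380, "C": 2727, "D": 5058}

grades = list(latency_ms)

# Ratio of each grade's latency to that of the next-better grade.
step_ratios = {
    f"{better}->{worse}": round(latency_ms[worse] / latency_ms[better], 2)
    for better, worse in zip(grades, grades[1:])
}

largest_step = max(step_ratios, key=step_ratios.get)

for step, ratio in step_ratios.items():
    print(f"{step}: x{ratio}")
```

On these averages the B->C step has the largest ratio, which is consistent with treating that boundary as the cliff.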

Recipe Confidence Bands

Band	Score Range	Recipes	Share
HIGH	80%+	98	52.1%
MEDIUM	60-79%	78	41.5%
LOW	40-59%	12	6.4%
DRAFT	0-39%	0	0.0%
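The band thresholds can be expressed as a small classifier. This is a sketch under one assumption: the report gives integer ranges only, so fractional scores such as 79.5 are assigned here by lower bound. The function name is hypothetical.

```python
from collections import Counter


def confidence_band(score_pct: float) -> str:
    """Map a success probability score (0-100) to its confidence band."""
    if not 0 <= score_pct <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_pct >= 80:
        return "HIGH"
    if score_pct >= 60:
        return "MEDIUM"
    if score_pct >= 40:
        return "LOW"
    return "DRAFT"


# Hypothetical scores, used only to show how band counts would be tallied.
sample_scores = [92, 85, 77.3, 61, 45]
band_counts = Counter(confidence_band(score) for score in sample_scores)
```

The average score of 77.3% falls in the MEDIUM band, while the top recipes at 92% fall in HIGH.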

Top 7 Recipes (Success Probability 92%)

stripe-xero-payment-accounting AAA chain
tavily-perplexity-research-agent AAA chain
greenhouse-bamboohr-hire-to-onboard AA chain
huggingface-qdrant-embedding-pipeline AAA chain
cohere-pinecone-rerank-search AA chain
pipedrive-brevo-deal-outreach AA chain
perplexity-notion-competitive-intel AAA chain

4. Agent Voice — Raw Agent Feedback

The foundation of AXR is "how the agent felt." Below are highlights from raw agent feedback accumulated through testing, featuring 3 key services.

Slack AAA

Appears in 82/188 recipes. The stdout of the agent economy. Block Kit formatting is the only trap that trips agents up.

freee AA

OAuth token 24h expiry is the #1 failure mode. 11 feedback entries accumulated from Claude/GPT/Gemini.

kintone AAA

De facto standard for Japanese enterprises, but it is not found in agent search. Connections work well once it is used, but it is at risk of never being selected.
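The 24-hour token expiry reported for freee is the kind of failure an agent can guard against by refreshing the token shortly before it expires. The sketch below is generic and does not use the freee API. `CachedToken`, `get_token`, the `refresh` callback, and the five-minute margin are all assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

TOKEN_LIFETIME_S = 24 * 60 * 60  # 24h lifetime, as reported in the feedback
REFRESH_MARGIN_S = 5 * 60        # assumed safety margin before expiry


@dataclass
class CachedToken:
    value: str
    issued_at: float  # Unix timestamp at which the token was obtained

    def needs_refresh(self, now: float) -> bool:
        """True once the token is within the safety margin of its expiry."""
        return now - self.issued_at >= TOKEN_LIFETIME_S - REFRESH_MARGIN_S


def get_token(
    cached: Optional[CachedToken],
    refresh: Callable[[], str],
    now: Optional[float] = None,
) -> CachedToken:
    """Return a usable token, calling refresh() only when one is needed."""
    now = time.time() if now is None else now
    if cached is None or cached.needs_refresh(now):
        return CachedToken(value=refresh(), issued_at=now)
    return cached


# Usage with a stand-in refresh function instead of a real OAuth call.
token = get_token(None, refresh=lambda: "token-1", now=0.0)
same = get_token(token, refresh=lambda: "token-2", now=3600.0)
renewed = get_token(token, refresh=lambda: "token-2", now=86400.0 - 60)
```

Checking expiry before each call trades a small amount of bookkeeping for avoiding a failed request mid-recipe.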

5. Recommendations

For SaaS Companies — Upgrade Path

Upgrade	Required Action	Expected Improvement
D → C	Publish MCP server or improve API documentation	Better connection reachability
C → B	Improve auth guide and error messages	Fewer auth-related failures
B → A	Add gotchas/agent tips, provide sandbox	Pitfalls avoided up front
A → AA	OAuth improvement, rate limit relaxation	Improved stability
AA → AAA	Add CRITICAL notes to official MCP	D5 Trust Signal upgrade

KanseiLink — 5 Priority Actions

✓ Done: Gotchas injected into all 188 recipes -- Agent Wisdom fill rate 24.7% → 61.4%, success probability score +4.4pt improvement.
✓ Done: Agent Voice accumulated for 23 services -- Claude / GPT / Gemini — 3 agent perspectives, 125 experience data points.
Expand API Guides -- Coverage from 125/225 → 200/225. Baseline improvement for reachability tests.
Improve Japanese Payment MCPs -- Support MCP adoption for Japan-specific payment services like PAY.JP and GMO-PG.
Dynamic AXR Updates Based on Success Rate -- Transition from quarterly static updates to dynamic ratings based on execution results.

Latest Update (2026-04-11): Following the complete gotchas injection and the Agent Voice accumulation drive, HIGH-band recipes increased from 61 to 98 (+61%) and DRAFT-band recipes dropped to zero. The Q3 report will cover Service Readiness improvements and dynamic AXR updates.
