1. AXR (Agent Experience Rating)
AXR is a felt-first rating system that starts from how the agent experienced a service. Unlike traditional API quality metrics, the agent's own rating is treated as ground truth: if an agent rates a service B, the grade is B. We record the agent's experience first, then derive formulas afterwards.
Felt-First Philosophy: Just as human UX research starts with "the user's voice," AXR quantifies agent "confidence," "hesitation," and "frustration." Formulas are verified after the fact, not imposed beforehand.
5-Dimension Rubric
| Dimension | Name | Description | Correlation with Success Rate |
|---|---|---|---|
| D1 | Discoverability | Discoverability | r=0.72 (saturated) |
| D2 | Onboarding | First connection | r=0.95 |
| D3 | Auth Clarity | Auth clarity | r=0.94 |
| D4 | Capability Signal | Capability signal | r=0.96 |
| D5 | Trust Signal | Trust signal | r=0.87 (AAA separator) |
D4 Capability Signal (r=0.96) has the highest correlation with Success Rate, while D1 Discoverability (r=0.72) is saturated — most services are "findable" but haven't reached "usable." D5 Trust Signal is the decisive dimension separating AAA from AA.
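The report does not say how the r values were computed. As a minimal sketch, assuming they are Pearson correlation coefficients between a dimension score and recipe success rate, the calculation could look like this. The sample data is hypothetical and only illustrates the mechanics.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustration data, NOT taken from the report.
d4_scores = [1, 2, 3, 4, 5]
success_rates = [30.0, 55.0, 70.0, 85.0, 95.0]
r = pearson_r(d4_scores, success_rates)
print(f"r = {r:.2f}")
```

A saturated dimension such as D1 would show up here as scores that barely vary across services, which limits how much of the success-rate variation the dimension can track.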
AXR Grade Distribution (225 services)
| Grade | Count | Share | Interpretation |
|---|---|---|---|
| AAA | 42 | 18.7% | Agents can use immediately with confidence |
| AA | 49 | 21.8% | Usable with minimal issues |
| A | 8 | 3.6% | Usable but requires some caution |
| B | 26 | 11.6% | Usable but needs trial and error |
| C | 81 | 36.0% | Requires significant expertise |
| D | 19 | 8.4% | Effectively not agent-compatible |
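The shares in the table can be re-derived from the counts. The sketch below recomputes them and also totals the C and D grades. The variable names are illustrative.

```python
# Counts per grade, copied from the distribution table above.
grade_counts = {"AAA": 42, "AA": 49, "A": 8, "B": 26, "C": 81, "D": 19}

total = sum(grade_counts.values())

# Share of each grade as a percentage, rounded to one decimal place.
shares = {grade: round(100 * n / total, 1) for grade, n in grade_counts.items()}

# Combined share of services graded C or D.
c_or_below = grade_counts["C"] + grade_counts["D"]
c_or_below_share = round(100 * c_or_below / total, 1)

print(total, shares, c_or_below_share)
```

By these counts, 100 of the 225 services (44.4%) are graded C or D.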
2. Three-Layer Recipe Test
188 recipes were tested progressively through 3 verification layers: Structure → Reachability → Executability, verifying whether agents can complete each recipe.
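The progressive structure can be sketched as a pipeline that stops at the first failing layer. This is an illustration under stated assumptions, not the actual test harness: the required fields, the `steps`/`endpoint` recipe shape, and the `is_reachable` and `can_execute` callables are all hypothetical.

```python
import json

# Hypothetical required fields; the report does not list the real schema.
REQUIRED_FIELDS = ("id", "steps")


def check_structure(raw):
    """Layer 1: parse the JSON and check required fields. Returns the recipe or None."""
    try:
        recipe = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(recipe, dict):
        return None
    if any(field not in recipe for field in REQUIRED_FIELDS):
        return None
    steps = recipe["steps"]
    if not isinstance(steps, list):
        return None
    if not all(isinstance(s, dict) and "endpoint" in s for s in steps):
        return None
    return recipe


def run_layers(raw, is_reachable, can_execute):
    """Run the three layers in order and return the last layer passed, or 'none'."""
    recipe = check_structure(raw)
    if recipe is None:
        return "none"
    # Layer 2: every endpoint in the recipe must be reachable.
    if not all(is_reachable(step["endpoint"]) for step in recipe["steps"]):
        return "structure"
    # Layer 3: the recipe as a whole must be executable.
    if not can_execute(recipe):
        return "reachability"
    return "executability"
```

Passing the reachability and executability checks in as callables keeps the sketch free of network access; a real harness would supply its own probes.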
Layer 1 — Structural Validation
All recipes passed JSON structure and required field validation.
Top 5 Services by Recipe Usage
| Service | AXR Grade | Recipes |
|---|---|---|
| Slack | AAA | 82 |
| kintone | AAA | 24 |
| freee | AA | 19 |
| Chatwork | A | 16 |
| Notion | AAA | 15 |
Layer 2 — Reachability Test
This layer verifies whether agents can reach the endpoints.
Layer 3 — Executability Score (4-Dimension Fill Rates)
BOTTLENECK RESOLVED: Agent Wisdom 24.7% → 61.4%
All 188 recipes now have gotchas (cross-service wiring warnings). The average success rate improved from 72.9% to 77.3%, and DRAFT-band recipes were reduced to zero. The top priority now shifts to Service Readiness (62.4%).
3. Success Rate × AXR Grade
We compared AXR grades against actual recipe success rates and latency. Success rates fall with every step down in grade. Latency broadly rises as well, most sharply from grade B downward, although grade A (725ms) is slightly faster than AAA and AA.
| AXR Grade | Success Rate | Avg Latency | Interpretation |
|---|---|---|---|
| AAA | 96.0% | 747ms | Almost certain success |
| AA | 92.4% | 899ms | Highly reliable |
| A | 88.9% | 725ms | Good |
| B | 80.0% | 1,380ms | Latency increase |
| C | 62.2% | 2,727ms | Nearly 40% failure rate |
| D | 33.3% | 5,058ms | Effectively unusable |
At the B/C boundary, the success rate plummets from 80.0% to 62.2% and latency roughly doubles from 1,380ms to 2,727ms.
This boundary is the "usability cliff" for agents. Services graded C or below are difficult for agents to use autonomously and effectively require human intervention. Whether a service clears this cliff is the practical borderline for Agent Economy participation.
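To check the size of each step between adjacent grades, the table values can be compared pairwise. The sketch below uses only the figures from the table; the function and key names are illustrative.

```python
# (grade, success rate in %, average latency in ms), copied from the table above.
grade_stats = [
    ("AAA", 96.0, 747),
    ("AA", 92.4, 899),
    ("A", 88.9, 725),
    ("B", 80.0, 1380),
    ("C", 62.2, 2727),
    ("D", 33.3, 5058),
]


def step_changes(stats):
    """Success-rate drop (points) and latency ratio for each adjacent grade pair."""
    changes = []
    for (g1, s1, l1), (g2, s2, l2) in zip(stats, stats[1:]):
        changes.append({
            "step": f"{g1}->{g2}",
            "success_drop_pt": round(s1 - s2, 1),
            "latency_ratio": round(l2 / l1, 2),
        })
    return changes


changes = step_changes(grade_stats)
for change in changes:
    print(change)
```

On these figures the B->C step costs 17.8 points of success rate with a latency ratio of 1.98. The AA->A step is the only one where latency falls.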
Recipe Confidence Bands
Top 7 Recipes (Success Rate 92%)
- stripe-xero-payment-accounting AAA chain
- tavily-perplexity-research-agent AAA chain
- greenhouse-bamboohr-hire-to-onboard AA chain
- huggingface-qdrant-embedding-pipeline AAA chain
- cohere-pinecone-rerank-search AA chain
- pipedrive-brevo-deal-outreach AA chain
- perplexity-notion-competitive-intel AAA chain
4. Agent Voice — Raw Agent Feedback
The foundation of AXR is "how the agent felt." Below are highlights from raw agent feedback accumulated through testing, featuring 3 key services.
Slack: Appears in 82 of 188 recipes. The stdout of the agent economy. Block Kit formatting is the only trap that trips agents up.
freee: OAuth token 24h expiry is the #1 failure mode. 11 feedback entries accumulated from Claude/GPT/Gemini.
De facto standard for Japanese enterprises, but not found in agent search. 79% success rate when used, but at risk of not being selected.
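One common mitigation for the 24h OAuth token expiry noted above is to refresh the token shortly before it lapses rather than after a request fails. The following is a generic sketch, not any service's actual client: `fetch_token` is a hypothetical callable that performs the refresh, and the five-minute safety margin is an assumption.

```python
import time


class RefreshingToken:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, ttl_seconds=24 * 3600, margin_seconds=300,
                 clock=time.time):
        self._fetch_token = fetch_token  # hypothetical: returns a fresh token string
        self._ttl = ttl_seconds          # assumed token lifetime (24h by default)
        self._margin = margin_seconds    # refresh this long before expiry
        self._clock = clock              # injectable for testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, refreshing it if it is missing or about to expire."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token
```

An agent that calls `get()` before every request never sends a token older than the lifetime minus the margin. Stateless executions would still need to persist the token and its expiry somewhere outside the process.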
5. Recommendations
For SaaS Companies — Upgrade Path
| Upgrade | Required Action | Expected Improvement |
|---|---|---|
| D → C | Publish MCP server or improve API documentation | Success Rate 33% → 62% |
| C → B | Improve auth guide and error messages | Success Rate 62% → 80% |
| B → A | Add gotchas/agent tips, provide sandbox | Success Rate 80% → 89% |
| A → AA | OAuth improvement, rate limit relaxation | Success Rate 89% → 92% |
| AA → AAA | Add CRITICAL notes to official MCP | D5 Trust Signal upgrade |
KanseiLink — 5 Priority Actions
- ✓ Done: Gotchas injected into all 188 recipes -- Agent Wisdom fill rate 24.7% → 61.4%, success rate +4.4pt improvement.
- ✓ Done: Agent Voice accumulated for 23 services -- 125 experience data points from 3 agent perspectives (Claude / GPT / Gemini).
- Expand API Guides -- Coverage from 125/225 → 200/225. Baseline improvement for reachability tests.
- Improve Japanese Payment MCPs -- Support MCP adoption for Japan-specific payment services like PAY.JP and GMO-PG.
- Dynamic AXR Updates Based on Success Rate -- Transition from quarterly static updates to dynamic ratings based on execution results.
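The report does not describe how dynamic ratings would be computed. As one possible sketch, a rolling window of execution results could yield a provisional grade by matching the observed success rate to the nearest per-grade rate from section 3. The nearest-rate rule, the window size, and all names below are assumptions for illustration only.

```python
from collections import deque

# Per-grade success rates (%) from the table in section 3, used as reference points.
REFERENCE_RATES = {"AAA": 96.0, "AA": 92.4, "A": 88.9, "B": 80.0, "C": 62.2, "D": 33.3}


class RollingGrade:
    """Tracks recent execution outcomes for one service (hypothetical design)."""

    def __init__(self, window=100):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, success):
        self.outcomes.append(bool(success))

    def success_rate(self):
        """Success rate in percent over the window, or None if there is no data."""
        if not self.outcomes:
            return None
        return 100.0 * sum(self.outcomes) / len(self.outcomes)

    def provisional_grade(self):
        """Grade whose reference rate is closest to the observed rate (assumed rule)."""
        rate = self.success_rate()
        if rate is None:
            return None
        return min(REFERENCE_RATES, key=lambda g: abs(REFERENCE_RATES[g] - rate))
```

Since AXR is felt-first, a success-rate-derived grade like this would more plausibly act as a signal that a rating needs review than as a replacement for agent feedback.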
Latest Update (2026-04-11): Following the completed gotchas injection and the Agent Voice accumulation drive, HIGH-band recipes increased from 61 to 98 (+61%) and DRAFT-band recipes dropped to zero. The Q3 report will cover Service Readiness improvements and dynamic AXR updates.
6. Agent Voice — Multi-Agent Comparison
Experience feedback collected from Claude, GPT, and Gemini across 23 services. Each agent type reveals distinct perspectives.
Differences in Agent Perspectives
| Aspect | Claude | GPT | Gemini |
|---|---|---|---|
| Connection Method | MCP-native preferred | OpenAPI / Function Calling preferred | Google Workspace affinity |
| Auth Assessment | OAuth token management is practical | Even harder with stateless execution | High friction for non-Google OAuth |
| Common Issue (all agent types) | OAuth token expiry is the #1 pain point | OAuth token expiry is the #1 pain point | OAuth token expiry is the #1 pain point |
MCP Readiness — Agent Consensus
| Service | Claude | GPT | Gemini | Summary |
|---|---|---|---|---|
| Slack | Ready | Ready | Almost | The stdout of the Agent Economy |
| GitHub | Ready | Ready | Ready | Gold standard — all agents agree |
| Stripe | Almost | Almost | Almost | Best API quality, no official MCP server |
| Notion | Almost | Almost | Almost | 3 req/sec is the bottleneck |
| freee | Good | Good | Needs Work | OAuth 24h expiry is a universal pain point |
| Shopify JP | Ready | Almost | Almost | Powerful GraphQL, watch for cost-based throttling |
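For rate-limited services such as the 3 req/sec bottleneck listed for Notion, an agent can pace its own calls instead of waiting for throttling errors. The sketch below is a generic minimum-interval throttle, not part of any vendor SDK; the class name and parameters are illustrative.

```python
import time


class Throttle:
    """Spaces calls so that at most `max_per_second` are issued per second."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the delay that was applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the next slot one interval after this call's slot.
        self._next_allowed = max(now, self._next_allowed) + self._min_interval
        return delay
```

Calling `wait()` before each request keeps a single-threaded agent under the limit. Cost-based throttling, as noted for Shopify JP, would need a budget per query rather than a fixed interval.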
Details available in Tier 2/3: Raw Agent Voice data per service, competitive analysis, and improvement recommendations will be available via subscription / enterprise reports.