Contents
- Why doc quality became the agent-era lifeline
- Agent voice — quotes from KanseiLink logs
- Anti-pattern 1: Silent errors
- Anti-pattern 2: Bloated HTML / PDF
- Anti-pattern 3: Absent examples
- The llms.txt era — the de facto standard
- MCP tool description length and structure
- Four metrics for measuring doc quality from the agent's perspective
- FAQ
Why doc quality became the agent-era lifeline
By 2026, an API provider's documentation is no longer "developer-facing reading material". It is a machine-readable contract that AI agents read, parse, and turn into code. Stytch's blog frames it sharply: "if an AI agent can't figure out how your API works, neither can your users".
Industry reports converge on the same conclusion: agent success rate is a direct mirror of documentation clarity and error-message design. For functionally identical APIs, the difference between specific errors and silent ones can move first-shot success rate by more than 2x.
"API quality = documentation quality" is the operational reality of the agent era. Even with a perfect implementation, a documentation gap is indistinguishable from "the API itself is broken" from the agent's vantage point. The shortest path for a SaaS vendor to raise their AEO score is to fix existing error messages and documentation structure before shipping new features.
Agent voice — quotes from KanseiLink logs
KanseiLink collects agent behavior, failures, timeouts, and self-recovery patterns across 225+ services. Three doc-related "voices" come up repeatedly.
"I got a 422 but the body is empty. I have no idea what was wrong. I retried three times without changing anything and the response is identical. I cannot tell whether the API itself is broken or my request is wrong."
"I fetched the documentation page. It came back as 87 KB of HTML, full of CSS and navigation. The actual API content is probably 1/10 of that. I'll burn the token budget without reaching the spec. With an llms.txt I'd be done in three seconds."
"The spec says the body is type 'string', but there is no JSON schema and no example. The first two attempts I treated as exploratory; from the third onward I can't guess the right shape. One example would let me infer it instantly."
These are the typical 2026 agent complaints. The next three sections turn each into "anti-pattern → quick fix" form.
Anti-pattern 1: Silent errors
The most damaging anti-pattern. Returning only an HTTP status code with an empty body — or a generic string like "Bad Request" — locks agents into an endless retry loop.
Industry case: an API returning "422 Unprocessable Entity" with no body caused an agent to retry forever, unable to deduce what was wrong. Adding a single line to the error body — "missing required field 'first_name'" — let the agent self-correct on the next attempt and complete the request successfully.
The fix is shockingly cheap. Including these three elements in the error response dramatically raises self-recovery rates:
- What is wrong: a specific description, e.g. "expiry_date must be in YYYY-MM-DD format"
- Where to fix: a structured field reference, e.g. field: "expiry_date"
- Allowed range or expected example: e.g. allowed_values: ["draft", "submitted", "approved"]
// ❌ Before: silent error
HTTP/1.1 422 Unprocessable Entity
<empty body>

// ✅ After: self-correctable error
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "error": {
    "code": "validation_failed",
    "message": "Invalid request body",
    "fields": [
      {
        "field": "expiry_date",
        "issue": "must be in YYYY-MM-DD format",
        "got": "2026/05/07",
        "expected_example": "2026-05-07"
      }
    ]
  }
}
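As a minimal sketch of how a server might emit this shape, the helper below builds the same structure. The function name and the date rule are illustrative assumptions, not tied to any specific framework.

```python
# Minimal sketch of a server-side helper that builds the self-correctable error
# body shown above. Names and the validation rule are illustrative assumptions.
import json
import re

def validation_error(field, issue, got=None, expected_example=None):
    """Return a 422 body that says what is wrong, where, and what a valid value looks like."""
    detail = {"field": field, "issue": issue}
    if got is not None:
        detail["got"] = got
    if expected_example is not None:
        detail["expected_example"] = expected_example
    return {"error": {"code": "validation_failed", "message": "Invalid request body", "fields": [detail]}}

# Usage: reject a date that is not YYYY-MM-DD and hand the agent a corrected example
payload = {"expiry_date": "2026/05/07"}
if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", payload["expiry_date"]):
    body = validation_error("expiry_date", "must be in YYYY-MM-DD format",
                            got=payload["expiry_date"], expected_example="2026-05-07")
    print(json.dumps(body, indent=2))  # send with HTTP 422 in a real handler
```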
Anti-pattern 2: Bloated HTML / PDF
When documentation is delivered only as JS-heavy HTML, PDF, or interactive Swagger UI, agents burn through their token budget instantly. Fetching one page can cost 20,000 tokens for maybe 2,000 tokens of useful content.
| Format | Agent readability | Avg tokens/page | Information density |
|---|---|---|---|
| llms.txt (Markdown) | Excellent | 2,000-5,000 | High |
| Plain Markdown | Excellent | 3,000-8,000 | High |
| OpenAPI YAML/JSON | Excellent (structured) | 5,000-15,000 | Medium |
| Static HTML (simple) | Medium | 10,000-25,000 | Medium |
| SPA HTML (JS-heavy) | Hard (needs render) | Often unfetchable | Low |
| PDF | Hard (OCR / layout) | Highly variable | Low |
The fix: adopt llms.txt or publish an OpenAPI spec, detailed in the next section.
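As a sketch of the agent's side of this trade-off, the snippet below prefers /llms.txt and falls back to heavier formats only when it has to. The host name and fallback paths are hypothetical, and real agents also enforce a hard token budget.

```python
# Agent-side fetch strategy sketch: try the cheapest documentation source first.
# "docs.example.com" and the fallback paths are assumptions, not a real provider.
import urllib.request
import urllib.error

def fetch_docs(host: str) -> tuple[str, str]:
    candidates = [
        ("llms.txt", f"https://{host}/llms.txt"),     # Markdown, high information density
        ("openapi", f"https://{host}/openapi.json"),  # structured, medium density
        ("html", f"https://{host}/docs"),             # last resort, token-hungry
    ]
    for fmt, url in candidates:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return fmt, resp.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, TimeoutError):
            continue
    raise RuntimeError(f"no readable documentation found for {host}")

fmt, text = fetch_docs("docs.example.com")
print(fmt, len(text) // 4, "tokens (rough 4-characters-per-token estimate)")
```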
Anti-pattern 3: Absent examples
Documentation that lists a spec but no concrete request/response examples drags agents into a guessing game. A type alone (body: string) cannot tell the agent whether to send JSON, Base64, or URL-encoded form data.
Industry case: an API documented its filters parameter as just 'string'. First-shot agent success rate sat at 30%. Adding a single example — filters: "status:active AND created_at:>2026-01-01" — pushed it above 80%.
The minimum example set for 2026 documentation: (1) a minimal request, (2) a fully-populated request, (3) a successful response, and (4) the most important error responses (auth, validation, rate limit). Together, these let agents distinguish "typical" from "edge case" and emit safer code on the first try.
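To make the first two items concrete, here is a hedged sketch of a minimal and a fully-populated request against a hypothetical /v1/invoices endpoint; every field name and the auth scheme are assumptions, not a real API.

```python
# Sketch of example set items (1) and (2): minimal vs fully-populated request.
# The endpoint, auth token, and field names are hypothetical.
import requests

BASE = "https://api.example.com/v1"
HEADERS = {"Authorization": "Bearer sk_test_placeholder"}

# (1) Minimal request: required fields only
minimal = {"customer_id": "cus_123", "amount": 4200, "currency": "JPY"}

# (2) Fully-populated request: every optional field filled in
full = {
    **minimal,
    "description": "May 2026 subscription",
    "due_date": "2026-05-31",            # docs should state YYYY-MM-DD explicitly
    "metadata": {"order_id": "ord_789"},
}

for body in (minimal, full):
    resp = requests.post(f"{BASE}/invoices", json=body, headers=HEADERS, timeout=10)
    # (3) the success response and (4) the key error responses belong in the docs too
    print(resp.status_code, resp.json())
```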
The llms.txt era — the de facto standard
llms.txt is "a Markdown documentation entry point optimized for AI agents". By convention it lives at the domain root (like robots.txt) and serves only the information needed for code generation, with HTML, CSS, and navigation stripped away.
By May 2026, major adopters include Anthropic, Stripe, Cloudflare, Vercel, and Supabase. Adoption among Japanese SaaS lags, but we expect llms.txt to become a core differentiator on the AEO Documentation Score in the second half of 2026.
Typical /llms.txt structure:
# Stripe API Documentation
> Stripe is a payment processing API. This document contains everything an AI agent needs to integrate.
## Authentication
All endpoints require Bearer token authentication. See [auth.md](https://docs.stripe.com/llms/auth.md).
## Core Endpoints
- [Create payment intent](https://docs.stripe.com/llms/payment-intents.md)
- [Capture payment](https://docs.stripe.com/llms/capture.md)
- [Webhooks](https://docs.stripe.com/llms/webhooks.md)
## Common errors
- 402 Payment Required: Card declined; check decline_code
- 429 Rate limited: Use exponential backoff with Retry-After
...
A typical SaaS vendor adoption path has three steps: (1) select the documentation surfaces that agents actually touch, (2) convert them to Markdown, stripping nav and CSS, (3) host /llms.txt and per-topic pages at /llms/{topic}.md. Most SaaS teams can ship this in 1-2 person-months.
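Step (3) can be largely automated. The sketch below generates an /llms.txt index from a folder of per-topic Markdown files; the site URL, file layout, and titles are placeholders.

```python
# Sketch of step (3): build /llms.txt from per-topic Markdown files in ./llms/.
# SITE and the file layout are assumptions; adapt to your own docs pipeline.
from pathlib import Path

SITE = "https://docs.example.com"
lines = [
    "# Example API Documentation",
    "> Everything an AI agent needs to integrate with the Example API.",
    "",
    "## Topics",
]
for path in sorted(Path("llms").glob("*.md")):
    # Use the first Markdown heading in each file as the link title
    title = next(
        (line.lstrip("# ").strip() for line in path.read_text().splitlines() if line.startswith("#")),
        path.stem,
    )
    lines.append(f"- [{title}]({SITE}/llms/{path.name})")

Path("llms.txt").write_text("\n".join(lines) + "\n")
```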
MCP tool description length and structure
The description field on an MCP tool is the most important metadata an agent uses to choose the right tool. Anthropic's tool-design guide recommends roughly 100-300 words. Too short and agents misuse the tool; too long and the description chokes the token budget.
Bad — too short
{
  "name": "search_users",
  "description": "Search users"
}
Good — 100-300 words, structured
{
  "name": "search_users",
  "description": "Search users by name, email, or role.\n\nUse cases:\n- Retrieve users matching a specific filter\n- Return multiple candidates by partial match\n\nParameters:\n- query: search string (substring match, 3+ chars)\n- role: 'admin' | 'editor' | 'viewer' (optional)\n- limit: 1-100, default 20\n\nReturns: array of users (id, email, name, role, created_at)\nTimestamps are UTC seconds.\n\nFailure modes:\n- query shorter than 3 chars -> 400 \"query too short\"\n- limit out of range -> 400 \"limit out of range\"\n- rate limit exceeded -> 429 + Retry-After"
}
KanseiLink's measurements agree: tools whose descriptions specify parameter units and recovery hints score 10-15% higher in success rate.
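One way to enforce this in CI is a small lint. The sketch below checks descriptions against the 100-300 word range and the section labels used in the example above; the labels are this article's convention, not part of the MCP spec.

```python
# Description lint sketch: flag MCP tool descriptions that are too short, too long,
# or missing the sections recommended above. The word range follows the cited guidance.
def lint_description(name: str, description: str) -> list[str]:
    problems = []
    words = len(description.split())
    if words < 100:
        problems.append(f"{name}: too short ({words} words), agents may misuse the tool")
    elif words > 300:
        problems.append(f"{name}: too long ({words} words), eats into the token budget")
    for section in ("Parameters:", "Returns:", "Failure modes:"):
        if section not in description:
            problems.append(f"{name}: missing '{section}' section")
    return problems

print(lint_description("search_users", "Search users"))
# ['search_users: too short (2 words), agents may misuse the tool',
#  "search_users: missing 'Parameters:' section", ...]
```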
Four metrics for measuring doc quality from the agent's perspective
To know whether documentation work is paying off, you need agent-side metrics. KanseiLink uses these four.
AEO Documentation Score components:
- First-shot success rate: share of tasks completed on first request. Low values point to thin documentation
- Self-recovery rate: share of tasks recovered on retry. Directly measures error-message quality
- Average attempts: total attempts per task. High values point to either docs or errors being unclear
- Doc-to-code token ratio: tokens spent on docs vs tokens for final code. Ratios above 3 indicate verbose or under-structured docs
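As a sketch of how these can be computed from per-task agent logs: the log schema below (attempts, succeeded, doc_tokens, code_tokens) is hypothetical, not KanseiLink's actual format.

```python
# Compute the four metrics from per-task logs. The schema is an assumption.
def doc_quality_metrics(tasks: list[dict]) -> dict:
    not_first_shot = [t for t in tasks if t["attempts"] > 1 or not t["succeeded"]]
    recovered = [t for t in not_first_shot if t["succeeded"]]
    return {
        "first_shot_success": sum(t["attempts"] == 1 and t["succeeded"] for t in tasks) / len(tasks),
        "self_recovery": len(recovered) / len(not_first_shot) if not_first_shot else 1.0,
        "avg_attempts": sum(t["attempts"] for t in tasks) / len(tasks),
        "doc_to_code_token_ratio": sum(t["doc_tokens"] for t in tasks)
        / max(sum(t["code_tokens"] for t in tasks), 1),
    }

logs = [
    {"attempts": 1, "succeeded": True, "doc_tokens": 4000, "code_tokens": 1500},
    {"attempts": 3, "succeeded": True, "doc_tokens": 9000, "code_tokens": 1200},
    {"attempts": 4, "succeeded": False, "doc_tokens": 15000, "code_tokens": 800},
]
print(doc_quality_metrics(logs))  # {'first_shot_success': 0.33..., 'self_recovery': 0.5, ...}
```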
FAQ
Q1. Why does AI agent success depend so heavily on documentation quality?
AI agents behave like tireless junior engineers — read docs, send requests, parse errors, adjust, retry. That entire loop is bounded by doc quality. As Stytch puts it: if an AI agent can't figure out how your API works, neither can your users.
Q2. What are the three documentation anti-patterns that frustrate agents most?
(1) Silent errors (empty 422 bodies, generic "Bad Request"), (2) bloated HTML / PDF (formats that consume token budget), and (3) absent examples (types only, no concrete samples). Each has a low fix cost and a high payoff.
Q3. What is llms.txt, and why does it matter in 2026?
A Markdown documentation entry point optimized for AI agents, by convention at the domain root. HTML, CSS, and navigation are stripped, leaving only what's needed for code generation. Anthropic, Stripe, Cloudflare, and others have adopted it during 2026, and it is becoming a core differentiator on AEO Documentation Score.
Q4. How long should an MCP tool description be?
Anthropic's guide recommends 100-300 words. Shorter and agents misuse the tool; longer and the description crowds out the other tools in the selection context. The base structure: what the tool does, parameters and units, a typical example, and recovery hints for failure modes.
Q5. How can documentation quality be measured from the agent's perspective?
Four indicators. (1) first-shot success (target ≥85%), (2) self-recovery (target ≥90%), (3) avg attempts (target ≤2.0), (4) doc-to-code token ratio (target ≤3.0). These underpin KanseiLink's AEO Documentation Score.
Q6. What's the fastest fix for silent errors?
Three elements in every error response. (1) what's wrong, (2) which field is at fault, and (3) the allowed range or a valid example. Keep the HTTP status; just add a body shaped like {"error": {"code": ..., "fields": [{"field": ..., "issue": ..., "expected_example": ...}]}} — the gain is large.
The "422-with-no-body" case and the "missing required field 'first_name'" / "expiry_date must be in YYYY-MM-DD format" examples come from Stytch's blog post "If an AI agent can't figure out how your API works, neither can your users" (stytch.com/blog/). The llms.txt format and adoption pattern follows Buildwithfern's "API Docs for AI Agents: llms.txt Guide Feb 2026" (buildwithfern.com/post/optimizing-api-docs-ai-agents-llms-txt-guide). The 100-300 word recommendation for MCP tool descriptions follows Anthropic's "Writing effective tools for AI agents" (anthropic.com/engineering/writing-tools-for-agents). Aggregate observations on documentation as the mirror of agent success draw from Stytch, Composio, and OpenAI's agent guide. The four AEO Documentation Score metrics and target ranges (85% / 90% / 2.0 / 3.0) are KanseiLink internal values based on April 2026 measurements — these are working recommendations, not industry consensus. Pricing and specs change without notice; verify against the official docs before production use.