Contents
- Why doc quality became the agent-era lifeline
- Agent voice — quotes from KanseiLink logs
- Anti-pattern 1: Silent errors
- Anti-pattern 2: Bloated HTML / PDF
- Anti-pattern 3: Absent examples
- The llms.txt era — the de facto standard
- MCP tool description length and structure
- Four metrics for measuring doc quality from the agent's perspective
- FAQ
Why doc quality became the agent-era lifeline
By 2026, an API provider's documentation is no longer "developer-facing reading material". It is a machine-readable contract that AI agents read, parse, and turn into code. Stytch's blog frames it sharply: "if an AI agent can't figure out how your API works, neither can your users".
Industry reports converge on the same conclusion: agent success rate is a direct mirror of documentation clarity and error-message design. For functionally identical APIs, the difference between specific errors and silent ones can move first-shot success rate by more than 2x.
"API quality = documentation quality" is the operational reality of the agent era. Even with a perfect implementation, a documentation gap is indistinguishable from "the API itself is broken" from the agent's vantage point. The shortest path for a SaaS vendor to raise their AEO score is to fix existing error messages and documentation structure before shipping new features.
Agent voice — quotes from KanseiLink logs
KanseiLink collects agent behavior, failures, timeouts, and self-recovery patterns across 225+ services. Three doc-related "voices" come up repeatedly.
"I got a 422 but the body is empty. I have no idea what was wrong. I retried three times without changing anything and the response is identical. I cannot tell whether the API itself is broken or my request is wrong."
"I fetched the documentation page. It came back as 87 KB of HTML, full of CSS and navigation. The actual API content is probably 1/10 of that. I'll burn the token budget without reaching the spec. With an llms.txt I'd be done in three seconds."
"The spec says the body is type 'string', but there is no JSON schema and no example. The first two attempts I treated as exploratory; from the third onward I can't guess the right shape. One example would let me infer it instantly."
These are the typical 2026 agent complaints. The next three sections turn each into "anti-pattern → quick fix" form.
Anti-pattern 1: Silent errors
The most damaging anti-pattern. Returning only an HTTP status code with an empty body — or a generic string like "Bad Request" — locks agents into an endless retry loop.
Industry case: an API returning "422 Unprocessable Entity" with no body caused an agent to retry forever, unable to deduce what was wrong. Adding a single line to the error body — "missing required field 'first_name'" — let the agent self-correct on the next attempt and complete the request successfully.
The fix is shockingly cheap. Including these three elements in the error response dramatically raises self-recovery rates:
- What is wrong: a specific description, e.g. "expiry_date must be in YYYY-MM-DD format"
- Where to fix: a structured field reference, e.g. field: "expiry_date"
- Allowed range or expected example: e.g. allowed_values: ["draft", "submitted", "approved"]
// ❌ Before: silent error
HTTP/1.1 422 Unprocessable Entity
<empty body>

// ✅ After: self-correctable error
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "error": {
    "code": "validation_failed",
    "message": "Invalid request body",
    "fields": [
      {
        "field": "expiry_date",
        "issue": "must be in YYYY-MM-DD format",
        "got": "2026/05/07",
        "expected_example": "2026-05-07"
      }
    ]
  }
}
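As a minimal sketch of how a server might emit this shape, the helper below builds the same structure. The function name and the date rule are illustrative assumptions, not tied to any specific framework.

```python
# Minimal sketch of a server-side helper that builds the self-correctable error
# body shown above. Names and the validation rule are illustrative assumptions.
import json
import re

def validation_error(field, issue, got=None, expected_example=None):
    """Return a 422 body that says what is wrong, where, and what a valid value looks like."""
    detail = {"field": field, "issue": issue}
    if got is not None:
        detail["got"] = got
    if expected_example is not None:
        detail["expected_example"] = expected_example
    return {"error": {"code": "validation_failed", "message": "Invalid request body", "fields": [detail]}}

# Usage: reject a date that is not YYYY-MM-DD and hand the agent a corrected example
payload = {"expiry_date": "2026/05/07"}
if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", payload["expiry_date"]):
    body = validation_error("expiry_date", "must be in YYYY-MM-DD format",
                            got=payload["expiry_date"], expected_example="2026-05-07")
    print(json.dumps(body, indent=2))  # send with HTTP 422 in a real handler
```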
Anti-pattern 2: Bloated HTML / PDF
When documentation is delivered only as JS-heavy HTML, PDF, or interactive Swagger UI, agents burn through their token budget instantly. Fetching one page can cost 20,000 tokens for maybe 2,000 tokens of useful content.
| Format | Agent readability | Avg tokens/page | Information density |
|---|---|---|---|
| llms.txt (Markdown) | Excellent | 2,000-5,000 | High |
| Plain Markdown | Excellent | 3,000-8,000 | High |
| OpenAPI YAML/JSON | Excellent (structured) | 5,000-15,000 | Medium |
| Static HTML (simple) | Medium | 10,000-25,000 | Medium |
| SPA HTML (JS-heavy) | Hard (needs render) | Often unfetchable | Low |
| PDF | Hard (OCR / layout) | Highly variable | Low |
The fix: adopt llms.txt or publish an OpenAPI spec, detailed in the next section.
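As a sketch of the agent's side of this trade-off, the snippet below prefers /llms.txt and falls back to heavier formats only when it has to. The host name and fallback paths are hypothetical, and real agents also enforce a hard token budget.

```python
# Agent-side fetch strategy sketch: try the cheapest documentation source first.
# "docs.example.com" and the fallback paths are assumptions, not a real provider.
import urllib.request
import urllib.error

def fetch_docs(host: str) -> tuple[str, str]:
    candidates = [
        ("llms.txt", f"https://{host}/llms.txt"),     # Markdown, high information density
        ("openapi", f"https://{host}/openapi.json"),  # structured, medium density
        ("html", f"https://{host}/docs"),             # last resort, token-hungry
    ]
    for fmt, url in candidates:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return fmt, resp.read().decode("utf-8", errors="replace")
        except (urllib.error.URLError, TimeoutError):
            continue
    raise RuntimeError(f"no readable documentation found for {host}")

fmt, text = fetch_docs("docs.example.com")
print(fmt, len(text) // 4, "tokens (rough 4-characters-per-token estimate)")
```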
Anti-pattern 3: Absent examples
Documentation that lists a spec but no concrete request/response examples drags agents into a guessing game. A type alone (body: string) cannot tell the agent whether to send JSON, Base64, or URL-encoded form data.
Industry case: an API documented its filters parameter as just 'string'. First-shot agent success rate sat at 30%. Adding a single example — filters: "status:active AND created_at:>2026-01-01" — pushed it above 80%.
The minimum example set for 2026 documentation: (1) a minimal request, (2) a fully-populated request, (3) a successful response, and (4) the most important error responses (auth, validation, rate limit). Together, these let agents distinguish "typical" from "edge case" and emit safer code on the first try.
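To make the first two items concrete, here is a hedged sketch of a minimal and a fully-populated request against a hypothetical /v1/invoices endpoint; every field name and the auth scheme are assumptions, not a real API.

```python
# Sketch of example set items (1) and (2): minimal vs fully-populated request.
# The endpoint, auth token, and field names are hypothetical.
import requests

BASE = "https://api.example.com/v1"
HEADERS = {"Authorization": "Bearer sk_test_placeholder"}

# (1) Minimal request: required fields only
minimal = {"customer_id": "cus_123", "amount": 4200, "currency": "JPY"}

# (2) Fully-populated request: every optional field filled in
full = {
    **minimal,
    "description": "May 2026 subscription",
    "due_date": "2026-05-31",            # docs should state YYYY-MM-DD explicitly
    "metadata": {"order_id": "ord_789"},
}

for body in (minimal, full):
    resp = requests.post(f"{BASE}/invoices", json=body, headers=HEADERS, timeout=10)
    # (3) the success response and (4) the key error responses belong in the docs too
    print(resp.status_code, resp.json())
```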
The llms.txt era — the de facto standard
llms.txt is "a Markdown documentation entry point optimized for AI agents". By convention it lives at the domain root (like robots.txt) and serves only the information needed for code generation, with HTML, CSS, and navigation stripped away.
By May 2026, major adopters include Anthropic, Stripe, Cloudflare, Vercel, and Supabase. Adoption among Japanese SaaS lags, but we expect llms.txt to become a core differentiator on the AEO Documentation Score in the second half of 2026.
Typical /llms.txt structure:
# Stripe API Documentation
> Stripe is a payment processing API. This document contains everything an AI agent needs to integrate.
## Authentication
All endpoints require Bearer token authentication. See [auth.md](https://docs.stripe.com/llms/auth.md).
## Core Endpoints
- [Create payment intent](https://docs.stripe.com/llms/payment-intents.md)
- [Capture payment](https://docs.stripe.com/llms/capture.md)
- [Webhooks](https://docs.stripe.com/llms/webhooks.md)
## Common errors
- 402 Payment Required: Card declined; check decline_code
- 429 Rate limited: Use exponential backoff with Retry-After
...
A typical SaaS vendor adoption path has three steps: (1) select the documentation surfaces that agents actually touch, (2) convert them to Markdown, stripping nav and CSS, (3) host /llms.txt and per-topic pages at /llms/{topic}.md. Most SaaS teams can ship this in 1-2 person-months.
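Step (3) can be largely automated. The sketch below generates an /llms.txt index from a folder of per-topic Markdown files; the site URL, file layout, and titles are placeholders.

```python
# Sketch of step (3): build /llms.txt from per-topic Markdown files in ./llms/.
# SITE and the file layout are assumptions; adapt to your own docs pipeline.
from pathlib import Path

SITE = "https://docs.example.com"
lines = [
    "# Example API Documentation",
    "> Everything an AI agent needs to integrate with the Example API.",
    "",
    "## Topics",
]
for path in sorted(Path("llms").glob("*.md")):
    # Use the first Markdown heading in each file as the link title
    title = next(
        (line.lstrip("# ").strip() for line in path.read_text().splitlines() if line.startswith("#")),
        path.stem,
    )
    lines.append(f"- [{title}]({SITE}/llms/{path.name})")

Path("llms.txt").write_text("\n".join(lines) + "\n")
```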
MCP tool description length and structure
The description field on an MCP tool is the most important metadata an agent uses to choose the right tool. Anthropic's tool-design guide recommends roughly 100-300 words. Too short and agents misuse the tool; too long and the description chokes the token budget.
Bad — too short
{
  "name": "search_users",
  "description": "Search users"
}
Good — 100-300 words, structured
{
  "name": "search_users",
  "description": "Search users by name, email, or role.\n\nUse cases:\n- Retrieve users matching a specific filter\n- Return multiple candidates by partial match\n\nParameters:\n- query: search string (substring match, 3+ chars)\n- role: 'admin' | 'editor' | 'viewer' (optional)\n- limit: 1-100, default 20\n\nReturns: array of users (id, email, name, role, created_at)\nTimestamps are UTC seconds.\n\nFailure modes:\n- query shorter than 3 chars -> 400 \"query too short\"\n- limit out of range -> 400 \"limit out of range\"\n- rate limit exceeded -> 429 + Retry-After"
}
KanseiLink's measurements agree: tools whose descriptions specify parameter units and recovery hints score 10-15% higher in success rate.
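One way to enforce this in CI is a small lint. The sketch below checks descriptions against the 100-300 word range and the section labels used in the example above; the labels are this article's convention, not part of the MCP spec.

```python
# Description lint sketch: flag MCP tool descriptions that are too short, too long,
# or missing the sections recommended above. The word range follows the cited guidance.
def lint_description(name: str, description: str) -> list[str]:
    problems = []
    words = len(description.split())
    if words < 100:
        problems.append(f"{name}: too short ({words} words), agents may misuse the tool")
    elif words > 300:
        problems.append(f"{name}: too long ({words} words), eats into the token budget")
    for section in ("Parameters:", "Returns:", "Failure modes:"):
        if section not in description:
            problems.append(f"{name}: missing '{section}' section")
    return problems

print(lint_description("search_users", "Search users"))
# ['search_users: too short (2 words), agents may misuse the tool',
#  "search_users: missing 'Parameters:' section", ...]
```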
Four metrics for measuring doc quality from the agent's perspective
To know whether documentation work is paying off, you need agent-side metrics. KanseiLink uses these four.
AEO Documentation Score components:
- First-shot success rate: share of tasks completed on first request. Low values point to thin documentation
- Self-recovery rate: share of tasks recovered on retry. Directly measures error-message quality
- Average attempts: total attempts per task. High values point to either docs or errors being unclear
- Doc-to-code token ratio: tokens spent on docs vs tokens for final code. Ratios above 3 indicate verbose or under-structured docs
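As a sketch of how these can be computed from per-task agent logs: the log schema below (attempts, succeeded, doc_tokens, code_tokens) is hypothetical, not KanseiLink's actual format.

```python
# Compute the four metrics from per-task logs. The schema is an assumption.
def doc_quality_metrics(tasks: list[dict]) -> dict:
    not_first_shot = [t for t in tasks if t["attempts"] > 1 or not t["succeeded"]]
    recovered = [t for t in not_first_shot if t["succeeded"]]
    return {
        "first_shot_success": sum(t["attempts"] == 1 and t["succeeded"] for t in tasks) / len(tasks),
        "self_recovery": len(recovered) / len(not_first_shot) if not_first_shot else 1.0,
        "avg_attempts": sum(t["attempts"] for t in tasks) / len(tasks),
        "doc_to_code_token_ratio": sum(t["doc_tokens"] for t in tasks)
        / max(sum(t["code_tokens"] for t in tasks), 1),
    }

logs = [
    {"attempts": 1, "succeeded": True, "doc_tokens": 4000, "code_tokens": 1500},
    {"attempts": 3, "succeeded": True, "doc_tokens": 9000, "code_tokens": 1200},
    {"attempts": 4, "succeeded": False, "doc_tokens": 15000, "code_tokens": 800},
]
print(doc_quality_metrics(logs))  # {'first_shot_success': 0.33..., 'self_recovery': 0.5, ...}
```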
FAQ
Q1. Why does AI agent success depend so heavily on documentation quality?
AI agents behave like tireless junior engineers — read docs, send requests, parse errors, adjust, retry. That entire loop is bounded by doc quality. As Stytch puts it: if an AI agent can't figure out how your API works, neither can your users.
Q2. What are the three documentation anti-patterns that frustrate agents most?
(1) Silent errors (empty 422 bodies, generic "Bad Request"), (2) bloated HTML / PDF (formats that consume token budget), and (3) absent examples (types only, no concrete samples). Each has a low fix cost and a high payoff.
Q3. What is llms.txt, and why does it matter in 2026?
A Markdown documentation entry point optimized for AI agents, by convention at the domain root. HTML, CSS, and navigation are stripped, leaving only what's needed for code generation. Anthropic, Stripe, Cloudflare, and others have adopted it during 2026, and it is becoming a core differentiator on AEO Documentation Score.
Q4. How long should an MCP tool description be?
Anthropic's guide recommends 100-300 words. Shorter and agents misuse the tool; longer and the description crowds out the other tools in the selection context. The base structure: what the tool does, parameters and units, a typical example, and recovery hints for failure modes.
Q5. How can documentation quality be measured from the agent's perspective?
Four indicators. (1) first-shot success (target ≥85%), (2) self-recovery (target ≥90%), (3) avg attempts (target ≤2.0), (4) doc-to-code token ratio (target ≤3.0). These underpin KanseiLink's AEO Documentation Score.
Q6. What's the fastest fix for silent errors?
Three elements in every error response. (1) what's wrong, (2) which field is at fault, and (3) the allowed range or a valid example. Keep the HTTP status; just add a body shaped like {"error": {"code": ..., "fields": [{"field": ..., "issue": ..., "expected_example": ...}]}} — the gain is large.
The "422-with-no-body" case and the "missing required field 'first_name'" / "expiry_date must be in YYYY-MM-DD format" examples come from Stytch's blog post "If an AI agent can't figure out how your API works, neither can your users" (stytch.com/blog/). The llms.txt format and adoption pattern follows Buildwithfern's "API Docs for AI Agents: llms.txt Guide Feb 2026" (buildwithfern.com/post/optimizing-api-docs-ai-agents-llms-txt-guide). The 100-300 word recommendation for MCP tool descriptions follows Anthropic's "Writing effective tools for AI agents" (anthropic.com/engineering/writing-tools-for-agents). Aggregate observations on documentation as the mirror of agent success draw from Stytch, Composio, and OpenAI's agent guide. The four AEO Documentation Score metrics and target ranges (85% / 90% / 2.0 / 3.0) are KanseiLink internal values based on April 2026 measurements — these are working recommendations, not industry consensus. Pricing and specs change without notice; verify against the official docs before production use.