Contents

  1. Three-line summary of the week
  2. Claude Opus 4.7 — SWE-bench Verified 87.6%, 1M context at standard pricing
  3. A $1.5B JV pushes Anthropic into Wall Street
  4. Claude Managed Agents — dreaming, outcomes, multiagent
  5. MCP Apps (SEP-1865) ships as the first official extension
  6. SAP × Anthropic, Claude on AWS
  7. What Japanese SaaS vendors should do — three actions
  8. FAQ

Three-line summary of the week

Claude Opus 4.7 — SWE-bench Verified 87.6%, 1M context at standard pricing

Anthropic released Claude Opus 4.7 on April 16, 2026, and by May Week 2 it was the default for production agent workloads, including the new financial-services use cases. The model is best understood as a smooth upgrade from 4.6 — same price ($5 / $25 per M tokens), better numbers across the board.

Claude Opus 4.7 vs 4.6 — benchmarks (Anthropic, official)

SWE-bench Verified: 87.6% (4.6: 80.8%)
Terminal-Bench 2.0: 69.4% (4.6: 65.4%)
GPQA Diamond: 94.2% (4.6: 91.3%)
Finance Agent: 64.4% (4.6: 60.7%)

The architectural headlines: a 1M context window at standard pricing (no long-context premium), a new xhigh effort level slotted between high and max, and high-resolution image support up to 2,576px / 3.75MP (roughly 2× the prior limit). On SWE-bench Pro, Opus 4.7 gains 10 points over 4.6 and now leads every public competitor.

The implication for SaaS

Standard-priced 1M context changes the design center. Workloads that used to require chunking — pulling thousands of records, multi-month invoice ledgers, long design docs — can now be loaded in a single pass for cross-cutting analysis. The historical "paginate everything" API contract no longer fits all use cases; "fetch a large slice and analyze in one window" is now a legitimate pattern.
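To make the "single pass versus chunking" decision concrete, here is a minimal sketch of a budget check a client might run before loading records into one window. The 4-characters-per-token ratio, the reserved headroom, and the function names are all assumptions for illustration; a real integration would use the provider's token counting rather than this heuristic.

```python
import json
from typing import Iterable

CONTEXT_WINDOW_TOKENS = 1_000_000  # window size cited in this article
RESERVED_TOKENS = 100_000          # assumed headroom for instructions and the reply
CHARS_PER_TOKEN = 4                # rough heuristic, NOT a real tokenizer


def estimate_tokens(record: dict) -> int:
    """Approximate a record's token cost from its serialized length."""
    return len(json.dumps(record, ensure_ascii=False)) // CHARS_PER_TOKEN + 1


def plan_batches(
    records: Iterable[dict],
    budget: int = CONTEXT_WINDOW_TOKENS - RESERVED_TOKENS,
) -> list[list[dict]]:
    """Greedily pack records, in order, into as few batches as fit the budget."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for record in records:
        cost = estimate_tokens(record)
        if cost > budget:
            raise ValueError("a single record exceeds the token budget")
        if current and used + cost > budget:
            # The current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(record)
        used += cost
    if current:
        batches.append(current)
    return batches
```

If `plan_batches` returns a single batch, the whole slice can go into one request; more than one batch means the workload still needs chunking even with the larger window.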

A $1.5B JV pushes Anthropic into Wall Street

On May 4, 2026, Anthropic announced a $1.5B joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs, with Anthropic, Blackstone, and H&F each contributing roughly $300M. The next day, in an invite-only briefing in New York, Anthropic unveiled a suite of pre-built AI agents for the world's largest banks.

The same announcements included full Microsoft 365 integration and a Moody's data partnership, and leaders including JPMorgan Chase's Jamie Dimon attended the briefing. The strategic signal is unambiguous: financial services is now one of Claude's flagship verticals.

⚠️ Knock-on effects beyond finance

The investment also marks Anthropic's broader bet on large-enterprise core systems. Combined with the same week's SAP × Anthropic partnership (Claude on the SAP Business AI Platform), the message is that Anthropic is going hard at the operational backbone of global enterprise. Japanese enterprise SaaS should be moving on the question "which vertical do we make Claude-ready first?" — not whether to do it.

Claude Managed Agents — dreaming, outcomes, multiagent

On May 6, 2026, Anthropic shipped three new features for Claude Managed Agents. These are the most operationally impactful updates of the week, because they reach directly into how agents learn, are evaluated, and coordinate.

1. Dreaming (research preview)

A scheduled process that reviews past agent sessions and memory stores, extracts patterns, and curates the memory — the "sleep cycle" of long-running agents. Dreaming surfaces recurring mistakes, workflows agents converge on, and preferences shared across a team. Anthropic's launch post reported a pilot at legal-AI startup Harvey where dreaming lifted task completion roughly 6×.

2. Outcomes

You write a rubric describing what success looks like, and the agent works toward it. A separate grader evaluates the output in its own context window — uncoupled from the agent's reasoning — and when the result misses, the grader pinpoints what needs to change so the agent can retry. Quality control shifts from "tweak the prompt" to "specify the evaluation axis."
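The grade-and-retry loop described above can be sketched in a few lines. This is not the Claude Managed Agents API: `run_with_outcome`, `Verdict`, the toy agent, and the field-based rubric are hypothetical stand-ins that only illustrate the control flow, in which the grader sees the output alone and its feedback drives the next attempt.

```python
from dataclasses import dataclass
from typing import Callable

RUBRIC = ("total", "currency", "due_date")  # hypothetical success criteria


@dataclass
class Verdict:
    passed: bool
    feedback: str  # what must change; empty when the output passes


def rubric_grader(output: dict) -> Verdict:
    """Grade the output only; the agent's reasoning is never passed in."""
    missing = [field for field in RUBRIC if field not in output]
    if missing:
        return Verdict(False, "missing fields: " + ", ".join(missing))
    return Verdict(True, "")


def toy_agent(task: str, feedback: str) -> dict:
    """Stand-in agent: produces a partial draft, then repairs it from feedback."""
    draft = {"total": 1200}
    prefix = "missing fields: "
    if feedback.startswith(prefix):
        for field in feedback[len(prefix):].split(", "):
            draft[field] = "filled-on-retry"
    return draft


def run_with_outcome(
    task: str,
    agent: Callable[[str, str], dict],
    grader: Callable[[dict], Verdict],
    max_attempts: int = 3,
) -> tuple[dict, int]:
    """Retry until the grader passes the output or attempts run out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        output = agent(task, feedback)
        verdict = grader(output)
        if verdict.passed:
            return output, attempt
        feedback = verdict.feedback  # pinpointed changes for the next try
    raise RuntimeError(f"rubric not met after {max_attempts} attempts")
```

Calling `run_with_outcome("summarize invoice", toy_agent, rubric_grader)` exercises one failed attempt followed by a passing retry. The design point is that the rubric, not the prompt, is the thing being maintained.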

3. Multiagent orchestration

A lead agent breaks a job into pieces and delegates each to a specialist with its own model, prompt, and tools. Specialists work in parallel on a shared filesystem, contributing back to the lead's overall context. Because events are persistent, the lead can check back in on specialist progress mid-workflow without losing thread state.
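As a local analogy for the delegation pattern above, the sketch below has a lead function fan subtasks out to specialists that run in parallel and write results into a shared directory. Threads and a temporary directory stand in for hosted agents and the shared filesystem; `lead`, `specialist`, and `run_job` are hypothetical names, and nothing here reflects the actual Managed Agents interface.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def specialist(name: str, subtask: str, workspace: Path) -> Path:
    """Do one piece of the job and leave the result in the shared workspace."""
    result_path = workspace / f"{name}.txt"
    result_path.write_text(f"{name} finished: {subtask}", encoding="utf-8")
    return result_path


def lead(job: dict[str, str], workspace: Path) -> dict[str, str]:
    """Delegate each subtask to its specialist in parallel, then collect results."""
    with ThreadPoolExecutor(max_workers=max(1, len(job))) as pool:
        futures = [
            pool.submit(specialist, name, subtask, workspace)
            for name, subtask in job.items()
        ]
        paths = [future.result() for future in futures]
    # The lead reads every specialist's file back into its own view of the job.
    return {path.stem: path.read_text(encoding="utf-8") for path in paths}


def run_job(job: dict[str, str]) -> dict[str, str]:
    """Create a throwaway shared workspace and run the job inside it."""
    with tempfile.TemporaryDirectory() as tmp:
        return lead(job, Path(tmp))
```

The subtasks here are independent, which is what makes them safe to parallelize; endpoints that force a strict sequence of calls would not decompose this way.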

✅ Design implication

Outcomes + multiagent orchestration change the evaluation criteria for SaaS API design. Two questions get more weight: does your API expose unambiguous success/failure semantics for rubric-based grading? And is the granularity of your endpoints suited to parallel decomposition by multiple specialist agents? Both translate directly to Claude Managed Agents fit.
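One way to read "unambiguous success/failure semantics" is a response envelope that a grader can check mechanically. The sketch below is an assumption about what such an envelope could look like; the field names (`ok`, `error_code`, `retryable`) are illustrative and not taken from any Anthropic or vendor specification.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    ok: bool                          # single source of truth for success
    data: Optional[dict] = None       # present only on success
    error_code: Optional[str] = None  # stable, machine-readable identifier
    error_message: Optional[str] = None
    retryable: bool = False           # tells the caller whether a retry can help


def success(data: dict) -> Envelope:
    return Envelope(ok=True, data=data)


def failure(code: str, message: str, retryable: bool = False) -> Envelope:
    return Envelope(ok=False, error_code=code, error_message=message, retryable=retryable)


def to_json(envelope: Envelope) -> str:
    """Serialize the envelope as the API response body."""
    return json.dumps(asdict(envelope))


def is_success(raw_json: str) -> bool:
    """What a rubric check can rely on: one explicit boolean, no text parsing."""
    return json.loads(raw_json).get("ok") is True
```

A grader that has to infer failure from free-text messages or from an HTTP 200 carrying an error string cannot apply a rubric reliably; an explicit flag plus a stable error code removes that guesswork.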

MCP Apps (SEP-1865) ships as the first official extension

MCP Apps (formerly mcp-ui) was specified as SEP-1865 in early 2026. Its 2026-01-26 specification has shipped, and it is now described as the first official MCP extension and as production-ready. By May Week 2, reference implementations had multiplied, and SaaS vendors now face a real investment decision.

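The mechanics, in brief: an MCP server exposes its interface as a ui:// resource, the host renders that resource in a sandboxed iframe, and the UI and host communicate over MCP JSON-RPC.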

The net effect: an MCP tool can return dashboards, forms, visualizations, and multi-step workflows directly into the agent's surface, rather than only plain text. For SaaS vendors, that's a new differentiation axis — "what does your tool feel like inside the agent's conversation," not just what data it returns.
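For a rough sense of what "returning UI" means on the server side, the sketch below builds a resource entry addressed with the ui:// scheme. The `uri` / `mimeType` / `text` fields follow the general shape of MCP resource contents, but the MIME type used here and the way a tool links to the resource are assumptions; the SEP-1865 specification is the authority on both, and `make_ui_resource` is a hypothetical helper.

```python
from urllib.parse import urlparse

UI_SCHEME = "ui"
ASSUMED_MIME_TYPE = "text/html"  # assumption: check the spec for the exact value


def make_ui_resource(uri: str, html: str) -> dict:
    """Build a resource entry for an HTML view the host can render in a sandbox."""
    if urlparse(uri).scheme != UI_SCHEME:
        raise ValueError(f"UI resources must use the {UI_SCHEME}:// scheme: {uri!r}")
    return {"uri": uri, "mimeType": ASSUMED_MIME_TYPE, "text": html}


# Hypothetical dashboard view for an invoicing tool.
INVOICE_DASHBOARD = make_ui_resource(
    "ui://invoices/dashboard",
    "<!doctype html><title>Invoices</title><div id='root'></div>",
)
```

The HTML payload is ordinary client-side markup, which is why an existing MCP server can add a view like this incrementally.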

SAP × Anthropic, Claude on AWS

The same week, multiple enterprise integration announcements arrived.

| Announcement | What | Why it matters |
| --- | --- | --- |
| SAP × Anthropic | Claude embedded across SAP's AI portfolio (incl. SAP Business AI Platform) | Enterprise ERP backbone gets Claude-ready; locks in a vertical-leadership signal |
| Claude on AWS | Claude API delivered on Anthropic-managed infra via AWS billing and IAM | Lowers procurement friction for AWS-native enterprises: auth and billing live where security teams already work |
| Financial Services agents | Pre-built agent suite for global banks announced May 5 | Financial services positioned as a flagship Claude vertical |

What Japanese SaaS vendors should do — three actions

  1. Begin MCP Apps (SEP-1865) review. Interactive-UI MCP servers will be table stakes by year-end. Move on architecture and roadmap now to take an early-mover slot on "the experience our tool offers inside the agent."
  2. Make your API "outcome-checkable." Claude Managed Agents' outcomes feature grades against rubrics. If your API can't expose unambiguous success/failure semantics, no rubric will save you — invest in error envelope quality (see our prior piece on error-message anti-patterns).
  3. Plan for 1M context. Long-context at standard pricing makes "pull a large slice at once" a normal pattern. Audit your pagination, partial fetch, and schema predictability with that in mind — your API will be consumed differently than it was last quarter.

See where your SaaS sits in the Opus 4.7 / MCP Apps era

KanseiLink AEO Readiness Ratings track MCP support, API quality, and live agent behaviour across 225+ services, updated weekly. Find out exactly where you stand on the new bar.


FAQ

Q1. What's the practical difference between Opus 4.6 and 4.7?

Better numbers across the board (SWE-bench Verified 87.6% vs 80.8%, Terminal-Bench 2.0 69.4% vs 65.4%, etc.), 1M context at standard pricing, a new "xhigh" effort level, and 2,576px image support. Same price ($5 / $25 per M tokens) — effectively a free upgrade for existing Opus workloads.

Q2. Is dreaming production-ready?

It's a research preview as of May 2026. Expect Anthropic guidance on production patterns soon, but in privacy-sensitive workloads — health, legal, sensitive HR — evaluate carefully first. Dreaming reviews past sessions to extract patterns, so confidentiality and audit requirements need to line up before you flip it on.

Q3. What skills does MCP Apps adoption require?

Beyond MCP server experience: building sandbox-iframe-compatible UI (typically React or similar client-side JS), and designing MCP JSON-RPC for UI ↔ host communication. If you already operate an MCP server, adding a ui:// resource is incremental rather than a rewrite.

Q4. Will Anthropic's Wall Street push affect Japan?

Yes. The combined finance / SAP / AWS push points squarely at large-enterprise core systems. Expect domestic megabanks, large SIs, and ERP vendors to face similar pressure within a quarter or two. The question for Japanese SaaS becomes "which vertical do we make Claude-ready first?" — not whether.

Q5. How is "outcomes" different from testing?

Tests run at development time; outcomes run continuously in production. A grader, in its own context window, evaluates the agent's output against your rubric, and triggers a retry if the result misses. The paradigm shifts from "tweak the prompt" to "specify the evaluation axis."

Q6. Should I just stuff 1M tokens into every prompt now?

No. The cost and latency curves still bite. Having 1M tokens available does not mean you should fill them: agent accuracy correlates with information density, not raw token count, and adding low-relevance material can induce hallucinations. Use the headroom for genuinely larger contexts, not bloat.

Data disclosures & disclaimers

Claude Opus 4.7 release date (2026-04-16), benchmark numbers, pricing, and 1M context capability are cross-referenced against Anthropic's official blog, platform.claude.com docs, GitHub changelog, and llm-stats.com. Claude Managed Agents' dreaming/outcomes/multiagent orchestration features are based on Anthropic's "New in Claude Managed Agents" blog (2026-05-06) and SDTimes/9to5Mac reporting. The "~6× task completion" figure for Harvey is from Anthropic's own pilot blog and is not independently verified. Anthropic's $1.5B JV with Blackstone, H&F, and Goldman Sachs is from Fortune's 2026-05-05 coverage; SAP × Anthropic from SAP News Center, May 2026. MCP Apps (SEP-1865) references the 2026-01-26 specification on modelcontextprotocol.io. Pricing and specs may change without notice — always verify against current official documentation when deploying to production.