What changed between Claude Opus 4.6 and Opus 4.7?

Claude Opus 4.7, released April 16, 2026, beats Opus 4.6 across the major coding and reasoning benchmarks: 87.6% vs 80.8% on SWE-bench Verified, 69.4% vs 65.4% on Terminal-Bench 2.0, 94.2% vs 91.3% on GPQA Diamond, and 64.4% vs 60.7% on Finance Agent. Pricing is unchanged at $5 / $25 per million input/output tokens. The headline capability changes are a 1M context window at standard pricing (no long-context premium), a new 'xhigh' effort level, and high-resolution image support up to 2,576px / 3.75MP.

What does the 'dreaming' feature in Claude Managed Agents actually do?

Dreaming, released as a research preview on May 6, 2026, is a scheduled process that reviews an agent's past sessions and memory stores, extracts patterns, and curates the memory to help the agent self-improve. It surfaces things a single session can't see — recurring mistakes, workflows the agent keeps converging on, preferences shared across a team. Anthropic's launch blog cited a Harvey pilot where dreaming lifted task completion roughly 6×.

What does MCP Apps (SEP-1865) shipping as an official extension mean for SaaS vendors?

MCP Apps, formalized under SEP-1865 in early 2026 and released as the first official MCP extension, lets tools return interactive UI — dashboards, forms, visualizations, multi-step workflows — instead of plain text. UI resources are declared with a ui:// URI scheme, rendered in a sandboxed iframe by the host, and all UI↔host messages travel over MCP's JSON-RPC, making them loggable and auditable. For SaaS vendors, this opens a new differentiation axis: 'the experience your tool offers inside the agent surface,' rather than just the raw data returned.

What should Japanese SaaS vendors actually do this week?

Three priorities: (1) Begin spec review for MCP Apps (SEP-1865). Interactive-UI MCP servers will be table stakes by the end of 2026, so move on architecture now. (2) Make your APIs 'outcome-checkable.' Claude Managed Agents' new outcomes feature evaluates results against rubrics — your API needs to expose unambiguous success/failure semantics. (3) Plan for 1M context. Opus 4.7's standard-priced long context shifts the design center from 'paginate everything' to 'pull a large slice at once and analyze.' Your pagination, partial-fetch, and schema predictability should be revisited under that lens.

Claude Opus 4.7, Dreaming, Wall Street Agents, MCP Apps GA — AI Agent News, May Week 2 2026

Q: What's the structure of Anthropic's $1.5B Wall Street JV?

On May 4, 2026, Anthropic announced a $1.5B joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs, with Anthropic, Blackstone, and H&F each contributing roughly $300M. The following day, at an invite-only briefing in New York, Anthropic unveiled a suite of pre-built AI agents for the world's largest banks, along with a Moody's data partnership and full Microsoft 365 integration. Together this signals Anthropic's intent to make financial services a flagship vertical.

Three-line summary of the week
Claude Opus 4.7 — SWE-bench Verified 87.6%, 1M context at standard pricing
A $1.5B JV pushes Anthropic into Wall Street
Claude Managed Agents — dreaming, outcomes, multiagent
MCP Apps (SEP-1865) ships as the first official extension
SAP × Anthropic, Claude on AWS
What Japanese SaaS vendors should do — three actions
FAQ

Three-line summary of the week

Model evolution: Claude Opus 4.7 (released April 16) reaches mainstream adoption. SWE-bench Verified 87.6%, 1M context window at standard pricing.
Finance push: Anthropic announces a $1.5B JV (Blackstone, H&F, Goldman Sachs) to bring pre-built AI agents to the world's largest banks.
Protocol evolution: Claude Managed Agents gains dreaming, outcomes, and multiagent orchestration; MCP Apps (SEP-1865) ships as the first official MCP extension. Self-improvement and UI extensibility are both shipping at the platform layer.

Claude Opus 4.7 — SWE-bench Verified 87.6%, 1M context at standard pricing

Anthropic released Claude Opus 4.7 on April 16, 2026, and by May Week 2 it was the default for production agent workloads, including the new financial-services use cases. The model is best understood as a smooth upgrade from 4.6 — same price ($5 / $25 per M tokens), better numbers across the board.

Claude Opus 4.7 vs 4.6 — benchmarks (Anthropic, official)

Benchmark	Opus 4.7	Opus 4.6
SWE-bench Verified	87.6%	80.8%
Terminal-Bench 2.0	69.4%	65.4%
GPQA Diamond	94.2%	91.3%
Finance Agent	64.4%	60.7%

The headline capability changes: a 1M context window at standard pricing (no long-context premium), a new xhigh effort level slotted between high and max, and high-resolution image support up to 2,576px / 3.75MP (roughly 2× the prior limit). On SWE-bench Pro, Opus 4.7 shows a 10-point gain over 4.6 and now leads every public competitor.
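
To make the image limit concrete, here is a minimal sketch that downsizes an image's dimensions to fit both bounds. It assumes that 2,576px refers to the longest edge and that 3.75MP means 3,750,000 total pixels; the article does not spell out either interpretation. `fit_within_limits` is a hypothetical helper, not part of any Anthropic SDK.

```python
import math

MAX_EDGE_PX = 2576        # assumption: limit applies to the longest edge
MAX_PIXELS = 3_750_000    # assumption: 3.75MP read as total pixel count


def fit_within_limits(width: int, height: int) -> tuple[int, int]:
    """Scale (width, height) down, keeping aspect ratio, to satisfy both limits."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    edge_scale = MAX_EDGE_PX / max(width, height)
    area_scale = math.sqrt(MAX_PIXELS / (width * height))
    scale = min(1.0, edge_scale, area_scale)  # never upscale
    # floor() keeps the result at or under both limits
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

Under these assumptions a 4000×3000 photo is bound by the pixel-count limit rather than the edge limit, so checking only the longest edge would not be enough.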

The implication for SaaS

Standard-priced 1M context changes the design center. Workloads that used to require chunking — pulling thousands of records, multi-month invoice ledgers, long design docs — can now be loaded in a single pass for cross-cutting analysis. The historical "paginate everything" API contract no longer fits all use cases; "fetch a large slice and analyze in one window" is now a legitimate pattern.
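
One way a client might decide between "one large slice" and chunking is sketched below. The 1M figure comes from the announcement; everything else is an assumption made for illustration: the 4-characters-per-token heuristic (a real tokenizer should replace it), the 200,000-token reserve for instructions and output, and the names `plan_fetch` and `FetchPlan`.

```python
from dataclasses import dataclass

CONTEXT_WINDOW_TOKENS = 1_000_000  # from the article
CHARS_PER_TOKEN = 4                # rough heuristic (assumption), not a tokenizer


@dataclass
class FetchPlan:
    strategy: str              # "single_pass" or "chunked"
    batches: list[list[str]]   # records grouped per model call


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def plan_fetch(records: list[str], reserved_tokens: int = 200_000) -> FetchPlan:
    """Greedily pack records into batches that fit the remaining token budget."""
    budget = CONTEXT_WINDOW_TOKENS - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved_tokens leaves no room for records")
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for record in records:
        cost = estimate_tokens(record)
        if cost > budget:
            raise ValueError("a single record exceeds the token budget")
        if current and used + cost > budget:
            batches.append(current)   # close the full batch, start a new one
            current, used = [], 0
        current.append(record)
        used += cost
    if current:
        batches.append(current)
    strategy = "single_pass" if len(batches) <= 1 else "chunked"
    return FetchPlan(strategy, batches)
```

If the plan comes back as `single_pass`, the whole slice can go into one request for cross-cutting analysis; otherwise the caller falls back to the older per-batch pattern.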

A $1.5B JV pushes Anthropic into Wall Street

On May 4, 2026, Anthropic announced a $1.5B joint venture with Blackstone, Hellman & Friedman, and Goldman Sachs, with Anthropic, Blackstone, and H&F each contributing roughly $300M. The next day, in an invite-only briefing in New York, Anthropic unveiled a suite of pre-built AI agents for the world's largest banks.

The same announcements packaged full Microsoft 365 integration and a Moody's data partnership, with leaders including JPMorgan Chase's Jamie Dimon attending. The strategic signal is unambiguous: financial services is now one of Claude's flagship verticals.

⚠️ Knock-on effects beyond finance

The investment also marks Anthropic's broader bet on large-enterprise core systems. Combined with the same week's SAP × Anthropic partnership (Claude on the SAP Business AI Platform), the message is that Anthropic is going hard at the operational backbone of global enterprise. Japanese enterprise SaaS should be moving on the question "which vertical do we make Claude-ready first?" — not whether to do it.

Claude Managed Agents — dreaming, outcomes, multiagent

On May 6, 2026, Anthropic shipped three new features for Claude Managed Agents. These are the most operationally impactful updates of the week, because they reach directly into how agents learn, are evaluated, and coordinate.

1. Dreaming (research preview)

A scheduled process that reviews past agent sessions and memory stores, extracts patterns, and curates the memory — the "sleep cycle" of long-running agents. Dreaming surfaces recurring mistakes, workflows agents converge on, and preferences shared across a team. Anthropic's launch post reported a pilot at legal-AI startup Harvey where dreaming lifted task completion roughly 6×.
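
Why a cross-session pass sees what a single session cannot can be shown with a toy example. This is not how Anthropic implements dreaming, which the article does not describe; the session format (a dict with an `issues` list) and the function name are hypothetical.

```python
from collections import Counter


def recurring_patterns(sessions: list[dict], min_sessions: int = 2) -> list[tuple[str, int]]:
    """Return (issue, session_count) pairs seen in at least min_sessions sessions."""
    seen: Counter = Counter()
    for session in sessions:
        # set() so an issue repeated inside one session is counted once
        seen.update(set(session.get("issues", [])))
    recurring = [(issue, n) for issue, n in seen.items() if n >= min_sessions]
    # most widespread first, ties broken alphabetically for stable output
    return sorted(recurring, key=lambda item: (-item[1], item[0]))
```

An issue that appears once per session looks like noise inside any one session; only the aggregate count makes it visible as a pattern worth writing into memory.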

2. Outcomes

You write a rubric describing what success looks like, and the agent works toward it. A separate grader evaluates the output in its own context window — uncoupled from the agent's reasoning — and when the result misses, the grader pinpoints what needs to change so the agent can retry. Quality control shifts from "tweak the prompt" to "specify the evaluation axis."
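
The grade-and-retry loop described above can be sketched generically. This is an illustration of the pattern only, not the Claude Managed Agents API: `run_with_outcome`, `Grade`, and the `agent` / `grader` callables are hypothetical stand-ins for whatever actually calls the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Grade:
    passed: bool
    feedback: str  # what needs to change when the rubric is missed


def run_with_outcome(
    task: str,
    agent: Callable[[str, str], str],   # (task, feedback) -> output
    grader: Callable[[str], Grade],     # sees only the output, not the agent's reasoning
    max_attempts: int = 3,
) -> tuple[str, Grade, int]:
    """Run the agent until the grader passes the output or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        output = agent(task, feedback)
        grade = grader(output)
        if grade.passed:
            return output, grade, attempt
        feedback = grade.feedback  # feed the grader's notes into the retry
    return output, grade, max_attempts
```

The design point is the separation: the grader receives only the output, so the rubric, not the prompt, is what you tune.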

3. Multiagent orchestration

A lead agent breaks a job into pieces and delegates each to a specialist with its own model, prompt, and tools. Specialists work in parallel on a shared filesystem, contributing back to the lead's overall context. Because events are persistent, the lead can check back in on specialist progress mid-workflow without losing thread state.
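
A minimal sketch of the lead/specialist split, using threads from the standard library. It is an assumption-laden toy: specialists are plain callables rather than model-backed agents, the returned dict stands in for the shared filesystem, and `orchestrate` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Specialist = Callable[[str], str]  # subtask in, result out


def orchestrate(job: dict[str, str], specialists: dict[str, Specialist]) -> dict[str, str]:
    """Run each subtask on its named specialist in parallel and collect the results."""
    missing = set(job) - set(specialists)
    if missing:
        raise KeyError(f"no specialist registered for: {sorted(missing)}")
    with ThreadPoolExecutor(max_workers=max(1, len(job))) as pool:
        futures = {
            name: pool.submit(specialists[name], subtask)
            for name, subtask in job.items()
        }
        # result() blocks until that specialist finishes and re-raises its errors
        return {name: future.result() for name, future in futures.items()}
```

For an API vendor the takeaway is about granularity: endpoints that can be called independently, without hidden ordering constraints, are the ones a lead agent can fan out like this.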

✅ Design implication

Outcomes + multiagent orchestration change the evaluation criteria for SaaS API design. Two questions get more weight: does your API expose unambiguous success/failure semantics for rubric-based grading? And is the granularity of your endpoints suited to parallel decomposition by multiple specialist agents? Both translate directly to Claude Managed Agents fit.

MCP Apps (SEP-1865) ships as the first official extension

MCP Apps (formerly mcp-ui), specified as SEP-1865 in early 2026, shipped its 2026-01-26 specification and is now described as the first official MCP extension and as production-ready. By May Week 2, reference implementations had multiplied and SaaS vendors were in real investment-decision territory.

The mechanics:

MCP tools declare UI resources via a ui:// URI scheme
The host (Claude, etc.) renders them inside a sandboxed iframe
UI ↔ host communication uses MCP's JSON-RPC, so every interaction is loggable and auditable
UI-initiated tool calls can require explicit host approval before executing
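
The last two points, auditable messages and host approval for UI-initiated tool calls, can be sketched from the host's side. The JSON-RPC 2.0 envelope and the `tools/call` method name follow MCP; the rest is hypothetical: the `handle_ui_message` function, the in-memory `AUDIT_LOG`, the `approve` callback, and the placeholder `forwarded` result are illustrations, not part of the MCP Apps specification.

```python
import json
from typing import Callable

AUDIT_LOG: list[str] = []  # stand-in for a real audit sink


def _error(msg_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def handle_ui_message(raw: str, approve: Callable[[str, dict], bool]) -> dict:
    """Log a UI-originated JSON-RPC request and gate tool calls behind approval."""
    AUDIT_LOG.append(raw)  # record every message before acting on it
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0" or "method" not in msg:
        msg_id = msg.get("id") if isinstance(msg, dict) else None
        return _error(msg_id, -32600, "Invalid Request")  # JSON-RPC 2.0 standard code
    if msg["method"] == "tools/call":
        params = msg.get("params", {})
        if not approve(params.get("name", ""), params.get("arguments", {})):
            # -32000 lies in the implementation-defined server error range
            return _error(msg.get("id"), -32000, "Tool call rejected by host")
    # Placeholder: a real host would forward the request to the MCP server here.
    return {"jsonrpc": "2.0", "id": msg.get("id"), "result": {"status": "forwarded"}}
```

Because every UI action arrives as a structured request, the same choke point handles logging, policy, and approval, which is what makes the iframe-hosted UI auditable.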

The net effect: an MCP tool can return dashboards, forms, visualizations, and multi-step workflows directly into the agent's surface, rather than only plain text. For SaaS vendors, that's a new differentiation axis — "what does your tool feel like inside the agent's conversation," not just what data it returns.

SAP × Anthropic, Claude on AWS

The same week, multiple enterprise integration announcements arrived.

Announcement	What	Why it matters
SAP × Anthropic	Claude embedded across SAP's AI portfolio (incl. SAP Business AI Platform)	Enterprise ERP backbone gets Claude-ready; locks in a vertical-leadership signal
Claude on AWS	Claude API delivered on Anthropic-managed infra via AWS billing and IAM	Lowers procurement friction for AWS-native enterprises — auth and billing live where security teams already work
Financial Services agents	Pre-built agent suite for global banks announced May 5	Financial services positioned as a flagship Claude vertical

What Japanese SaaS vendors should do — three actions

Begin MCP Apps (SEP-1865) review. Interactive-UI MCP servers will be table stakes by year-end. Move on architecture and roadmap now to take an early-mover slot on "the experience our tool offers inside the agent."
Make your API "outcome-checkable." Claude Managed Agents' outcomes feature grades against rubrics. If your API can't expose unambiguous success/failure semantics, no rubric will save you — invest in error envelope quality (see our prior piece on error-message anti-patterns).
Plan for 1M context. Long-context at standard pricing makes "pull a large slice at once" a normal pattern. Audit your pagination, partial fetch, and schema predictability with that in mind — your API will be consumed differently than it was last quarter.
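
For the second action, "outcome-checkable" can be read as: a grader can decide pass or fail from the response alone. The sketch below shows one possible envelope and a check against it. The field names (`ok`, `error_code`, `retryable`) and the `check_outcome` helper are hypothetical choices for illustration, not a format required by Claude Managed Agents.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ApiResult:
    ok: bool                          # single unambiguous success flag
    data: Optional[dict] = None       # payload on success
    error_code: Optional[str] = None  # stable, machine-readable code on failure
    retryable: bool = False           # tells the agent whether a retry can help
    message: str = ""                 # human-readable detail


def check_outcome(result: ApiResult, required_fields: list[str]) -> tuple[bool, str]:
    """Return (passed, reason) so a rubric-based grader can act on it directly."""
    if not result.ok:
        hint = "retry" if result.retryable else "do not retry"
        return False, f"{result.error_code}: {result.message} ({hint})"
    missing = [name for name in required_fields if name not in (result.data or {})]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"
```

An API that returns HTTP 200 with an error buried in free text forces the grader to guess; an explicit flag plus a stable code removes that guesswork.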

FAQ

Q1. What's the practical difference between Opus 4.6 and 4.7?

Better numbers across the board (SWE-bench Verified 87.6% vs 80.8%, Terminal-Bench 2.0 69.4% vs 65.4%, etc.), 1M context at standard pricing, a new "xhigh" effort level, and 2,576px image support. Same price ($5 / $25 per M tokens) — effectively a free upgrade for existing Opus workloads.

Q2. Is dreaming production-ready?

It's a research preview as of May 2026. Expect Anthropic guidance on production patterns soon, but in privacy-sensitive workloads — health, legal, sensitive HR — evaluate carefully first. Dreaming reviews past sessions to extract patterns, so confidentiality and audit requirements need to line up before you flip it on.

Q3. What skills does MCP Apps adoption require?

Beyond MCP server experience: building sandbox-iframe-compatible UI (typically React or similar client-side JS), and designing MCP JSON-RPC for UI ↔ host communication. If you already operate an MCP server, adding a ui:// resource is incremental rather than a rewrite.

Q4. Will Anthropic's Wall Street push affect Japan?

Yes. The combined finance / SAP / AWS push points squarely at large-enterprise core systems. Expect domestic megabanks, large SIs, and ERP vendors to face similar pressure within a quarter or two. The question for Japanese SaaS becomes "which vertical do we make Claude-ready first?" — not whether.

Q5. How is "outcomes" different from testing?

Tests run at development time; outcomes run continuously in production. A grader, in its own context window, evaluates the agent's output against your rubric, and triggers a retry if the result misses. The paradigm shifts from "tweak the prompt" to "specify the evaluation axis."

Q6. Should I just stuff 1M tokens into every prompt now?

No. Cost and latency still scale with prompt size. Having 1M tokens available does not mean you should use them: agent accuracy correlates with information density, not raw token count, and adding low-relevance material can induce hallucinations. Use the headroom for genuinely larger contexts, not bloat.
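
The cost side is simple arithmetic at the quoted list prices ($5 input / $25 output per million tokens). The sketch below assumes flat per-token pricing with no caching or batch discounts; the function name is illustrative.

```python
INPUT_USD_PER_MTOK = 5.0    # from the article
OUTPUT_USD_PER_MTOK = 25.0  # from the article


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at flat list prices (assumes no discounts)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


full_window = request_cost_usd(1_000_000, 4_000)  # 5.00 input + 0.10 output
focused = request_cost_usd(50_000, 4_000)         # 0.25 input + 0.10 output
```

With the same 4,000-token answer, a full 1M-token prompt costs $5.10 per call against $0.35 for a focused 50,000-token prompt, so filling the window on every agent step multiplies spend quickly.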

Data disclosures & disclaimers

Claude Opus 4.7 release date (2026-04-16), benchmark numbers, pricing, and 1M context capability are cross-referenced against Anthropic's official blog, platform.claude.com docs, GitHub changelog, and llm-stats.com. Claude Managed Agents' dreaming/outcomes/multiagent orchestration features are based on Anthropic's "New in Claude Managed Agents" blog (2026-05-06) and SDTimes/9to5Mac reporting. The "~6× task completion" figure for Harvey is from Anthropic's own pilot blog and is not independently verified. Anthropic's $1.5B JV with Blackstone, H&F, and Goldman Sachs is from Fortune's 2026-05-05 coverage; SAP × Anthropic from SAP News Center, May 2026. MCP Apps (SEP-1865) references the 2026-01-26 specification on modelcontextprotocol.io. Pricing and specs may change without notice — always verify against current official documentation when deploying to production.
