Anthropic released Claude Opus 4.8 on 28 May 2026, roughly six weeks after Opus 4.7 shipped on 16 April. On paper it is an incremental update — modestly higher coding and reasoning scores at an unchanged price. The pitch Anthropic leads with, though, is less about raw capability and more about temperament: a model it describes as having "sharper judgement, more honesty about its progress, and the ability to work independently for longer than its predecessors." Per 9to5Mac's write-up, early testers report it is more likely to flag uncertainty about its own work and less likely to make unsupported claims.

Disclosure, since it matters here: this article was drafted with AI assistance using Claude Opus 4.8 — the very model it covers. That is exactly why we treat the figures below as vendor-stated and lean on independent checks, not the model's (or the maker's) own grading.

What Anthropic says is new

The behavioural framing is the story. "Honesty about progress" and "fewer unsupported claims" are not benchmark categories; they are reliability properties — the difference between an agent that quietly fabricates a passing test and one that tells you it is stuck. For anyone running Claude as an autonomous coding or research agent, calibrated uncertainty is worth more than a point or two on a leaderboard, because it determines how much you can trust the work without re-checking every step. The catch is that "more honest" is, for now, Anthropic's characterisation plus early-tester impressions — not something independent evaluations have yet quantified.

The benchmarks — Anthropic's own numbers

Anthropic reports gains across its internal suite from Opus 4.7 to 4.8. These are vendor-published figures; independent benchmarks on neutral leaderboards await.

Benchmark (Anthropic-reported)Opus 4.7Opus 4.8
Agentic coding64.3%69.2%
Multidisciplinary reasoning with tools54.7%57.9%
Agentic computer use82.8%83.4%
Agentic financial analysis51.5%53.9%
Knowledge work (index)17531890

The largest jump is in agentic coding — roughly five points, which is meaningful at the top of the range where each point is harder to win. Anthropic positions Opus 4.8 ahead of rival frontier models on several coding measures, though coverage notes a competitor still leads on at least one terminal-focused benchmark. As ever, treat cross-vendor comparisons run by one of the vendors with caution until third-party evaluations land.

A faster, cheaper fast mode — at the same headline price

The operational change most users will feel is speed. Anthropic says Opus 4.8's fast mode is about 2.5× quicker and three times cheaper than before, while the model's overall pricing is unchanged from Opus 4.7. That combination — same sticker price, materially faster and cheaper high-throughput path — is the quiet trend across the frontier in 2026: with raw capability gains getting harder, labs are competing on latency and cost-per-task rather than headline intelligence scores. MacRumors notes the upgrade is available to existing Claude subscribers and API users without a price change.

The other launches: dynamic workflows, effort control, and "Mythos"

Opus 4.8 arrived alongside several smaller updates. The most notable is Dynamic workflows (in research preview), which lets Claude take on larger, multi-step tasks inside Claude Code — a step toward longer-horizon autonomous work. Anthropic also shipped effort control in claude.ai and its Cowork surface (letting users dial how hard the model works a problem), system-entry support in the Messages API, and signalled that a "Mythos-class" model would become available "in the coming weeks." For developers, dynamic workflows plus longer independent runtime is the combination to watch — it is where an assistant edges toward an operator.

28 May 2026Release date
~6 weeksAfter Opus 4.7 (16 Apr)
2.5× / 3×Fast mode faster / cheaper
UnchangedPrice vs Opus 4.7

The cadence question

A point release every six weeks is a fast clock. It is good for users — steady, low-drama improvements at a stable price — but it also reflects where the frontier sits: the era of giant capability leaps between versions has given way to a tighter loop of incremental, mostly-agentic refinements. The interesting tension is that Anthropic is marketing this release on honesty and judgement rather than a new high score, which is either a sign the benchmark race is plateauing, or a sign the company has decided that for real agentic work, trustworthiness is the metric that actually moves the needle. Probably both.