Anthropic Admits Three Engineering Errors Behind Claude Code Decline

Claude Code Postmortem

Anthropic has published a detailed postmortem confirming that three separate engineering changes — a lowered reasoning default, a caching bug that wiped session memory, and a verbosity‑capping system prompt — caused the monthlong quality decline developers reported in Claude Code. All issues are now fixed, but the episode has shaken developer trust.

The Admission: Three Bugs, Not Intentional Degradation

After weeks of developer complaints about declining output quality, Anthropic has published a detailed postmortem confirming that three separate engineering changes — not intentional throttling — caused the widely reported quality decline in Claude Code. The API was never affected; only Claude Code, the Agent SDK, and Cowork were impacted.

Anthropic was blunt: "This is not the experience users should expect from Claude Code." All three issues are now resolved as of v2.1.116 (April 20), and usage limits were reset for all subscribers on April 23, according to The Register.

Bug 1: Reasoning Effort Quietly Lowered

On March 4, Anthropic changed Claude Code’s default reasoning effort from “high” to “medium” to reduce latency. The problem: Opus 4.6 in high‑effort mode occasionally thought too long, making the UI appear frozen. Internal evaluations showed medium effort achieved “slightly lower intelligence with significantly less latency.”

The tradeoff was wrong. Users preferred defaulting to higher intelligence and opting into lower effort for simple tasks. Despite shipping UI changes to make effort settings more visible, most users never changed the default. The change was reverted on April 7, and the latest builds now default to “xhigh” for Opus 4.7 and “high” for all other models, per VentureBeat.

Bug 2: Caching Bug Wiped Session Memory Every Turn

On March 26, Anthropic shipped a caching optimization intended to clear old “thinking” sections for sessions idle longer than one hour. The intended behavior: clear once, then resume sending full reasoning history on the next turn.

The actual behavior: the bug cleared thinking on every subsequent turn for the rest of the session, as Anthropic’s postmortem explains. If a follow‑up message was sent during a tool use, even the current turn’s reasoning was dropped. The result: Claude appeared forgetful, repetitive, and erratic — and usage limits drained faster than expected due to continuous cache misses.
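The failure mode is easy to reproduce in miniature. The sketch below is a toy reconstruction based on the postmortem's description, not Anthropic's actual code: the `prune_thinking` helper, the session structure, and the exact staleness check are all assumptions.

```python
STALE_AFTER = 3600  # per the postmortem: sessions idle longer than one hour

def prune_thinking(history, idle_seconds, state, buggy=False):
    """Drop 'thinking' blocks from a stale session's history.

    Intended: clear once for a stale session, record that in `state`,
    and send full reasoning history again on later turns. Buggy: the
    recorded flag feeds back into the condition, so thinking is
    stripped on every subsequent turn for the rest of the session.
    """
    stale = idle_seconds > STALE_AFTER
    should_clear = (stale or state.get("cleared", False)) if buggy else stale
    if should_clear:
        history = [m for m in history if m.get("type") != "thinking"]
        state["cleared"] = True
    return history

turn = [{"type": "text", "content": "hi"},
        {"type": "thinking", "content": "reasoning..."}]
state = {}
prune_thinking(turn, idle_seconds=7200, state=state)             # stale: cleared once
kept = prune_thinking(turn, idle_seconds=10, state=state)        # intended: thinking kept
lost = prune_thinking(turn, idle_seconds=10, state=state, buggy=True)  # buggy: cleared again
```

Every buggy clear also invalidates the prompt cache, which is consistent with the faster-than-expected drain on usage limits.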

The bug was fixed on April 10 in v2.1.101. Anthropic noted it was hard to catch because it only triggered in a corner case involving stale sessions, and two unrelated internal experiments masked the symptoms in testing. Ironically, Opus 4.7 found the bug when back‑testing Code Review against the offending PRs — Opus 4.6 could not.

"Opus 4.7 found the bug when back-testing Code Review against the offending PRs, while Opus 4.6 did not."

Anthropic Engineering Postmortem Author

Bug 3: Verbosity Caps Slashed Code Quality

On April 16, alongside the Opus 4.7 launch, Anthropic added a system prompt instruction capping model responses: “Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail.”

The intent was to reduce Opus 4.7’s natural verbosity, which makes it smarter on hard problems but produces more output tokens. Multiple weeks of internal testing showed no regressions. But a broader ablation study during the investigation revealed a 3% drop in evaluation scores across both Opus 4.6 and 4.7. The prompt line was reverted on April 20, according to Fortune.
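A per-line ablation of the kind described can be sketched in a few lines. This is a toy harness, not Anthropic's evaluation infrastructure; `run_eval`, the prompt lines, and the scores in the demo are invented for illustration.

```python
def ablate(prompt_lines, run_eval):
    """Score the full prompt, then re-score with each line removed.

    run_eval stands in for running the model over an evaluation suite
    and returning a mean score. A positive delta means the suite scored
    higher *without* that line, i.e. the line was hurting quality.
    """
    baseline = run_eval(prompt_lines)
    deltas = {}
    for i, line in enumerate(prompt_lines):
        deltas[line] = run_eval(prompt_lines[:i] + prompt_lines[i + 1:]) - baseline
    return baseline, deltas

# Invented demo: a length-limit line that costs 3 points on the eval.
LIMIT = "Length limits: keep final responses short."
def fake_eval(lines):
    return 0.80 if LIMIT in lines else 0.83

baseline, deltas = ablate(["You are a coding agent.", LIMIT], fake_eval)
# deltas[LIMIT] comes out positive: removing the cap recovers the score
```

The point of ablating line by line, rather than testing the prompt as a whole, is exactly the lesson here: weeks of whole-prompt testing showed no regression, while the per-line comparison isolated the offending instruction.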

The Evidence Was Mounting for Weeks

Independent audits had been raising alarms well before the postmortem. Stella Laurenzo, Senior Director at AMD, published an audit of 6,852 Claude Code session files and more than 234,000 tool calls on GitHub, documenting performance declines including reasoning loops and a tendency to choose the “simplest fix” over the correct one, per VentureBeat.

The BridgeBench benchmark showed Claude Opus 4.6 accuracy dropping from 83.3% to 68.3%, with its ranking falling from No. 2 to No. 10. Security researchers at Veracode found Claude Opus 4.7 introduced a vulnerability in 52% of coding tasks tested (compared to ~30% for OpenAI models), while Fortune reports TrustedSec measured a 47% drop in overall code quality.

Compute Constraints: The Elephant in the Room

Despite the engineering admission, speculation persists that compute rationing is the underlying issue. Anthropic’s revenue run rate exploded from $1 billion (January 2025) to $30 billion (April 2026), straining infrastructure at every level. Fortune reports multiple signs of strain:

  • Usage caps: Anthropic introduced usage limits during peak hours and tested removing Claude Code from the $20/month Pro plan for some new signups
  • Mythos restricted: The most powerful model, Mythos, was limited to select large firms — officially due to security risks, but compute constraints likely played a role
  • Pricing shifts: Enterprise pricing is moving to a consumption‑based model that could triple costs for heavy users
  • Outages: A series of service outages occurred as usage surged beyond infrastructure capacity

What Anthropic Is Changing

The postmortem includes concrete commitments to prevent a repeat. Anthropic and The Register report these changes:

  • Dogfooding: A larger share of internal staff will use the exact public build of Claude Code, not internal test versions
  • Broad evals for prompt changes: Every system prompt change now runs a broader per‑model evaluation suite, with ablations to understand each line’s impact
  • Model‑specific gating: Changes that could trade off against intelligence require soak periods, broader evals, and gradual rollouts
  • Transparency: New @ClaudeDevs on X for in‑depth product decision explanations, plus centralized threads on GitHub
  • Usage limit reset: All subscriber limits were reset on April 23

For builders relying on Claude Code, the lesson is clear: the model weights never regressed, but the software layer around them can degrade your experience just as much. Pin model versions in your API calls when stability matters, and watch for system‑level changes that don’t show up in model version numbers.
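Pinning can be made concrete with a minimal sketch. The dated model identifier below is a placeholder, and `build_request` is a hypothetical helper of your own, not part of any SDK; check your provider's model list for real pinned names.

```python
# Pin a dated model snapshot instead of a floating alias, so that
# provider-side default changes (effort levels, prompt tweaks) cannot
# silently alter behavior between your requests.
PINNED_MODEL = "claude-opus-4-6-20260115"  # hypothetical dated snapshot

def build_request(prompt, model=PINNED_MODEL, max_tokens=1024):
    """Assemble Messages-style request parameters with a pinned model."""
    return {
        "model": model,                # dated snapshot, never a "latest" alias
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# With an SDK client, this dict would be passed through unchanged, e.g.:
#   client.messages.create(**build_request("Refactor this function."))
```

Note that pinning protects API-level workloads; client-side issues like the ones in this postmortem still require watching the tool's own release notes.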


More on This Story

Google Bets $40 Billion on Anthropic as AI Compute Race Escalates

Apr 25, 2026


Alphabet will invest up to $40 billion in Anthropic, with $10 billion upfront and $30 billion more if performance targets are met. The deal secures massive compute capacity for Claude's maker while deepening the complex competitor-partner dynamic between Google and its biggest AI rival.

anthropic · google · alphabet

Perplexity AI Drops Ads to Win User Trust: A Game-Changer in AI Monetization

Feb 19, 2026


Perplexity AI has dropped its advertising model in favor of subscriptions, betting that user trust will prove more valuable than ad revenue. While giants like Google and OpenAI explore advertising, Perplexity is wagering it can convert its more than 100 million users to paid plans, a notable counterpoint in an industry where ad‑driven models are the norm and one that raises broader questions about AI monetization and governance.

Perplexity AI · AI monetization · user trust

Anthropic's New Research: Does AI-Assistance Diminish Coding Skills?

Feb 2, 2026


Anthropic's eye-opening research explores whether relying on AI is detrimental to developing coding skills, challenging the widespread belief that AI merely reduces effort. The study pits AI-assisted coders against a non-assisted control group, revealing striking results about skill acquisition and engagement.

Anthropic · AI coding assistance · coding skills