Anthropic rolls out Claude Sonnet 5 with improved agentic performance, reasoning, coding, and tool use

Anthropic has announced Claude Sonnet 5, a Sonnet-class model designed for agentic AI workflows. It is built to plan tasks, use tools such as browsers and terminals, and operate autonomously at a level that previously required larger and more expensive models.

The model follows Claude Sonnet 3.5, 3.6, and 3.7, which contributed to early agentic capabilities in coding and tool use. More recent advances in Anthropic’s model line have come from Opus models, and Sonnet 5 is positioned to reduce the gap while improving cost efficiency.

Claude Sonnet 5

Claude Sonnet 5 delivers performance close to Claude Opus 4.8 while improving on Claude Sonnet 4.6 across core capability areas.

It focuses on:

Reasoning and multi-step problem solving
Coding and software development workflows
Tool use, including browser and terminal interaction
Knowledge work and information processing
Autonomous task execution

Early access feedback highlights improved completion of complex multi-step tasks and increased self-verification of outputs during execution.

Key features

Agentic task planning and execution
Browser and terminal tool support
Adjustable effort levels for workload control
Stronger performance on BrowseComp and OSWorld-Verified benchmarks
Self-checking behavior during execution
Improved resistance to prompt injection and malicious requests
Lower cybersecurity capability than Opus 4.8 and Mythos 5
Cyber safeguards enabled by default

Performance and evaluation

Anthropic evaluated Claude Sonnet 5 on BrowseComp and OSWorld-Verified benchmarks across multiple effort levels.

Key findings:

Improvement over Sonnet 4.6 across all effort levels
Wider cost-performance range than Claude Opus 4.8
Higher cost efficiency at medium effort
Higher-effort settings approaching Opus 4.8 performance on some tasks
Effort levels adjustable based on cost and performance needs

Benchmark pricing reference:

Claude Sonnet 5: $3 per million input tokens / $15 per million output tokens
Introductory pricing (until August 31, 2026): $2 / $10 per million tokens
Claude Opus 4.8: $5 per million input tokens / $25 per million output tokens

The “xhigh” setting refers to an extra-high effort mode used in evaluations.

Safety evaluation

Claude Sonnet 5 shows improved safety compared to Sonnet 4.6.

Findings include:

Better refusal of malicious requests
Improved resistance to prompt injection
Lower hallucination rates
Lower sycophancy rates
Reduced undesirable behavior in behavioral audits

In broader evaluations, Sonnet 5 performs better than Sonnet 4.6 but shows higher misaligned behavior rates than Claude Opus 4.8 and Claude Mythos Preview.

Cybersecurity evaluation and safeguards

Claude Sonnet 5 was not specifically trained for cybersecurity tasks. It can perform routine security-related tasks but performs below Claude Opus 4.8 and Claude Mythos 5 on advanced cyber capability evaluations.

Mozilla-supported testing using Firefox 147 vulnerabilities found:

No successful full exploit generation in Sonnet 5 or Sonnet 4.6
0.0% success rate for both models
Slightly higher partial success rate in Sonnet 5
Changes attributed to general capability improvements
All vulnerabilities patched in Firefox 148

Cyber safeguards are enabled by default. They detect and block high-risk cyber activity in real time and align with protections used in Claude Opus 4.7 and 4.8. These safeguards are less restrictive than those used in higher-risk systems such as Fable 5 due to Sonnet 5’s lower assessed risk.

Pricing and availability

Claude Sonnet 5 is available across:

Availability:

Default model for Free and Pro users
Available to Max, Team, and Enterprise users
Available in Claude Code and Claude Platform
API access via claude-sonnet-5

Pricing:

Introductory (until August 31, 2026): $2 input / $10 output per million tokens
Standard (from September 1, 2026): $3 input / $15 output per million tokens

Rate limits have been increased across Chat, Cowork, Claude Code, and Claude Platform. Users can select effort levels based on performance and cost requirements.