Anthropic
Fourth-generation Sonnet model with extended thinking and strong agentic task performance. Optimised for workflows requiring a balance of capability and speed.
Quality Score: 51.0
Arena ELO: 1351
Parameters: Undisclosed
Context: 200K
Use this section to answer one simple question first: how much outside evidence do we have that this model performs well? Structured benchmark scores appear first, then official provider evidence, then live arena signal.
This model has normalized benchmark rows, so scores here are directly comparable across benchmark sources.
Downloads: 0
Likes: 0
Released: May 2025
These are recent benchmark or leaderboard claims from official provider sources. They are useful for freshness and context, but they are not treated the same as normalized independent benchmark rows.
claude-4-sonnet - SWE-Bench Verified
SWE-Bench Verified resolved rate 76.4
Claude-Sonnet-4 - LiveCodeBench
LiveCodeBench pass@1 59.4 across 1055 tasks
claude-4-sonnet-20250522 - SWE-Bench Verified
SWE-Bench Verified resolved rate 75.2
claude-4-sonnet - GAIA
GAIA score 53.8 from mt_agent_2.0
claude-sonnet-4 - GAIA
GAIA score 55.5 from xManus v0.3
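Because several of the provider claims above cover the same benchmark under different model labels or agent harnesses, it can help to group them and look at the spread before reading too much into any single number. The following is a minimal sketch, assuming the scores are copied by hand from the list above; the `claims` list and the `spread_by_benchmark` helper are illustrative names, not part of any leaderboard API.

```python
from collections import defaultdict

# (model label as reported, benchmark, score), copied from the claims above.
claims = [
    ("claude-4-sonnet", "SWE-Bench Verified", 76.4),
    ("Claude-Sonnet-4", "LiveCodeBench", 59.4),
    ("claude-4-sonnet-20250522", "SWE-Bench Verified", 75.2),
    ("claude-4-sonnet", "GAIA", 53.8),
    ("claude-sonnet-4", "GAIA", 55.5),
]


def spread_by_benchmark(rows):
    """Return {benchmark: (lowest, highest, difference)} for the given rows."""
    grouped = defaultdict(list)
    for _label, benchmark, score in rows:
        grouped[benchmark].append(score)
    return {
        benchmark: (min(scores), max(scores), round(max(scores) - min(scores), 1))
        for benchmark, scores in grouped.items()
    }


summary = spread_by_benchmark(claims)
for benchmark, (low, high, diff) in summary.items():
    print(f"{benchmark}: {low} to {high} (spread {diff})")
```

A spread of a point or two between claims on the same benchmark is a reminder that these rows come from different submissions and setups, which is why they are kept apart from the normalized benchmark rows.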
ELO Score: 1351
95% Confidence: 1346 - 1358 (+/-6 points)
Battles: 12.5K
Last Updated: Apr 8, 2026
ELO Score: 1207
95% Confidence: 1190 - 1224 (+/-17 points)
Battles: 1.2K
Last Updated: Apr 8, 2026
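To make the arena figures above easier to read, the sketch below recomputes the "+/-" margin as half the width of each 95% interval and converts a rating gap into an expected win rate using the standard Elo formula. It assumes the arena ratings behave like classic Elo ratings (base 10, scale 400), which the page does not confirm; the opponent rating of 1300 is a hypothetical value chosen only for illustration.

```python
def margin_from_interval(low: float, high: float) -> float:
    """Half-width of a confidence interval, i.e. the '+/-' figure."""
    return (high - low) / 2


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (base 10, scale 400)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


# Intervals copied from the two arena entries above.
first_margin = margin_from_interval(1346, 1358)   # 6.0
second_margin = margin_from_interval(1190, 1224)  # 17.0

# Hypothetical opponent rated 1300 (an assumption, not a value from the page).
win_rate = elo_expected_score(1351, 1300)

print(first_margin, second_margin)
print(f"expected score vs. a 1300-rated opponent: {win_rate:.3f}")
```

The entry with about 1.2K battles has a margin nearly three times wider than the entry with 12.5K battles, so its rating should be read as the less certain of the two. The first interval is also not exactly centered on its score (1351 sits 5 below the upper midpoint and 7 below the top), so the "+/-6" figure is best treated as the half-width rather than a symmetric bound.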