Claude Opus 4.5 Crossing 80% SWE-bench Verified: Why That’s a Big Deal

Opus 4.5 scores about 80.9% on SWE-bench Verified, combining stronger agentic coding performance with better controls over cost per fix.

Opus 4.5 pairs high pass rates with an "effort" knob that trades cost for deeper reasoning. It also uses fewer output tokens per task, which lowers the cost per successful fix.
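To make "cost per successful fix" concrete, here is a minimal sketch of the arithmetic: the cost of one attempt divided by the pass rate. The function name, token counts, and per-million-token prices below are hypothetical placeholders chosen for illustration, not published figures; only the roughly 80% pass rate echoes the article.

```python
import math


def cost_per_successful_fix(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    pass_rate: float,
) -> float:
    """Expected spend per resolved issue, assuming independent attempts.

    All arguments are illustrative assumptions: token counts are per attempt,
    prices are in dollars per million tokens, pass_rate is a fraction in (0, 1].
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    cost_per_attempt = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
    # On average 1 / pass_rate attempts are needed for one success.
    return cost_per_attempt / pass_rate


# Hypothetical numbers: same pass rate, fewer output tokens in the second run.
baseline = cost_per_successful_fix(100_000, 10_000, 5.0, 25.0, 0.8)
leaner = cost_per_successful_fix(100_000, 6_000, 5.0, 25.0, 0.8)
print(f"baseline: ${baseline:.4f}  leaner: ${leaner:.4f}")
```

Under these assumed inputs, cutting output tokens lowers the cost per fix even when the pass rate is unchanged; a higher effort setting would be worth its extra tokens only if the pass rate rises enough to offset them.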

[Figure: Agent progress on SWE-bench Verified, 2023 to 2025]

Implication: Benchmarks are converging on real repo competence, not just completion quality.
