
Claude Opus 4.5 Crossing 80% SWE-bench Verified: Why That’s a Big Deal

Opus 4.5 scores about 80.9% on SWE-bench Verified. The result shows strong agentic coding ability and comes with better controls over cost per fix.

Updated 2025

Opus 4.5 pairs a high pass rate with an "effort" setting that trades cost for deeper reasoning. It also produces fewer output tokens per task, which lowers the cost of each successful fix.
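Here is a rough sketch of why fewer output tokens lower the cost per successful fix. It treats cost per fix as the average cost of one attempt divided by the pass rate, because failed attempts still cost money. All prices, token counts, pass rates, and profile names are hypothetical placeholders. They are not measured figures for Opus 4.5 or quoted API prices.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens (assumptions for illustration,
# not quoted rates; substitute your provider's current pricing).
IN_PRICE_PER_MTOK = 5.0
OUT_PRICE_PER_MTOK = 25.0


@dataclass(frozen=True)
class RunProfile:
    """Hypothetical per-task averages for one model or effort setting."""
    name: str
    pass_rate: float    # fraction of tasks resolved, in (0, 1]
    input_tokens: int   # average input tokens per attempt
    output_tokens: int  # average output tokens per attempt


def cost_per_attempt(p: RunProfile, in_price: float, out_price: float) -> float:
    """Dollar cost of one attempt, given per-million-token prices."""
    return (p.input_tokens * in_price + p.output_tokens * out_price) / 1_000_000


def cost_per_successful_fix(p: RunProfile, in_price: float, out_price: float) -> float:
    """Expected spend per resolved task: attempt cost divided by pass rate."""
    if not 0 < p.pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_attempt(p, in_price, out_price) / p.pass_rate


# Made-up profiles: same input size, differing in output verbosity and pass rate.
PROFILES = [
    RunProfile("verbose", pass_rate=0.80, input_tokens=200_000, output_tokens=40_000),
    RunProfile("concise", pass_rate=0.80, input_tokens=200_000, output_tokens=20_000),
    RunProfile("higher effort", pass_rate=0.85, input_tokens=200_000, output_tokens=30_000),
]

if __name__ == "__main__":
    for p in PROFILES:
        cost = cost_per_successful_fix(p, IN_PRICE_PER_MTOK, OUT_PRICE_PER_MTOK)
        print(f"{p.name:>13}: ${cost:.2f} per successful fix")
```

Under these placeholders, halving output tokens at the same pass rate cuts the cost per fix from $2.50 to about $1.88. The "higher effort" profile lands near $2.06. Raising effort pays off only when its pass-rate gain outweighs the extra tokens it spends.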

Agent progress on SWE-bench Verified

Implication: top benchmark scores increasingly reflect competence in real repositories, such as resolving actual GitHub issues, rather than code-completion quality alone.
