SWE-Bench Verified
SWE-Bench Verified is a human-validated subset of SWE-Bench containing 500 real-world GitHub issues drawn from 12 Python repositories. Models must resolve each issue by writing a code patch.
About this test
- What it measures: real-world software engineering ability, including reading issue descriptions, understanding codebases, and writing correct patches.
- How it was administered: models generate code patches for real GitHub issues. Each patch is applied to its repository and checked against that repository's test suite, and the score is the percentage of issues resolved.
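A percentage-resolved score can be computed as in the minimal sketch below. It assumes SWE-bench's convention that an issue counts as resolved only when the tests the fix is supposed to make pass (FAIL_TO_PASS) now pass and the previously passing tests (PASS_TO_PASS) still pass. The names `InstanceResult`, `is_resolved`, and `percent_resolved` are hypothetical. The sketch also assumes the denominator is the full instance count, so patches that fail to apply or are never produced count as unresolved. It does not apply patches or run tests itself; the real evaluation harness does that.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Hypothetical per-issue test outcome after a model's patch is applied."""
    instance_id: str
    fail_to_pass: dict[str, bool]  # tests that failed before the fix and must pass after it
    pass_to_pass: dict[str, bool]  # tests that passed before and must still pass


def is_resolved(r: InstanceResult) -> bool:
    # Resolved only if there is at least one target test, every target test
    # now passes, and no previously passing test regressed.
    return (
        bool(r.fail_to_pass)
        and all(r.fail_to_pass.values())
        and all(r.pass_to_pass.values())
    )


def percent_resolved(results: list[InstanceResult], total: int = 500) -> float:
    # Assumption: divide by the full benchmark size, so missing or
    # unappliable patches simply count as unresolved.
    resolved = sum(1 for r in results if is_resolved(r))
    return 100.0 * resolved / total
```

For example, if two results are evaluated out of four instances and only one is resolved, `percent_resolved(results, total=4)` returns `25.0`.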
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score (% resolved) | Percentile | Tags |
|---|---|---|---|---|---|