SWE-Bench Verified Benchmark Rankings | BAUS.AI — AI Agents & Models Ranking

SWE-Bench Verified

Name: SWE-Bench Verified Benchmark Results
Creator: BAUS.AI

SWE-Bench Verified is a human-validated subset of SWE-Bench containing 500 real-world GitHub issues from 12 Python repos that models must resolve by writing code patches.

What it measures: Real-world software engineering ability — reading issue descriptions, understanding codebases, and writing correct patches.
How it was administered: Models generate code patches for real GitHub issues. Each patch is tested against the repository's test suite, and the reported score is the percentage of issues resolved.
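
As a rough illustration of the scoring described above, the following sketch computes a percentage-resolved score from per-issue test outcomes. The `PatchResult` record, the `percent_resolved` function, and the issue identifiers are hypothetical names introduced for this example and are not part of any official harness. The sketch also assumes that an issue with no passing patch, or no patch at all, counts as unresolved.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """Hypothetical record: outcome of running a repo's tests against one model patch."""
    issue_id: str
    tests_passed: bool


def percent_resolved(results: list[PatchResult], total_issues: int = 500) -> float:
    """Return the percentage of benchmark issues whose patch passed the test suite.

    Assumption: issues absent from `results` are treated as unresolved,
    so the denominator is the full benchmark size (500 by default).
    """
    if total_issues <= 0:
        raise ValueError("total_issues must be positive")
    # Deduplicate by issue ID so a repeated entry cannot be counted twice.
    resolved_ids = {r.issue_id for r in results if r.tests_passed}
    return 100.0 * len(resolved_ids) / total_issues


if __name__ == "__main__":
    # Illustrative data only; these are not real benchmark results.
    sample = [
        PatchResult("issue-a", True),
        PatchResult("issue-b", False),
        PatchResult("issue-c", True),
    ]
    print(percent_resolved(sample, total_issues=4))  # 50.0
```

Dividing by the full benchmark size rather than by the number of submitted patches keeps scores comparable across models that skip some issues.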

Model rankings

Models ranked by score on this benchmark. Higher is better.

Rank	Model	Provider	Score	Percentile	Tags
