Chatbot Arena ELO
Chatbot Arena uses crowdsourced human preference votes to rank LLMs via an Elo rating system. Models are compared pairwise by anonymous judges.
About this test
- What it measures: overall human preference in open-ended conversation quality.
- How it was administered: pairwise blind comparisons, using crowdsourced votes from LMSYS Chatbot Arena; Elo ratings are calculated from win/loss/tie records.
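As a rough illustration of how ratings can be derived from win/loss/tie records, the sketch below applies the textbook sequential Elo update to a few made-up votes. The starting rating of 1000, the K-factor of 32, the model names, and the vote format are all assumptions for illustration; the leaderboard's actual computation may differ.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, outcome, k=32.0):
    """Apply one vote in place. outcome: 1.0 = A wins, 0.0 = B wins, 0.5 = tie."""
    delta = k * (outcome - expected_score(ratings[model_a], ratings[model_b]))
    ratings[model_a] += delta
    ratings[model_b] -= delta  # zero-sum: B loses exactly what A gains


def rate(votes, initial=1000.0, k=32.0):
    """Process (model_a, model_b, winner) votes in order; winner is 'a', 'b', or 'tie'."""
    outcome_of = {"a": 1.0, "b": 0.0, "tie": 0.5}
    ratings = defaultdict(lambda: initial)  # assumed starting rating for unseen models
    for model_a, model_b, winner in votes:
        update_elo(ratings, model_a, model_b, outcome_of[winner], k)
    return dict(ratings)


# Hypothetical votes, not real Chatbot Arena data.
votes = [
    ("model_x", "model_y", "a"),
    ("model_y", "model_z", "tie"),
    ("model_x", "model_z", "b"),
]
ratings = rate(votes)
for name, score in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}")
```

Because each update is zero-sum, the average rating stays at the starting value; note that a sequential update like this depends on the order in which votes are processed.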
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | Mistral AI | 1275.0 | — | Text Generation, Small, Reasoning, Proprietary |
| 2 | | Meta | 1274.0 | — | Reasoning, Large, Text Generation, Open Weight |
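
To put the one-point gap between the two rows above in perspective, the following sketch evaluates the standard Elo expected-score formula for the two listed scores. It assumes the scores use the conventional base-10, 400-point Elo scale, which the page does not state explicitly; `win_probability` is a hypothetical helper name.

```python
def win_probability(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores taken from the table: rank 1 (1275.0) vs rank 2 (1274.0).
p_top = win_probability(1275.0, 1274.0)
print(f"Expected score of rank 1 vs rank 2: {p_top:.3f}")
```

Under that assumption the result is only marginally above 0.5, so a one-point difference implies the two models are close to evenly matched in head-to-head votes.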