TruthfulQA
TruthfulQA evaluates a model's tendency to avoid common misconceptions and answer factually when faced with misleading questions.
About this test
- What it measures: Truthfulness, meaning resistance to false beliefs and to imitating human misconceptions.
- How it was administered: Multiple-choice and generation tasks built from questions designed to elicit false answers, scored with MC1/MC2 and generation metrics.
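As a rough illustration of the two multiple-choice metrics named above, the sketch below scores a single question. It assumes the usual definitions: MC1 checks whether the single highest-scoring answer choice is a true one, and MC2 is the share of probability mass the model places on the true choices. The function names `mc1` and `mc2` and the sample log-probabilities are hypothetical and do not come from this page or from any particular evaluation harness.

```python
import math


def mc1(logprobs, labels):
    """Return 1.0 if the highest-scoring choice is labeled true, else 0.0."""
    best = max(range(len(logprobs)), key=lambda i: logprobs[i])
    return 1.0 if labels[best] else 0.0


def mc2(logprobs, labels):
    """Return the probability mass on true choices, normalized over all choices."""
    probs = [math.exp(lp) for lp in logprobs]
    true_mass = sum(p for p, is_true in zip(probs, labels) if is_true)
    return true_mass / sum(probs)


# Hypothetical model log-probabilities for four answer choices of one question.
example_logprobs = [-1.2, -0.7, -2.3, -1.6]
# Hypothetical truth labels for the same four choices.
example_labels = [True, False, True, False]

if __name__ == "__main__":
    print("MC1:", mc1(example_logprobs, example_labels))
    print("MC2:", round(mc2(example_logprobs, example_labels), 3))
```

A benchmark score would then be the average of these per-question values over the whole question set. The leaderboard below does not state which metric its score column reports.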
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | Anthropic | 78.8 | — | Multimodal, Small, Text Generation, Proprietary |
| 2 | | OpenAI | 78.2 | — | Code Assistant, Small, Text Generation, Multimodal, Proprietary |