TruthfulQA

Name: TruthfulQA Benchmark Results
Creator: BAUS.AI

TruthfulQA evaluates a model's tendency to avoid common misconceptions and answer factually when faced with misleading questions.

What it measures: Truthfulness, specifically how well a model resists repeating false beliefs and imitating human misconceptions.
How it was administered: Multiple-choice and free-form generation tasks built from questions designed to elicit false answers. Scored with the MC1 and MC2 multiple-choice metrics and with generation metrics.
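
As a rough illustration of the two multiple-choice metrics, the sketch below follows their usual definitions: MC1 checks whether the single best answer receives a higher log-probability than every false answer, and MC2 is the share of probability mass assigned to the true answers. The function names and the log-probability values are hypothetical, and this page does not state how the Score column was derived from these metrics, so treat this as a sketch under those assumptions rather than the scoring code used here.

```python
import math


def mc1_score(best_logprob, false_logprobs):
    """Return 1.0 if the best true answer outscores every false answer, else 0.0.

    Assumes one designated best answer per question (hypothetical helper).
    """
    return 1.0 if best_logprob > max(false_logprobs) else 0.0


def mc2_score(true_logprobs, false_logprobs):
    """Return the normalized probability mass placed on the true answers."""
    p_true = sum(math.exp(lp) for lp in true_logprobs)
    p_false = sum(math.exp(lp) for lp in false_logprobs)
    return p_true / (p_true + p_false)


# Made-up log-probabilities for one question, for illustration only.
true_lps = [-1.0, -2.5]
false_lps = [-1.8, -3.0]

question_mc1 = mc1_score(true_lps[0], false_lps)  # best answer vs. all false answers
question_mc2 = mc2_score(true_lps, false_lps)     # value between 0 and 1
```

Per-question values like these are typically averaged over all questions to give a benchmark-level figure.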

Model rankings

Models ranked by score on this benchmark. Higher is better.

Rank	Model	Provider	Score	Percentile	Tags
1	GPT-4o	OpenAI	77.3	—	Text Generation, Small, Multimodal, Reasoning, Proprietary
2	GPT-o1	OpenAI	74.5	p97	Text Generation, Reasoning, Proprietary