The latest in AI models, benchmarks, and rankings.
| Model | Provider | Performance | Updated |
|---|---|---|---|
| Grok 2 | xAI | 89.0 | Today |
| Mistral Large | Mistral AI | 87.1 | Today |
| Llama 3.1 405B | Meta | 88.4 | Today |
| GPT-4o mini | OpenAI | 85.0 | Today |
| GPT-4o | OpenAI | 92.5 | Today |
VBench
VBench is a comprehensive benchmark for video generation models evaluating quality, consistency, and prompt alignment.
Added 2 months ago
MOS
Mean Opinion Score rates speech synthesis quality on a 1-5 scale, normalized to 0-100 for this platform.
Added 2 months ago
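The page does not say how the 1-5 scale is mapped to 0-100. A minimal sketch, assuming a linear min-max rescale (1 maps to 0, 5 maps to 100); the function name `normalize_mos` is hypothetical and the platform's actual formula may differ:

```python
def normalize_mos(mos: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Rescale a Mean Opinion Score from [lo, hi] to [0, 100].

    Assumption: linear min-max scaling. The source only states that
    scores are "normalized to 0-100", not which mapping is used.
    """
    if not lo <= mos <= hi:
        raise ValueError(f"MOS must be between {lo} and {hi}, got {mos}")
    return 100.0 * (mos - lo) / (hi - lo)


if __name__ == "__main__":
    for score in (1.0, 3.0, 4.0, 5.0):
        print(score, "->", normalize_mos(score))
```

Under this assumption a MOS of 3.0 lands at the midpoint of the normalized range. A simpler `mos / 5 * 100` mapping would also fit the description but could never yield a score below 20.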
WER (inverted)
Word Error Rate (WER) measures the share of words a speech recognition system gets wrong. Shown here as accuracy (100 - WER) so higher is better.
Added 2 months ago
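The inversion above can be sketched as follows, using the standard word-level edit-distance definition of WER (substitutions, deletions, and insertions divided by the reference length). The function names are hypothetical, and clamping at 0 is an assumption, since WER can exceed 100 when the hypothesis contains many insertions and the page does not say how that case is handled:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def inverted_wer(wer_percent: float) -> float:
    """Convert WER to the 'higher is better' score: 100 - WER.

    Assumption: results are clamped at 0 for WER values above 100.
    """
    return max(0.0, 100.0 - wer_percent)


if __name__ == "__main__":
    wer = word_error_rate("a b c d", "a x c d")  # one substitution in four words
    print(wer, inverted_wer(wer))
```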
MTEB
Massive Text Embedding Benchmark evaluates embeddings across 8 tasks: bitext mining, classification, clustering, pair classification, reranking, retrieval, STS, summarization.
Added 2 months ago
DPG-Bench
Dense Prompt Graph Benchmark evaluates image generation models on complex, detailed text prompts with multiple requirements.
Added 2 months ago
GenEval
GenEval evaluates compositional text-to-image generation across attributes like color, shape, position, and counting.
Added 2 months ago
LiveCodeBench
LiveCodeBench evaluates code generation on competitive programming problems released after model training cutoffs.
Added 2 months ago
SWE-bench Verified
SWE-bench Verified is a human-validated subset of real GitHub issues from popular Python repositories, testing end-to-end software engineering.
Added 2 months ago
AI News Roundup: Google I/O 2026 Keynote, Anthropic Acquires Stainless, EU Streamlines AI Act
Google I/O 2026 kicks off with Gemini upgrades and Android XR glasses, Anthropic acquires Stainless for $300M+ while eyeing a $950B valuation, and the EU agrees to streamline its AI Act.
Baus AI · 4 days ago
AI News Roundup: Google I/O Eve, Anthropic Eyes $900B Valuation, Europe’s Energy Crisis Threatens AI Ambitions
Google I/O kicks off tomorrow with Gemini 4.0 and Android XR glasses, Anthropic negotiates a $30B raise that would make it the world’s most valuable startup, and Europe’s soaring energy costs risk derailing its AI ambitions.
Baus AI · 5 days ago
AI News Roundup: Mythos Reshapes Cybersecurity, Pentagon Signs 8 AI Deals, Connecticut Passes Landmark AI Law
Anthropic’s Mythos model triggers a cybersecurity reckoning, the Pentagon deploys AI on classified networks while excluding Anthropic, and Connecticut enacts one of America’s most comprehensive AI laws.
Baus AI · 6 days ago
AI News Roundup: Anthropic Overtakes OpenAI in Enterprise Spending, OpenAI Launches $4B Consulting Arm, Google I/O Preview
Anthropic surpasses OpenAI in US business spending for the first time, OpenAI responds with a $4B deployment company, and Google I/O 2026 promises Android XR glasses and Gemini Intelligence next week.
Baus AI · 1 week ago
AI News Roundup: US-China AI Safety Talks, Palo Alto Warns of AI Cyberattacks, Cisco Cuts 4,000 Jobs
The US and China announce AI safety protocol talks after the Trump-Xi Beijing summit, Palo Alto Networks warns AI-driven cyberattacks will be the norm within months, and Cisco posts record revenue while cutting 4,000 jobs to fund its AI pivot.
Baus AI · 1 week ago