QIMMA is an Arabic LLM leaderboard that validates benchmark quality before evaluating any models, addressing fragmentation in Arabic NLP. The authors found systematic quality issues in widely used Arabic benchmarks, including translation artifacts and annotation inconsistencies. The leaderboard consolidates more than 52,000 samples from 14 benchmarks across 7 domains, and 99% of its content is native Arabic.
QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard
QIMMA reveals systematic quality issues in widely used Arabic benchmarks, then consolidates 52K+ validated samples to build a quality-first leaderboard for Arabic LLMs.
Tuesday, April 21, 2026, 12:00 PM UTC | 2 min read | Source: Hugging Face | By sys://pipeline
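The quality-first ordering described above (validate samples first, then score models only on what survives) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Sample` fields, the two checks in `passes_validation` (a translation-artifact flag and annotator agreement with the gold label), and the toy data are assumptions for illustration, not QIMMA's actual schema or validation pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical schema; the source does not describe QIMMA's sample format.
    id: str
    benchmark: str
    domain: str
    is_native_arabic: bool           # False if the sample was translated into Arabic
    has_translation_artifact: bool   # hypothetical flag set by an upstream review
    annotator_labels: tuple[str, ...]
    gold: str


def passes_validation(s: Sample) -> bool:
    """Hypothetical quality gate: reject translation artifacts and samples whose
    annotators disagree with each other or with the gold label."""
    labels_consistent = set(s.annotator_labels) == {s.gold}
    return not s.has_translation_artifact and labels_consistent


def validate(samples: list[Sample]) -> list[Sample]:
    """Step 1: keep only samples that pass the quality gate."""
    return [s for s in samples if passes_validation(s)]


def summarize(samples: list[Sample]) -> dict:
    """Count samples, benchmarks, and domains, and compute the native-Arabic share."""
    native = sum(s.is_native_arabic for s in samples)
    return {
        "samples": len(samples),
        "benchmarks": len({s.benchmark for s in samples}),
        "domains": len({s.domain for s in samples}),
        "native_share": native / len(samples) if samples else 0.0,
        "per_domain": Counter(s.domain for s in samples),
    }


def accuracy(predictions: dict[str, str], samples: list[Sample]) -> float:
    """Step 2: score a model only on validated samples (missing predictions count as wrong)."""
    correct = sum(predictions.get(s.id) == s.gold for s in samples)
    return correct / len(samples) if samples else 0.0


# Toy usage with made-up data.
raw = [
    Sample("q1", "bench_a", "culture", True, False, ("B", "B"), "B"),
    Sample("q2", "bench_a", "culture", False, True, ("A", "A"), "A"),  # translation artifact
    Sample("q3", "bench_b", "law", True, False, ("C", "D"), "C"),      # annotators disagree
    Sample("q4", "bench_b", "law", True, False, ("D", "D"), "D"),
]
clean = validate(raw)
print([s.id for s in clean])                       # → ['q1', 'q4']
print(accuracy({"q1": "B", "q4": "A"}, clean))     # → 0.5
```

The point of the ordering is that a flawed sample is removed before it can reward or penalize any model, so leaderboard scores reflect only samples that passed validation.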