Benchmark

Definition

A standard test set and score used to compare models on the same terms. Useful for shortlisting, but judging real product quality also requires your own evaluation.

A benchmark is a fixed dataset paired with a metric (such as accuracy, mAP, or word error rate) that ranks models on equal footing. Public leaderboards help narrow the field, but a model that tops a benchmark can still fail on your footage, so teams pair benchmarks with task-specific evaluation.
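As an illustration of "fixed dataset plus metric," here is a minimal sketch in Python that scores model outputs against one fixed set of reference transcripts using corpus-level word error rate (total word edits divided by total reference words) and ranks them, lowest WER first. The model names, transcripts and helper functions are hypothetical, chosen only to show the mechanics. This is not any particular leaderboard's scoring code.

```python
from typing import Dict, List, Tuple


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors over total reference words for the whole fixed set."""
    if len(references) != len(hypotheses):
        raise ValueError("every reference needs exactly one hypothesis")
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("reference set contains no words")
    errors = sum(word_edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_words


def rank_models(references: List[str], outputs: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Score every model on the same references; lower WER ranks higher."""
    scores = {name: corpus_wer(references, hyps) for name, hyps in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical benchmark: two reference transcripts (8 words in total).
references = ["turn left at the light", "stop the car"]
outputs = {
    "model_a": ["turn left at the light", "stop car"],          # 1 deletion
    "model_b": ["turn right at light", "stop the car now"],    # 2 errors + 1 insertion
}
print(rank_models(references, outputs))  # [('model_a', 0.125), ('model_b', 0.375)]
```

Because every model is scored on the same references with the same metric, the ranking is comparable. Running the same scorer on transcripts sampled from your own data is one way to do the task-specific check the entry recommends.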

Also known as

leaderboard, eval set