A benchmark is a fixed dataset paired with a metric (accuracy, mAP, or word error rate, for example) that lets models be ranked on equal footing. Public leaderboards help narrow the choices, but a model that tops a benchmark can still fail on your own footage, which may look nothing like the benchmark data, so teams pair benchmarks with task-specific evaluation.
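
To make the pairing concrete, here is a minimal sketch in Python that scores two models with the same metric on a benchmark set and on a small in-house set. Everything in it is an assumption for illustration: the model names (`model_a`, `model_b`), the labels, and the predictions are made-up toy values, and the helpers `accuracy` and `rank_models` are hypothetical, not part of any library. Plain accuracy stands in for whichever metric your task actually calls for.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def rank_models(predictions_by_model, labels):
    """Return (model_name, accuracy) pairs sorted best first."""
    scores = {
        name: accuracy(preds, labels)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up data for illustration only (not real results).
benchmark_labels = ["cat", "dog", "cat", "dog", "cat"]
benchmark_preds = {
    "model_a": ["cat", "dog", "cat", "dog", "cat"],
    "model_b": ["cat", "dog", "cat", "cat", "cat"],
}

# A small hand-labeled sample standing in for your own footage.
own_labels = ["dog", "dog", "cat", "dog"]
own_preds = {
    "model_a": ["cat", "cat", "cat", "dog"],
    "model_b": ["dog", "dog", "cat", "cat"],
}

benchmark_ranking = rank_models(benchmark_preds, benchmark_labels)
own_ranking = rank_models(own_preds, own_labels)

print("benchmark ranking:", benchmark_ranking)
print("own-data ranking: ", own_ranking)
```

With these toy values the two rankings disagree: `model_a` leads on the benchmark while `model_b` leads on the in-house sample. That kind of reversal is what a task-specific evaluation is meant to surface before a model is chosen.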

