Root-mean-square error (RMSE)

In metric validation, the root-mean-square error (RMSE) measures accuracy: the typical size of the gap between a metric's predicted score and the actual human MOS, expressed in the units of the MOS scale. It is 0 when every prediction is exact and grows as the errors grow, so a smaller RMSE means the metric's numbers land closer to where people put them.

Like the Pearson correlation, RMSE is computed only after a monotonic fitting curve (a five-parameter logistic) maps the metric onto the MOS scale, because the raw scales do not line up.

RMSE has two blind spots. First, it is a single average: it does not show whether the error comes from many moderate misses or from a few large ones, even though squaring gives large misses extra weight. Second, because it is expressed in the units of the subjective scale, it can rank metrics evaluated on the same dataset, but results from datasets scored on different scales (for example, a 1–5 ACR scale versus a 0–100 DMOS scale) are not directly comparable.

RMSE is one of the four ITU-T P.1401 validation statistics, alongside SROCC, PCC, and the outlier ratio; together they grade monotonicity, accuracy, and consistency. As an example, an outside reproduction reported a VMAF-DMOS RMSE of 12.7 DMOS points on a 4K set.
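A minimal sketch of the computation in Python, assuming the mapping parameters have already been fitted elsewhere (for example with a least-squares routine such as `scipy.optimize.curve_fit`). The logistic form below is one commonly used five-parameter variant and is an assumption here, as are the names `logistic5` and `rmse` and the optional `dof` argument. The core formula is RMSE = sqrt( Σ (MOS_i − f(x_i))² / N ), where f is the fitted mapping; some procedures divide by N − d instead, with d the number of fitted parameters, so check the recommendation you follow before relying on that option.

```python
import math
from typing import Sequence


def logistic5(x: float, b1: float, b2: float, b3: float, b4: float, b5: float) -> float:
    """Map a raw metric score onto the MOS scale.

    Assumed form: one commonly used five-parameter logistic,
    b1 * (0.5 - 1 / (1 + exp(b2 * (x - b3)))) + b4 * x + b5.
    """
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (x - b3)))) + b4 * x + b5


def rmse(mos: Sequence[float], predicted: Sequence[float], dof: int = 0) -> float:
    """Root-mean-square error between subjective scores and mapped predictions.

    The result is in the units of the MOS scale. `dof` (hypothetical option)
    subtracts the number of fitted mapping parameters from N in the
    denominator; 0 gives the plain mean of the squared errors.
    """
    if len(mos) != len(predicted):
        raise ValueError("mos and predicted must have the same length")
    n = len(mos)
    if n - dof <= 0:
        raise ValueError("need more samples than fitted parameters")
    squared_error = sum((m - p) ** 2 for m, p in zip(mos, predicted))
    return math.sqrt(squared_error / (n - dof))


# Hypothetical data: raw metric scores, human MOS, and placeholder
# mapping parameters (not the result of a real fit).
raw_scores = [20.0, 40.0, 60.0, 80.0]
human_mos = [1.5, 2.4, 3.6, 4.4]
params = (3.0, 0.1, 50.0, 0.0, 3.0)
mapped = [logistic5(x, *params) for x in raw_scores]
print(f"RMSE: {rmse(human_mos, mapped):.3f}")

# The averaging blind spot: one large miss and many moderate misses
# give nearly the same RMSE (both about 0.95).
one_big = rmse([0.0] * 10, [0.0] * 9 + [3.0])
many_small = rmse([0.0] * 10, [0.95] * 10)
print(f"{one_big:.3f} vs {many_small:.3f}")
```

With ten samples, a single miss of 3 points and ten misses of 0.95 points both land near 0.95, which is the kind of difference a per-sample statistic such as the outlier ratio can expose and RMSE alone cannot.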


Related terms

Pearson correlation (PCC)

Spearman correlation (SROCC)

Outlier ratio