SOS, the standard deviation of the opinion scores for one clip, measures how far individual votes sit from their average: how much the panel disagreed about that clip. A clip everyone rated 4 has an SOS of zero; one split between 2s and 5s has a large one. SOS is not a nuisance to average away but the input that tells you how much to trust the MOS. The standard error of the MOS is SOS divided by the square root of the number of votes, and the confidence interval is built from that standard error.

The most useful related result is the SOS hypothesis (Hossfeld, Schatz, and Egger, 2011): disagreement follows a predictable square law in the MOS. Writing x for the MOS on a 5-point scale, the variance is SOS² = a(-x² + 6x - 5) = a(x - 1)(5 - x). Disagreement is therefore zero at the scale ends (at x = 1 or x = 5 every vote must be identical) and largest in the middle (at x = 3, SOS² = 4a). The single parameter a runs from 0 (perfect agreement) to 1 (the largest spread possible for a given MOS, with every vote at one of the two scale ends) and summarizes a test's noisiness: an a near 0.2-0.25 is unremarkable, while a high value flags a confusing scale or a contaminated panel.
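A minimal sketch of both ideas, assuming each clip's votes arrive as a list of integers on a 1-5 scale (at least two votes per clip). The function names `clip_stats`, `sos_shape` and `fit_sos_parameter` are hypothetical. The confidence interval uses a normal quantile for simplicity; with small panels a Student-t quantile is the usual choice. The estimate of a is an ordinary least-squares fit through the origin, one reasonable way to fit the SOS hypothesis and not necessarily the procedure Hossfeld et al. used.

```python
import math
from statistics import NormalDist, mean, pvariance, stdev


def clip_stats(votes, confidence=0.95):
    """Return (MOS, SOS, CI half-width) for one clip's votes (needs >= 2 votes).

    SOS here is the sample standard deviation. The interval uses a normal
    quantile; small panels usually call for a Student-t quantile instead.
    """
    mos = mean(votes)
    sos = stdev(votes)                              # sample SD, n - 1 denominator
    se = sos / math.sqrt(len(votes))                # standard error of the MOS
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    return mos, sos, z * se


def sos_shape(x, lo=1.0, hi=5.0):
    """Largest possible variance at MOS x: (x - lo)(hi - x).

    On a 1..5 scale this equals -x**2 + 6*x - 5.
    """
    return (x - lo) * (hi - x)


def fit_sos_parameter(clips, lo=1.0, hi=5.0):
    """Least-squares estimate of a in SOS**2 = a * sos_shape(MOS), no intercept.

    Uses the population variance, which can never exceed sos_shape(MOS),
    so the estimate stays within [0, 1]; the sample variance can push a
    slightly above 1 on small panels.
    """
    num = den = 0.0
    for votes in clips:
        f = sos_shape(mean(votes), lo, hi)
        num += pvariance(votes) * f
        den += f * f
    # Clips whose MOS sits exactly on a scale end carry no information about a.
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical panel data: unanimous, polarized, and mildly split clips.
    clips = [[4, 4, 4, 4, 4], [2, 5, 2, 5, 2, 5], [3, 4, 3, 3, 4]]
    for votes in clips:
        mos, sos, half = clip_stats(votes)
        print(f"MOS={mos:.2f}  SOS={sos:.2f}  95% CI=+/-{half:.2f}")
    print(f"a = {fit_sos_parameter(clips):.3f}")
```

Fitting through the origin keeps the model to the single parameter a, matching the hypothesis as stated; a fit with a free intercept would no longer force zero variance at the scale ends.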

