MOS (Mean Opinion Score) is the average quality rating from a panel of human viewers, expressed on a 1-to-5 scale: 1 = Bad, 2 = Poor, 3 = Fair, 4 = Good, 5 = Excellent. Originated by telephone engineers in the 1960s to measure call quality, MOS spread to every quality-of-experience domain: audio compression, video compression, video conferencing, streaming services. When you read "this codec achieves MOS 4.2 at 5 Mbps", that means viewers in a controlled test rated the codec's output, on average, slightly above "Good" at that bitrate. MOS is the gold-standard ground truth that objective quality metrics (PSNR, SSIM, VMAF) try to approximate and against which they are validated.

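The methodology is defined by ITU recommendations, chiefly ITU-T P.800 (speech), ITU-T P.910 (video) and ITU-R BT.500 (television pictures). Panel size is typically 15–30 viewers; under 15 the statistical noise gets too high to draw firm conclusions. Viewing conditions are controlled (calibrated displays, defined ambient light, fixed viewing distance) so different test sites can compare results. Common test methods are ACR (Absolute Category Rating: rate each clip on its own), DCR (Degradation Category Rating: rate how impaired each clip is relative to an explicit reference shown just before it) and DSCQS (Double Stimulus Continuous Quality Scale: watch the reference and the processed clip as a pair, without being told which is which, and rate both on a continuous scale). A variant of ACR, ACR-HR, mixes a hidden reference into the session so that scores can be expressed relative to it. The scores are statistically analysed and a confidence interval, usually 95%, is reported alongside the mean.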
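The last step, a mean reported with its confidence interval, can be sketched in a few lines of Python. This is a minimal illustration rather than a standards-compliant analysis: the function name `mos_with_ci` and the ratings are hypothetical, the interval uses the normal approximation (1.96 standard errors), and real test plans usually add subject screening and outlier rejection first.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for the 1-to-5 ratings of one clip.

    z=1.96 gives an approximate 95% interval under a normal approximation;
    a Student-t critical value would be slightly wider for small panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-to-5 scale")
    mean = statistics.fmean(ratings)
    stdev = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    half_width = z * stdev / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ACR ratings from a 15-viewer panel for a single clip.
ratings = [4, 5, 4, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4, 4, 5]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f} (95% CI, n = {len(ratings)})")
```

With only 15 viewers the interval spans roughly a third of a point on either side of the mean, which is why smaller panels rarely support firm conclusions.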

For a product team, MOS is the number you will see cited in codec papers, vendor white papers and competitive benchmarks. Practical caveat: MOS values from separate experiments cannot be directly compared unless the experiments were designed for comparison. Different panels, screens, content sets and methodologies can shift absolute MOS values by ±0.5 even when the encodes are identical. Use MOS for relative comparisons within one test, and use objective metrics (VMAF in particular, which is trained to predict MOS) for cross-experiment comparison.
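A relative comparison within one test can be sketched as a paired difference: when the same viewers rate both encodes in the same session, panel and screen effects largely cancel out of the per-viewer differences. The function name `paired_mos_difference` and the ratings below are hypothetical, and the interval again assumes a normal approximation.

```python
import math
import statistics


def paired_mos_difference(ratings_a, ratings_b, z=1.96):
    """Return (mean_difference, half_width) for two encodes rated by the same viewers.

    ratings_a[i] and ratings_b[i] must come from the same viewer.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both encodes must be rated by the same viewers")
    if len(ratings_a) < 2:
        raise ValueError("need at least two viewers")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    mean_diff = statistics.fmean(diffs)
    half_width = z * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, half_width


# Hypothetical ratings: 15 viewers, two encodes of the same clip, one session.
encode_a = [4, 5, 4, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4, 4, 5]
encode_b = [4, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 4, 4]

diff, hw = paired_mos_difference(encode_a, encode_b)
low, high = diff - hw, diff + hw
print(f"A - B = {diff:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
if low > 0 or high < 0:
    print("The interval excludes zero: the panel distinguished the two encodes.")
else:
    print("The interval includes zero: no clear difference in this test.")
```

The absolute values of both scores could still move if the test were rerun with a different panel or display; the difference between them is the part that is expected to hold up.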