ITU-T P.1203 is the first standardized model of streaming Quality of Experience (QoE): the quality of a whole viewing session rather than of a single clip. Where a metric like VMAF answers how good one frame or clip looks, P.1203 answers how good the session was, which is the question a streaming operator actually has. It combines three modules: a short-term video module, a short-term audio module, and an integration module. The integration module folds in the events that spoil a session without touching any frame's fidelity, namely the initial loading delay, rebuffering stalls, and the quality switches inherent to adaptive bitrate (ABR) delivery. The later ITU-T P.1204 family of video-quality models can supply the short-term video scores to this framework, so the two stack: P.1204 scores the picture, and P.1203 turns picture quality plus delivery events into a session-level Mean Opinion Score (MOS). The catch is that P.1203 is a parametric model of the viewing experience, not a measurement of it. Its session score is only as accurate as the stalling and switching events fed into it, so those events have to be captured faithfully.
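
To make the data flow concrete, here is a toy sketch in Python of what an integration step consumes and produces. It is NOT the P.1203.3 algorithm. The penalty weights, the switch-detection threshold, and the names `Session`, `count_switches` and `toy_session_mos` are all hypothetical. The sketch only shows the structural point: two sessions with identical short-term picture quality can end up with different session scores once startup delay, stalls, and quality switches are counted.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Session:
    # Short-term quality scores on the 1..5 MOS scale (here one per second),
    # as a short-term video module would report them. Supplied by the caller.
    per_second_mos: list[float]
    initial_delay_s: float = 0.0  # initial loading (startup) delay in seconds
    stall_durations_s: list[float] = field(default_factory=list)  # one entry per rebuffering stall


# HYPOTHETICAL weights chosen for illustration only; NOT taken from P.1203.3.
W_INITIAL = 0.05      # MOS lost per second of startup delay
W_STALL_EVENT = 0.3   # MOS lost per rebuffering event
W_STALL_TIME = 0.05   # MOS lost per second of total stalling
W_SWITCH = 0.05       # MOS lost per detected quality switch


def count_switches(scores: list[float], threshold: float = 0.5) -> int:
    # Hypothetical proxy for ABR switches: a jump larger than `threshold`
    # MOS between consecutive seconds counts as one quality switch.
    return sum(1 for a, b in zip(scores, scores[1:]) if abs(b - a) > threshold)


def toy_session_mos(s: Session) -> float:
    # Start from the average short-term picture quality...
    base = mean(s.per_second_mos)
    # ...then subtract penalties for delivery events that never touch a frame.
    penalty = (
        W_INITIAL * s.initial_delay_s
        + W_STALL_EVENT * len(s.stall_durations_s)
        + W_STALL_TIME * sum(s.stall_durations_s)
        + W_SWITCH * count_switches(s.per_second_mos)
    )
    # Clamp to the 1..5 MOS scale.
    return max(1.0, min(5.0, base - penalty))
```

For example, `Session([4.0] * 60)` and `Session([4.0] * 60, initial_delay_s=2.0, stall_durations_s=[3.0])` have the same picture quality every second, yet the second scores lower because of its startup delay and stall. A real integration module weighs these events with its own standardized functions, which this sketch does not attempt to reproduce.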

