DOVER (Disentangled Objective Video Quality Evaluator, ICCV 2023) is a deep-learned no-reference video-quality model built for user-generated content (UGC): the phone clips and webcam videos that arrive with no pristine original to compare against. Its distinguishing idea is to split a UGC clip's quality into two separate scores rather than collapsing both into one number: a technical score for distortions such as blur, noise, and compression, and an aesthetic score for composition and appeal. It looks only at the decoded pixels and is among the accuracy leaders for blind video quality, reaching roughly 0.83 to 0.88 SROCC (Spearman rank-order correlation coefficient) against human scores on large in-distribution UGC datasets, and it ships a lightweight DOVER-Mobile variant. Its catch is the one all learned blind models share: accuracy collapses when the model is applied across datasets, so a model validated on social clips can rank surveillance or screen content backwards. It needs a GPU, carries a research licence, and every score should be reported as a band, not a point.
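
To make the SROCC figure concrete, here is a minimal sketch of how rank agreement between a model's scores and human opinion scores could be measured. It is not DOVER's own evaluation code. The function names `average_ranks` and `srocc` are hypothetical, and all the numbers are invented for illustration rather than taken from any dataset.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def srocc(predicted, human):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rp, rh = average_ranks(predicted), average_ranks(human)
    mp, mh = mean(rp), mean(rh)
    cov = sum((a - mp) * (b - mh) for a, b in zip(rp, rh))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_h = sum((b - mh) ** 2 for b in rh)
    if var_p == 0 or var_h == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_p * var_h) ** 0.5


# Invented mean opinion scores for five clips (illustrative only).
human_scores = [1.5, 2.0, 3.2, 4.1, 4.8]

# Invented model outputs: one mostly agrees with the human ordering,
# the other ranks the same clips in exactly the opposite order.
in_domain = [0.20, 0.35, 0.30, 0.70, 0.90]
out_of_domain = [0.90, 0.70, 0.50, 0.30, 0.10]

print(round(srocc(in_domain, human_scores), 3))      # 0.9
print(round(srocc(out_of_domain, human_scores), 3))  # -1.0
```

Because SROCC depends only on the ordering, a model can score well without its outputs matching the human scale, and a negative value is what "ranking backwards" looks like numerically. Checking the coefficient on a labelled sample of the target content, rather than relying on the published in-distribution figure, is one way to spot that failure.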

