WebVTT is a UTF-8 text file format that pairs blocks of text (cues) with start and end timestamps. A file begins with the line WEBVTT; the simplest cue is a timing line, 00:00:01.000 --> 00:00:04.000, followed on the next line by its text, Hello world. Cues can also carry styling, position, region, and identifier metadata. Browsers consume WebVTT via the HTML <track> element on <video>. HLS uses WebVTT for subtitle and caption renditions; DASH carries WebVTT in CMAF wrappers or as standalone files.
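
The cue structure described above can be illustrated with a small parser. This is a minimal sketch, not a spec-complete implementation: the names Cue, parse_timestamp, and parse_webvtt are hypothetical, blocks without a timing line (NOTE, STYLE, REGION) are simply skipped, and cue settings are kept as an unparsed string.

```python
import re
from dataclasses import dataclass

# Timing line, e.g. "00:00:01.000 --> 00:00:04.000 line:0".
# The hours field is optional in WebVTT, so "00:01.000" is also accepted.
TIMING = re.compile(
    r"^\s*((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})(.*)$"
)


@dataclass
class Cue:  # hypothetical container, not part of any library
    identifier: str
    start: float  # seconds
    end: float    # seconds
    settings: str
    text: str


def parse_timestamp(ts: str) -> float:
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def parse_webvtt(source: str) -> list[Cue]:
    """Parse cue blocks from a WebVTT document (simplified)."""
    text = source.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues = []
    for block in blocks[1:]:
        lines = block.split("\n")
        # The timing line is either the first line or, when the cue has
        # an identifier, the second line of the block.
        for i, line in enumerate(lines[:2]):
            match = TIMING.match(line)
            if match:
                break
        else:
            continue  # not a cue block (NOTE, STYLE, REGION, ...)
        cues.append(
            Cue(
                identifier=lines[0] if i == 1 else "",
                start=parse_timestamp(match[1]),
                end=parse_timestamp(match[2]),
                settings=match[3].strip(),
                text="\n".join(lines[i + 1:]),
            )
        )
    return cues
```

Calling parse_webvtt on a file containing only the header and the Hello world cue above would yield one Cue with start 1.0 and end 4.0. A production player should rely on the browser's own parser or a maintained library rather than a regular expression like this one.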

Beyond captions, WebVTT is the de facto standard format for scrub-bar thumbnail metadata. A cue with the timing line 00:01:00.000 --> 00:01:10.000 and the payload sprites.jpg#xywh=160,0,160,90 tells the player that, for this 10-second window, the thumbnail is the rectangle at x=160, y=0 with width 160 and height 90 inside sprites.jpg. This pattern is widely supported across HLS, DASH, and proprietary players in 2026. WebVTT chapters (a WEBVTT - chapters extension) carry chapter navigation.
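
The thumbnail lookup a player performs while the user scrubs can be sketched as follows. This is an illustrative example under stated assumptions: Tile, parse_tile, and tile_at are hypothetical names, the cues are assumed to be already parsed into (start, end, payload) values sorted by start time, and only the pixel form of the xywh fragment is handled.

```python
import bisect
import re
from typing import NamedTuple, Optional

# Payload such as "sprites.jpg#xywh=160,0,160,90" (x, y, width, height).
XYWH = re.compile(
    r"^(?P<url>[^#]+)#xywh=(?:pixel:)?"
    r"(?P<x>\d+),(?P<y>\d+),(?P<w>\d+),(?P<h>\d+)$"
)


class Tile(NamedTuple):  # hypothetical record for one thumbnail cue
    start: float  # seconds
    end: float    # seconds
    url: str
    x: int
    y: int
    width: int
    height: int


def parse_tile(start: float, end: float, payload: str) -> Tile:
    """Split a thumbnail cue payload into sprite URL and rectangle."""
    match = XYWH.match(payload.strip())
    if match is None:
        raise ValueError(f"not a sprite reference: {payload!r}")
    return Tile(
        start, end, match["url"],
        int(match["x"]), int(match["y"]), int(match["w"]), int(match["h"]),
    )


def tile_at(tiles: list[Tile], t: float) -> Optional[Tile]:
    """Return the tile covering time t, assuming tiles are sorted by start."""
    starts = [tile.start for tile in tiles]
    i = bisect.bisect_right(starts, t) - 1
    # A cue is active from its start time up to, but not including, its end.
    if i >= 0 and tiles[i].start <= t < tiles[i].end:
        return tiles[i]
    return None
```

With the cue from the paragraph above parsed as parse_tile(60.0, 70.0, "sprites.jpg#xywh=160,0,160,90"), a lookup at 65 seconds returns that tile, and the player would then draw the 160 by 90 region starting at (160, 0) of sprites.jpg above the scrub bar.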

WebVTT competes with TTML / IMSC1 for subtitle delivery. IMSC1 offers more features (precise positioning, complex styling, professional broadcaster pedigree); WebVTT is simpler and browser-native. The 2026 split is roughly: WebVTT for web-first delivery, IMSC1 for content originating from broadcaster mastering chains. Most OTT services support both and choose based on the format supplied upstream.