Why this matters
If you operate or are planning an OTT platform, quality of experience (QoE) is the one set of numbers that connects your engineering to your revenue. A media founder can have the best catalogue, the right price, and a beautiful app, and still lose viewers in the first few seconds because the video took too long to start or froze in the middle of the scene that mattered. This article is for the non-technical operator (the founder, product manager, or streaming executive) who needs to understand QoE well enough to set targets, read a dashboard, and ask engineers the right questions, without becoming an engineer. It sits under the OTT analytics map, which frames the three families of streaming metrics; here we go deep on the third family, quality, and on the two metrics inside it that cause the most lost viewing: startup time and rebuffering.
The one idea: quality is what the viewer feels, not what the server sent
Start with the distinction that prevents most QoE confusion. There are two different questions you can ask about a stream. The first is quality of service (QoS): did the network and servers do their job — was bandwidth available, did the bytes arrive, were the error rates low? The second is quality of experience (QoE): did the viewer have a good time — did the video start quickly, play smoothly, and look good on their screen? QoS is measured at the infrastructure; QoE is measured at the player, where the human actually sits.
The two are related but not the same, and the gap between them is where platforms lose money. A content-delivery network (CDN) can report a perfectly healthy 99.99% availability while a viewer on a congested home Wi-Fi connection watches a spinning loader and gives up. The network did its job; the experience still failed. This is the core insight of the IETF document that frames this whole area, RFC 9317 (Operational Considerations for Streaming Media, an Informational RFC published in October 2022): it organizes the entire problem around "the quality of experience (QoE) when streaming video," and it notes a structural blind spot: a CDN "cannot tell which request belongs to which playback session... or whether any of the clients have stalled and are rebuffering." The bytes leaving the server and the experience reaching the viewer are measured in different places, and only the player sees the truth.
So the discipline of QoE is to measure what the viewer experienced, from the player, and to standardize how you measure it so two teams mean the same thing by "startup time." The rest of this article is the set of QoE metrics worth standardizing on, the targets that matter, and the traps that hide a problem until viewers have already left.
The QoE quartet: the four numbers that decide whether viewers stay
Across the standards and the major analytics vendors, the same small set of metrics shows up as the core of streaming quality. We will call them the QoE quartet. Two of them — startup time and rebuffering — drive the most lost viewing, so they get the most space below, but all four belong on any operator's dashboard.
Figure 1. The QoE quartet. Two metrics measure waiting (startup time, rebuffering), one measures sharpness (bitrate delivered), and one measures outright failure (playback failure rate). Targets are industry rules of thumb, not standards — date any number you quote.
The first is video startup time (also called video start time, VST, or join time): the elapsed time from the moment the viewer presses play to the moment the first frame actually appears on screen. The second is rebuffering ratio (also called the buffering ratio or rebuffering rate): the share of the total viewing time during which the video was frozen, waiting for more data to download. The third is the bitrate delivered: the average data rate of the video the viewer actually received, which is the closest proxy for how sharp the picture looked. The fourth is the playback failure rate: the share of play attempts that ended in an error instead of playing — the worst experience of all, because the viewer sees nothing.
These four are not a list one company invented. The Consumer Technology Association's standard CTA-2066 (Streaming Quality of Experience Events, Properties and Metrics, March 2020) exists precisely to define "a set of media player events, properties, QoE metrics, and associated terminology for representing streaming media QoE across systems, media players, and analytics vendors" — because, as the standard's own framing notes, these metrics "were used in slightly (or vastly) different ways that led to interoperability issues." CTA-2066 defines common terminology and "how each metric should be computed for consistent reporting." That standard is the reason your startup-time number can be compared to anyone else's — if everyone follows it.
The precise, engineering-level definitions of these QoE metrics — exactly which player events bound each measurement — are owned by our streaming section; see video QoE metrics in the Video Streaming course for the player-event-level definitions. Here the focus is the OTT operator's view: what each metric means for your business, what target to hold, and how each one maps to viewers leaving.
Video startup time: the first two seconds decide the session
Video startup time is the first thing a viewer experiences and the first place you lose them. It is the wait between pressing play and seeing the first frame — the spinning loader, the black screen, the moment of doubt about whether the app is broken. Because it happens before any value has been delivered, the viewer's patience here is at its thinnest.
The cost of a slow start is not a guess. The clearest evidence comes from a landmark academic study by Krishnan and Sitaraman, conducted on Akamai's delivery network and published at the 2012 Internet Measurement Conference. They analyzed an unusually large dataset: 6.7 million unique viewers worldwide who watched 23 million videos for 216 million minutes over ten days. The finding that every streaming operator should know: viewers begin to abandon a video if it does not start within about two seconds, and beyond that point each additional second of startup delay increases the abandonment rate by roughly 5.8%. A viewer who waits ten seconds for a video to start is, in large numbers, a viewer who is no longer there.
This is why "under two seconds" has become the industry rule of thumb for startup time, with the best platforms pushing well below it. The target is not a standard — no spec mandates two seconds — but the abandonment curve behind it is one of the most replicated findings in streaming research.
The math: what a two-second-slower start costs in a year
It is worth working through the arithmetic, because this is where QoE becomes a budget line. Suppose your platform sees 1,000,000 play attempts a day, and a configuration change (a slower origin, a heavier player, an extra ad call before the first frame) adds two seconds to your startup time, pushing you from a fast start into the abandonment zone.
Apply the study's figure of 5.8% additional abandonment per added second:
extra abandonment = 2 seconds × 5.8% per second = 11.6%
plays lost per day = 1,000,000 × 11.6% = 116,000 plays
plays lost per year = 116,000 × 365 ≈ 42,340,000 plays
Now attach a value. In an ad-supported model, suppose each play is worth a conservative $0.02 in advertising revenue:
revenue lost per year = 42,340,000 plays × $0.02 ≈ $847,000
Roughly $847,000 a year — from two seconds. And that is before counting the long-term damage: the same body of research found that a viewer who fails to start a video is measurably less likely to come back to the site at all. Startup time is not an engineering footnote; it is one of the highest-leverage numbers on the platform.
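The same arithmetic can be written as a small function so you can plug in your own traffic and play value. This is a minimal sketch, not a forecasting model: it assumes the linear 5.8%-per-second figure applies to every added second (that is, the start is already at the two-second threshold), and the function and parameter names are illustrative.

```python
# Abandonment increase per added second of startup delay,
# taken from the Krishnan and Sitaraman study cited in this article.
ABANDON_PER_EXTRA_SECOND = 0.058


def yearly_revenue_lost(plays_per_day, added_seconds, value_per_play):
    """Linear estimate of yearly revenue lost to a slower start.

    Assumes every added second falls beyond the ~2 s threshold,
    so the full 5.8% per second applies (a simplification).
    """
    extra_abandonment = added_seconds * ABANDON_PER_EXTRA_SECOND
    plays_lost_per_year = plays_per_day * extra_abandonment * 365
    return plays_lost_per_year * value_per_play


# The worked example from the text: 1,000,000 plays a day,
# two added seconds, $0.02 of ad revenue per play.
loss = yearly_revenue_lost(1_000_000, 2, 0.02)
print(f"Estimated yearly loss: ${loss:,.0f}")
```

Changing `value_per_play` to your own figure is the quickest way to turn a startup-time regression into a number a finance team will recognize.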
What makes startup slow, in plain terms
Startup time is the sum of several waits stacked end to end: the player has to find and reach the server, download the manifest (the small text file that lists where the video segments are), request a decryption license if the content is protected, download the first segment or two, fill a small buffer, and then begin to decode and paint frames. Anything that lengthens one of those steps lengthens the start. Common culprits are a distant origin server with no cache nearby, a first segment that is too large, a license round-trip that runs in series instead of in parallel, or — a frequent and avoidable one — a client-side advertisement that must load and play before the content even begins.
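To see why a serial license round-trip matters, the stacked waits can be sketched as a simple budget. The step timings below are hypothetical placeholders, not measurements; real values would come from your own player telemetry.

```python
# Hypothetical step timings in milliseconds (illustrative only).
steps_ms = {
    "connect": 150,
    "manifest": 200,
    "license": 400,
    "first_segment": 600,
    "buffer_and_decode": 250,
}


def startup_serial(steps):
    """Every step waits for the previous one to finish."""
    return sum(steps.values())


def startup_parallel_license(steps):
    """License request overlaps the first-segment download,
    so only the slower of the two adds to the wait."""
    overlapped = max(steps["license"], steps["first_segment"])
    rest = sum(
        ms for name, ms in steps.items()
        if name not in ("license", "first_segment")
    )
    return rest + overlapped


serial_ms = startup_serial(steps_ms)
parallel_ms = startup_parallel_license(steps_ms)
print(f"serial: {serial_ms} ms, license in parallel: {parallel_ms} ms")
```

With these made-up numbers the parallel path saves the full license round-trip, which is the kind of question worth putting to an engineering team: which of these steps run one after another, and which could overlap?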
Figure 2. Anatomy of a video start. Each step adds to the wait; the two-second line is where abandonment begins to climb at roughly 5.8% per additional second.
Rebuffering: the freeze that loses the most watch time
If startup time decides whether the session begins, rebuffering decides whether it survives. Rebuffering is what happens when the video stops mid-playback because the player has run out of downloaded data and must pause to reload — the frozen frame, the spinner, the "buffering..." that interrupts the scene. Conviva, one of the major streaming-analytics vendors, defines it plainly: rebuffering "is when the video stalls during playback and the viewer must wait for the video to resume playing," and notes that "frequent rebuffering is a major source of poor quality of experience and often leads to audience abandoning the content."
The metric you track is the rebuffering ratio: the total time spent frozen divided by the total time spent watching, expressed as a percentage. If a viewer watched for 30 minutes and 18 seconds of that was spent frozen and reloading, the rebuffering ratio is 18 seconds ÷ 1,800 seconds = 1%. The industry rule of thumb is to keep this under about 1%, with the best platforms holding it near or below 0.5%. Again, these are operating targets, not standards — but the lower, the better, and the relationship to lost viewing is direct.
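The ratio above is simple enough to compute directly from a list of stall intervals. The sketch below assumes stalls are recorded as (start, end) timestamps in seconds and that the denominator is the viewer's total viewing time; exactly what counts in the denominator can differ between analytics tools, so treat this as an illustration rather than a reference definition.

```python
def rebuffering_ratio(stalls, watch_seconds):
    """Share of viewing time spent frozen.

    stalls: list of (start, end) timestamps in seconds.
    watch_seconds: total viewing time for the session (assumed > 0).
    """
    if watch_seconds <= 0:
        raise ValueError("watch_seconds must be positive")
    stalled_seconds = sum(end - start for start, end in stalls)
    return stalled_seconds / watch_seconds


# The example from the text: 30 minutes watched, 18 seconds frozen
# (here split across two stalls of 6 s and 12 s).
ratio = rebuffering_ratio([(120.0, 126.0), (900.0, 912.0)], 1800.0)
print(f"rebuffering ratio: {ratio:.2%}")
```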
How direct? The same Akamai study quantified it: a viewer whose video froze for just 1% of its duration watched 5% fewer minutes than a comparable viewer whose video played smoothly. The penalty is leveraged — a small amount of freezing costs a multiple of itself in lost watch time, because freezing does not just waste the frozen seconds, it pushes viewers to quit entirely. This is why rebuffering, not picture sharpness, is usually the QoE metric most tightly correlated with engagement: viewers will tolerate a slightly soft picture far more readily than a picture that stops.
Why rebuffering happens — and the trade-off with sharpness
Rebuffering is fundamentally a race between two speeds: how fast the player can download video versus how fast it plays it back. Every adaptive player keeps a small reservoir of downloaded video called the buffer; as long as the buffer has data, playback is smooth. When the network slows down and the buffer empties faster than it refills, the video freezes. RFC 9317 describes the player's job exactly: send "enough media to ensure that the media player does not 'stall', without sending so much media that the media player cannot accept it."
This is where rebuffering meets the picture-quality metric, because the player is constantly choosing between them. The mechanism that does this is adaptive bitrate (ABR) streaming, the technique by which the player switches between higher-quality (larger, sharper) and lower-quality (smaller, softer) versions of the same video depending on the network speed it is currently measuring. When the network is strong, the player climbs to a higher bitrate and the picture sharpens; when the network weakens, the player drops to a lower bitrate to keep the buffer full and avoid a freeze. The ABR algorithm's whole purpose, in RFC 9317's words, is "the lowest chances for a rebuffering event (playback stall)" while still delivering the highest quality the connection allows. The deep mechanics of how ABR makes that choice belong to the streaming layer — see adaptive bitrate streaming — but the operator's takeaway is simple: rebuffering and bitrate are two ends of one lever, and a platform that never rebuffers but always looks soft has simply chosen one failure over the other.
A large part of avoiding both at once is an encoding and delivery problem, not a player problem. A well-built encoding ladder — the set of quality levels the player can choose from — gives the ABR algorithm sensible rungs to step down to, so it can drop quality smoothly instead of freezing. And the content-delivery network that caches your video near viewers determines how fast the buffer refills in the first place. QoE is downstream of those decisions.
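The step-down behaviour can be illustrated with the simplest possible rule: pick the highest rung that fits inside a safety margin of the measured throughput. This is a deliberately naive sketch with a hypothetical ladder and safety factor; production players use more sophisticated logic that also considers buffer level, and nothing here reflects a specific player's algorithm.

```python
# Hypothetical encoding ladder, in kilobits per second.
LADDER_KBPS = [400, 1200, 2500, 5000, 8000]


def pick_rung(measured_kbps, ladder=LADDER_KBPS, safety=0.8):
    """Choose the highest rung at or below a fraction of measured throughput.

    The safety factor leaves headroom so the buffer refills faster
    than it drains. If nothing fits, fall back to the lowest rung
    rather than stopping playback.
    """
    budget = measured_kbps * safety
    affordable = [rung for rung in ladder if rung <= budget]
    return max(affordable) if affordable else min(ladder)


for throughput in (10_000, 7_000, 1_000, 300):
    print(throughput, "kbps measured ->", pick_rung(throughput), "kbps rung")
```

A ladder with sensible spacing gives this kind of rule somewhere to land when the network dips; a ladder with a large gap between rungs forces a big visible quality drop or a stall.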
Bitrate delivered and playback failure: the other two of the quartet
The third metric, bitrate delivered (or average bitrate), is the average data rate of the video the viewer actually received over their session, measured in megabits per second (Mbps). It is the best single proxy for how sharp the picture looked, because a higher bitrate carries more visual detail. There is no universal target — the right bitrate depends on the resolution, the codec, and the content — but the operator's job is to deliver the highest bitrate the viewer's network can sustain without triggering rebuffering. A platform that delivers a high average bitrate with a low rebuffering ratio has tuned the trade-off well; one that delivers a high bitrate but also a high rebuffering ratio is being too aggressive, and one that never rebuffers but delivers a low average bitrate is leaving sharpness on the table.
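Because the player changes quality during a session, the average has to be weighted by how long each bitrate was actually played. A minimal sketch, assuming the session is recorded as (bitrate, seconds played) pairs; the data shape is an assumption for illustration.

```python
def average_bitrate_mbps(played):
    """Time-weighted average bitrate for one session.

    played: list of (bitrate_mbps, seconds_played) pairs.
    """
    total_seconds = sum(seconds for _, seconds in played)
    if total_seconds == 0:
        return 0.0
    weighted = sum(bitrate * seconds for bitrate, seconds in played)
    return weighted / total_seconds


# Ten minutes at 5 Mbps, a five-minute dip to 2.5 Mbps, then
# fifteen minutes back at 5 Mbps.
session = [(5.0, 600), (2.5, 300), (5.0, 900)]
avg = average_bitrate_mbps(session)
print(f"average bitrate delivered: {avg:.2f} Mbps")
```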
The fourth metric, the playback failure rate, captures the worst experience a viewer can have: the play attempt that errors out and never delivers video at all. Conviva splits this into two timing buckets that are worth distinguishing. A video start failure (VSF) is an attempt that fails before the first frame ever appears — the viewer pressed play and got an error instead of a video. A video playback failure (VPF) is a stream that terminates due to an error after it had started playing — a file corruption, a sudden interruption, an exhausted resource. Both are counted against total attempts. As Conviva puts it, a playback failure "represents the worst possible experience a viewer can have because it interrupts the stream completely so they can't watch." The target for both is to drive toward zero; any sustained failure rate above a fraction of a percent is a fire.
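Both failure rates share the same denominator, total play attempts, which a short sketch makes explicit. The outcome labels below are hypothetical; a real analytics pipeline would derive them from player error events.

```python
from collections import Counter


def failure_rates(outcomes):
    """Failure rates as a share of all play attempts.

    outcomes: one label per attempt, either 'played', 'vsf'
    (failed before first frame) or 'vpf' (failed after start).
    """
    attempts = len(outcomes)
    if attempts == 0:
        return {"vsf": 0.0, "vpf": 0.0, "total": 0.0}
    counts = Counter(outcomes)
    vsf = counts["vsf"] / attempts
    vpf = counts["vpf"] / attempts
    return {"vsf": vsf, "vpf": vpf, "total": vsf + vpf}


# 1,000 attempts: 994 played cleanly, 4 never started, 2 died mid-stream.
sample = ["played"] * 994 + ["vsf"] * 4 + ["vpf"] * 2
rates = failure_rates(sample)
print({name: f"{value:.2%}" for name, value in rates.items()})
```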
| QoE metric | Plain meaning | What it signals | Rule-of-thumb target |
|---|---|---|---|
| Video startup time | Press-play to first frame | Whether the session begins at all | Under ~2 s; best ≤ 1 s |
| Rebuffering ratio | Share of viewing time frozen | Whether the session survives | Under ~1%; best ≤ 0.5% |
| Bitrate delivered | Average data rate received (Mbps) | How sharp the picture looked | As high as the network sustains without freezing |
| Playback failure rate (VSF + VPF) | Share of attempts that errored out | Outright broken experiences | Toward 0%; > ~0.5% is urgent |
Targets are 2026 industry operating conventions, not standards; they vary by content type, device, and region. Date and re-validate any number you publish. Metric terminology follows CTA-2066.
Putting a number on the whole experience: QoE scores and MOS
Operators eventually want one number that says "how good was the experience" so they can track it on a single chart and benchmark against peers. There are two approaches, and it helps to know both exist.
The first is a standardized perceptual model. The International Telecommunication Union's ITU-T Recommendation P.1203 is the first internationally standardized QoE model for HTTP adaptive streaming. It takes the technical parameters of a session — the video quality, the audio quality, the quality switches, the initial loading delay, and the stalling (rebuffering) events — and outputs a Mean Opinion Score (MOS): a single number from 1 (bad) to 5 (excellent) that estimates how a typical human would rate the experience. P.1203 was trained and validated against more than a thousand audiovisual test sequences containing exactly the impairments OTT viewers hit — stalling, coding artifacts, and quality switches. It is bounded (designed for sessions up to about five minutes, resolutions up to 1080p, frame rates up to 30 fps in its initial scope), but it matters because it is an open, standardized way to turn raw QoE measurements into a perceptual score that means the same thing everywhere.
The second is a vendor composite index. Conviva's Streaming Performance Index (SPI), for example, grades the overall quality of every stream by combining video start failure, exits before video start, rebuffering ratio, video playback failures, video startup time, and picture quality into one score, with peer benchmarks to put your number in context. The value of a composite is operational simplicity — one dial for an executive dashboard. The risk is that the weighting is proprietary and opaque, so a composite is most useful when you can still break it down into the underlying quartet to see what moved it. We compare the platforms that produce these scores — including the open telemetry route — in the QoE measurement stack.
How QoE gets measured: the player tells, the CDN listens
A QoE number is only as honest as the place it is measured, and the honest place is the player. Because the viewer's experience happens on their device, the metrics that describe it must be collected there and sent back. This is the job of player-side beacons — small reports the video player emits as events happen (play requested, first frame shown, stall started, stall ended, bitrate switched, error thrown) — which an analytics service aggregates into the quartet. The instrumentation of those beacons is its own discipline; see player QoE instrumentation for how the events are captured on each platform.
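As a rough illustration of what the aggregation step does, the sketch below turns one session's beacon stream into a startup time and a total stall time. The event names and the (timestamp, name) shape are hypothetical stand-ins for whatever your player SDK emits; they are not taken from CTA-2066 or any vendor's schema.

```python
def summarize_session(events):
    """Reduce one session's beacons to startup and stall seconds.

    events: list of (timestamp_seconds, event_name) tuples.
    Returns startup_seconds as None if the first frame never appeared.
    """
    play_at = first_frame_at = stall_began = None
    stalled = 0.0
    for ts, name in sorted(events):
        if name == "play_requested" and play_at is None:
            play_at = ts
        elif name == "first_frame" and first_frame_at is None:
            first_frame_at = ts
        elif name == "stall_start":
            stall_began = ts
        elif name == "stall_end" and stall_began is not None:
            stalled += ts - stall_began
            stall_began = None
    startup = None
    if play_at is not None and first_frame_at is not None:
        startup = first_frame_at - play_at
    return {"startup_seconds": startup, "stalled_seconds": stalled}


beacons = [
    (0.0, "play_requested"),
    (1.4, "first_frame"),
    (62.0, "stall_start"),
    (65.5, "stall_end"),
]
summary = summarize_session(beacons)
print(summary)
```

A session with a play request but no first frame returns `None` for startup, which is the shape of a video start failure in this toy model.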
There is also a standardized way for the player to tell the delivery network what it is experiencing, which closes the blind spot RFC 9317 described. The Consumer Technology Association's CTA-5004, Common Media Client Data (CMCD) (September 2020), defines a standard format for the player to attach media-relevant information — its buffer level, the bitrate it is requesting, whether it is in danger of stalling — to the requests it sends the CDN. Without it, as RFC 9317 explains, the CDN "produces millions of log lines per second" but "has no concept of a 'session'" and "cannot tell... whether any of the clients have stalled and are rebuffering or are about to stall." CMCD lets the network see what the player sees, so it can prioritize the segment a viewer is about to need before the buffer runs dry. For an operator, the lesson is that QoE measurement and QoE improvement are increasingly the same data flowing to two destinations: your analytics dashboard and your CDN.
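To make the idea concrete, here is a sketch of how a player might attach a few CMCD fields to a segment request as a query argument. The key names (`bl` for buffer length in milliseconds, `br` for encoded bitrate in kbps, `bs` for buffer starvation, `sid` for session ID) and the formatting rules are given as best recalled from CTA-5004; treat them as assumptions and check the specification before relying on them. The function name and parameters are illustrative.

```python
from urllib.parse import quote


def cmcd_query(buffer_ms, bitrate_kbps, starved, session_id):
    """Build a CMCD query argument for a segment request (sketch).

    Assumed conventions: buffer length rounded to the nearest 100 ms,
    string values double-quoted, a true boolean sent as a bare key,
    keys sorted alphabetically, and the whole value URL-encoded.
    """
    parts = [
        f"bl={round(buffer_ms / 100) * 100}",
        f"br={bitrate_kbps}",
        f'sid="{session_id}"',
    ]
    if starved:
        parts.append("bs")  # bare key signals that the buffer ran dry
    parts.sort()
    return "CMCD=" + quote(",".join(parts), safe="")


query = cmcd_query(
    buffer_ms=21_340, bitrate_kbps=3200, starved=True, session_id="abc-123"
)
print("https://cdn.example.com/video/seg_42.m4s?" + query)
```

The hostname and path are placeholders. The point for an operator is that this data rides along on requests the player already makes, so the CDN logs gain a session identifier and a buffer signal without a separate reporting channel.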
Common mistakes that hide a QoE problem
QoE measurement goes wrong in predictable ways, and every one of them lets a real problem stay invisible until viewers have already left.
The most common is reporting QoE as a single average across everything. A platform-wide average startup time of 1.8 seconds looks healthy and can completely hide that viewers on smart TVs in one region are waiting eight seconds. QoE lives in the distribution and in the segments — by device, by region, by content, by CDN. The average is the number that lets a serious problem hide inside a healthy headline; always read the 95th percentile and the per-segment breakdown.
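The way an average hides a bad segment is easy to demonstrate. The sketch below uses made-up startup times for two device segments and reports the mean and 95th percentile for each; the segment names and numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, quantiles


def startup_by_segment(samples):
    """Mean and 95th-percentile startup time per segment.

    samples: list of (segment_name, startup_seconds) pairs.
    """
    groups = defaultdict(list)
    for segment, seconds in samples:
        groups[segment].append(seconds)
    report = {}
    for segment, values in groups.items():
        # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
        p95 = quantiles(values, n=20)[-1] if len(values) > 1 else values[0]
        report[segment] = {"mean": mean(values), "p95": p95}
    return report


# Invented data: 90 phone plays at 1.0 s, 10 smart-TV plays at 8.0 s.
samples = [("phone", 1.0)] * 90 + [("smart_tv", 8.0)] * 10
overall_mean = mean(seconds for _, seconds in samples)
report = startup_by_segment(samples)
print(f"platform-wide mean: {overall_mean:.1f} s")
for segment, stats in report.items():
    print(segment, stats)
```

The platform-wide mean lands under two seconds and looks healthy, while the per-segment view shows every smart-TV viewer waiting eight.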
The second is confusing quality of service with quality of experience — trusting the CDN's 99.99% availability dashboard as proof that viewers are happy. As established above, the network can be healthy while the experience fails on the last hop into the home. Measure at the player or you are measuring the wrong thing.
The third is letting an ad call inflate startup time. When a client-side advertisement must load and render before the content's first frame, the viewer's perceived startup time includes the entire ad-loading wait — and if the ad server is slow, you have manufactured an abandonment problem out of your own monetization stack. RFC 9317 notes this directly: ad insertion "does not mean that the insertion of ads has no effect on the user's quality of experience," and poor connectivity to the ad service "can cause rebuffering even if the underlying media assets... can be accessed quickly." Server-side ad insertion is one answer; either way, measure startup with the ad in the path, because that is what the viewer feels.
The fourth is chasing bitrate at the cost of rebuffering. Pushing the highest possible picture quality feels like a quality win, but if it empties the buffer and freezes the video, you have avoided the softness viewers tolerate only to cause the freezing that makes them leave. The quartet has to be read together: a bitrate gain that raises the rebuffering ratio is usually a net QoE loss.
Where Fora Soft fits in
Fora Soft has built video streaming and OTT/Internet TV software since 2005, with 625+ shipped projects for 400+ clients, and QoE is where streaming scale becomes a product decision rather than a slide. When a platform has to hold sub-two-second starts and sub-1% rebuffering across phones, smart TVs, and set-top boxes — and across regions with very different networks — the work is in instrumenting consistent player beacons on every screen, tuning the encoding ladder and CDN strategy that the quartet depends on, and building the dashboards that read QoE by segment instead of by misleading average. That is the same streaming, encoding, and delivery experience we apply across video conferencing, e-learning, telemedicine, and surveillance, where a frozen frame is never acceptable. We are vendor-neutral about the analytics layer: we instrument to the standards (CTA-2066, CMCD) so your numbers stay portable.
What to read next
- The OTT analytics map: audience, engagement, quality — how QoE fits the three metric families.
- The QoE measurement stack: Mux Data, Conviva, and open telemetry — how to measure the quartet at scale.
- Retention and engagement analytics — how QoE traces forward into churn and renewal.
Call to action
- Talk to a streaming engineer: book a 30-minute scoping call to talk through your video QoE plan.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the QoE Target & Diagnosis Checklist — One-page reference to the QoE quartet (video startup time, rebuffering ratio, bitrate delivered, playback failure rate): each metric's plain definition, a rule-of-thumb target, the most likely causes when it degrades, and the….
References
- Operational Considerations for Streaming Media (RFC 9317). Internet Engineering Task Force (IETF), October 2022. Tier 1 (IETF Informational RFC). Frames streaming around quality of experience; §4.4 names startup time, playback stability, and stall avoidance as the on-demand QoE considerations; §5.1/§5.3 define the player's buffer-vs-stall trade-off and the ABR goal of "the lowest chances for a rebuffering event"; §5.4 notes ad insertion's QoE impact; §5.6 references CTA-2066 and CTA-5004 (CMCD) and describes the CDN's session blind spot. https://www.rfc-editor.org/rfc/rfc9317.html (accessed 2026-06-19).
- CTA-2066: Streaming Quality of Experience Events, Properties and Metrics. Consumer Technology Association (CTA WAVE), March 2020. Tier 1 (industry standard). Defines a common set of media-player events, properties, and QoE metrics and "how each metric should be computed for consistent reporting" across players and analytics vendors — the standard that makes startup-time and rebuffering numbers comparable. https://shop.cta.tech/products/streaming-quality-of-experience-events-properties-and-metrics (accessed 2026-06-19).
- CTA-5004: Web Application Video Ecosystem — Common Media Client Data (CMCD). Consumer Technology Association (CTA WAVE), September 2020. Tier 1 (industry standard). Defines a standardized way for media players to send media-relevant data (buffer level, requested bitrate, stall risk) to the CDN, closing the player↔CDN session blind spot described in RFC 9317 §5.6. https://shop.cta.tech/products/web-application-video-ecosystem-common-media-client-data-cta-5004 (accessed 2026-06-19).
- ITU-T Recommendation P.1203 — Parametric bitstream-based quality assessment of progressive download and adaptive audiovisual streaming. International Telecommunication Union (ITU-T), 2017 (and module recommendations P.1203.1/.2/.3). Tier 1 (international standard). The first standardized QoE model for HTTP adaptive streaming; integrates video quality, audio quality, quality switches, initial loading delay, and stalling into a Mean Opinion Score (1–5); validated on 1,000+ audiovisual sequences. https://www.itu.int/rec/T-REC-P.1203 (accessed 2026-06-19).
- Digital Video Impression Measurement Guidelines (v1.1). Interactive Advertising Bureau (IAB) / Media Rating Council (MRC). Tier 1 (industry standard). Requires that a video impression be counted client-initiated and only when the first frame begins to render — anchoring the "first frame, not buffer start" definition of when a play (and its startup-time measurement) completes. https://www.iab.com/wp-content/uploads/2016/12/Digital-Video-Impression-Measurement-Guidelines_1.1.pdf (accessed 2026-06-19).
- Video Stream Quality Impacts Viewer Behavior: Inferring Causality Using Quasi-Experimental Designs. S. Shunmuga Krishnan and Ramesh K. Sitaraman (Akamai / UMass Amherst), Proceedings of the ACM Internet Measurement Conference (IMC) 2012; extended in IEEE/ACM Transactions on Networking, 2013. Tier 5 (peer-reviewed academic). The 23-million-view study: 6.7M unique viewers, 216M minutes, 10 days; viewers abandon after ~2 s startup delay with ~5.8% more abandonment per added second; 1% of duration frozen → 5% fewer minutes watched; failed playback → 2.3% less likely to return that week. https://people.cs.umass.edu/~ramesh/Site/HOME_files/imc208-krishnan.pdf (accessed 2026-06-19).
- Understanding the Impact of Video Quality on User Engagement. Florin Dobrian et al. (Conviva), Proceedings of ACM SIGCOMM 2011. Tier 5 (peer-reviewed academic). Foundational large-scale study establishing that buffering ratio is the QoE metric most strongly associated with reduced viewer engagement. https://dl.acm.org/doi/10.1145/2018436.2018478 (accessed 2026-06-19).
- OTT 101: Top 5 Metrics that Matter for Tech Ops. Conviva, updated 2025-03-28. Tier 3 (first-party analytics vendor). Operator-facing definitions of rebuffering ratio, video start failure (VSF), video playback failure (VPF), attempts, concurrent plays, and the Streaming Performance Index (SPI) composite. https://www.conviva.com/ott-101-top-5-metrics-that-matter-for-tech-ops/ (accessed 2026-06-19).
- Understand Mux Data metric definitions. Mux. Tier 3 (first-party analytics vendor). Player-side definitions of video startup time, rebuffering, and the view/watch-time terms used to bound a QoE session. https://www.mux.com/docs/guides/understand-metric-definitions (accessed 2026-06-19).
Spec/standard precedence note: where popular "video KPI" listicles quote a single flat rebuffering or startup number with no definition or source, this article follows the controlling measurement standard (CTA-2066) for metric terminology and computation, RFC 9317 for the operational framing and the player↔CDN data flow (CMCD), and ITU-T P.1203 for the perceptual-score model. The quantitative abandonment and engagement effects are cited to the peer-reviewed Krishnan/Sitaraman and Dobrian studies, not to vendor blogs; the sub-2-second and sub-1% operating targets are dated industry conventions and are flagged as such.


