Why this matters

Every adaptive stream makes a quality decision several times a minute, and those decisions decide whether the viewer sees a sharp picture, a frozen one, or a flickering one. If you measure only the average bitrate you delivered, you will congratulate a player that is annoying its audience with constant quality jumps; if you measure only stalls, you will ship a stream that never buffers and never looks good either. This article is for the streaming engineer, player developer, and QoE analyst who has to compare two adaptive-bitrate (ABR) strategies, set a quality target for a live event, or explain to a product owner why "we hit 4K half the time" is not the win it sounds like. It is the deep dive behind the bitrate-and-switching line in the streaming QoE framing article, and the third leg of the first-impression trio alongside rebuffering and startup time. How the player actually decides which rung to pick — the ABR algorithm — lives in the Video Streaming section; here we measure what those decisions do to perceived quality.

What ABR is trading, in one minute

Start with the thing being measured. Adaptive bitrate streaming, called ABR, cuts a video into short segments — usually two to six seconds each — and encodes every segment at several bitrates, a set of rungs called the bitrate ladder. As playback runs, the player picks which rung to download next based on how fast the network is moving and how much video it has buffered. When bandwidth is good it climbs to a sharper, higher-bitrate rung; when bandwidth drops it steps down to keep playing.

That single mechanism creates a three-way tug of war, and naming the three forces is the whole point of measuring it. The first force is picture quality, which on a given ladder rises with the bitrate you spend. The second is rebuffering — the stall that happens when the player tries to spend more bits than the network can deliver and the buffer runs dry. The third is switching — the visible change in quality each time the player moves to a different rung. You cannot maximize the first without provoking the other two, and the discipline of this article is refusing to look at any one of them alone.

Figure 1. The ABR trade-off. Pulling hard toward any one corner strains the other two: chase bitrate and you risk stalls and switches; eliminate switches and you must either sit low or risk a stall. The player lives somewhere inside the triangle.

A useful analogy is driving on a road with a changing speed limit. Picture quality is your speed — faster is more fun. Rebuffering is running out of road and stopping dead. Switching is slamming the accelerator and brake over and over: even if your average speed is high, the ride is nauseating. The best drive is not the fastest top speed; it is the smoothest progress the road allows.

Average bitrate: the quality you actually delivered

The first quantity to measure is the average bitrate delivered over the session — the time-weighted mean of the rungs the player actually played, not the rungs you published. It is the headline proxy for picture quality, and standardized analytics specifications define it precisely so two dashboards report the same thing (CTA-2066, 2020). Higher average bitrate generally means a sharper picture, which is why average bitrate sits among the metrics shown to drive viewer engagement at scale (Dobrian et al., SIGCOMM 2011).
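
"Time-weighted mean of the rungs actually played" can be sketched in a few lines of Python. This assumes a hypothetical playback log of intervals, each holding the rendered rung and how long it stayed on screen, with stall time already excluded. The `PlayedInterval` record and its field names are illustrative only; they are not CTA-2066 field names.

```python
from dataclasses import dataclass


@dataclass
class PlayedInterval:
    """One stretch of playback at a single rung (hypothetical log record)."""
    bitrate_mbps: float  # rung actually rendered, in Mbps
    seconds: float       # time on screen at that rung; stall time excluded


def time_weighted_avg_bitrate(intervals: list[PlayedInterval]) -> float:
    """Time-weighted mean bitrate of what was played, in Mbps.

    Each rung counts in proportion to how long it was on screen, so a rung
    shown for two seconds barely moves the result. Returns 0.0 if nothing
    was played.
    """
    total_seconds = sum(iv.seconds for iv in intervals)
    if total_seconds == 0:
        return 0.0
    bit_seconds = sum(iv.bitrate_mbps * iv.seconds for iv in intervals)
    return bit_seconds / total_seconds


session = [
    PlayedInterval(4.3, 2.0),    # brief visit to the top rung
    PlayedInterval(1.2, 30.0),   # long stretch on a low rung
    PlayedInterval(2.85, 28.0),
]
naive = sum(iv.bitrate_mbps for iv in session) / len(session)
print(round(naive, 2))                               # 2.78
print(round(time_weighted_avg_bitrate(session), 2))  # 2.07
```

Averaging the three distinct rungs would report about 2.78 Mbps; weighting by time on screen gives about 2.07 Mbps, because the session spent most of its time on the lower rungs.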

But bitrate is a proxy for quality, not quality itself, and treating the two as identical is the first place this measurement goes wrong. The same 3 Mbps looks excellent on a slow talking-head clip and mediocre on confetti and fast sports, because a hard-to-compress scene needs more bits for the same result. The honest move is to map the delivered bitrate to a perceptual picture score per segment — typically VMAF — and then pool those per-segment scores into one number. When you do, report the pooling: a mean can hide a stretch of soft segments that the viewer noticed and the average forgave. Why the same bitrate buys different quality on different content is a codec-side question; the per-title encoding article covers how the ladder is built to account for it.
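
A minimal sketch of reporting the pooling instead of hiding it: given per-segment VMAF scores (the numbers below are made up for illustration, not measured), report the mean next to a harmonic mean and a low percentile. The nearest-rank percentile and the choice of the 10th percentile are assumptions; any low-percentile convention works as long as the report states it.

```python
import math
import statistics


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def pool_segment_scores(vmaf: list[float]) -> dict[str, float]:
    """Pool per-segment VMAF several ways so the choice is visible in the report."""
    return {
        "mean": statistics.fmean(vmaf),
        # The harmonic mean is pulled toward low scores more than the mean is.
        "harmonic_mean": statistics.harmonic_mean(vmaf),
        "p10": percentile_nearest_rank(vmaf, 10),
    }


# Hypothetical per-segment scores: mostly sharp, with a three-segment soft stretch.
scores = [93, 94, 92, 95, 58, 55, 60, 93, 94, 95]
pooled = pool_segment_scores(scores)
print(round(pooled["mean"], 1))  # 82.9
print(pooled["p10"])             # 55
```

The mean of 82.9 looks comfortable; the 10th percentile of 55 exposes the soft stretch the viewer actually sat through, and the harmonic mean lands between the two.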

So the first measurement rule is simple to state and easy to forget. Average bitrate tells you how many bits you spent; it does not tell you what the viewer saw, and it says nothing at all about how steady the experience was. For that, we need the other two quantities.

Switching is its own artifact

Here is the idea most teams miss: a quality switch is itself a visible impairment, even when every rung in it looks fine on its own. When the picture noticeably sharpens or softens mid-playback, the eye catches the change, and a stream that flickers between a great rung and a poor one can feel worse than a stream that sits steadily in the middle. Switching belongs in the same family as the streaming-specific artifacts — the visible seams that the encoder did not create and that only appear during delivery.

Two properties of a switch decide how much it hurts, and you must measure both. The first is frequency — how often the quality changes; a stream that switches every few seconds draws attention to itself. The second is magnitude — how big each jump is; stepping one rung is a shrug, while leaping from the top rung to the bottom is a slap. Subjective studies consistently find viewers are sensitive to both frequent and large switches (Jiang et al., CoNEXT 2012), so counting switch events without weighing their size measures the wrong thing.

Human perception of switching has three findings worth committing to memory, all from a large subject-rated study of 1,350 adaptive streams (Duanmu et al., WaterlooSQoE-IV, 2021). First, a switch up is forgiven far more easily than a switch down — positive adaptations are preferred over negative ones, so the direction of a switch matters, not only its size. Second, the recency effect dominates: about 60% of viewers weight the most recent quality most heavily, so a stream that ends on a high rung beats one that starts high but sags, at the same average. Third, viewers are not interchangeable — some are far more sensitive to switching, others to low quality, others to stalls, so a single QoE number is always a population average hiding real spread.

That same study delivered a humbling result for anyone who measures ABR. On a large, realistic database, several celebrated "smart" ABR algorithms scored worse on average than a plain rate-based one, and the authors concluded that understanding human perception matters more than fancier optimization (Duanmu et al., 2021). The lesson for measurement is direct: the win comes from measuring the right thing, not from a cleverer controller optimizing the wrong thing.

Measuring switching: frequency, magnitude, and instability

To put a number on switching, start with the two easy measurements and then combine them. Switching frequency is the count of rung changes per minute of playback. Switching magnitude is the size of each change, best expressed on the perceptual scale — a jump in VMAF points or in bitrate — and summarized as the mean and the maximum absolute step, or the standard deviation of the played bitrate over the session.
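
These measurements can be sketched directly from the sequence of played segment bitrates. The sketch assumes fixed-length segments (the hypothetical `segment_seconds` parameter) and uses bitrate deltas as the magnitude; feeding in per-segment VMAF values instead would give the perceptual version. Up- and down-switches are counted separately because, per the perception findings on switch direction, a drop hurts more than a climb.

```python
import statistics


def switching_stats(rungs_mbps: list[float], segment_seconds: float) -> dict[str, float]:
    """Frequency, magnitude and direction of rung changes in one session.

    rungs_mbps: bitrate of each played segment, in playback order.
    segment_seconds: fixed segment duration (an assumption; real logs would
    carry each segment's own duration).
    """
    # Signed change at every boundary where the rung actually changed.
    steps = [b - a for a, b in zip(rungs_mbps, rungs_mbps[1:]) if b != a]
    magnitudes = [abs(s) for s in steps]
    minutes = len(rungs_mbps) * segment_seconds / 60
    return {
        "switches_per_min": len(steps) / minutes if minutes else 0.0,
        "mean_abs_step": statistics.fmean(magnitudes) if magnitudes else 0.0,
        "max_abs_step": max(magnitudes, default=0.0),
        "up_switches": sum(1 for s in steps if s > 0),
        "down_switches": sum(1 for s in steps if s < 0),
        # Spread of the played bitrate over the session (population stdev).
        "bitrate_stdev": statistics.pstdev(rungs_mbps) if rungs_mbps else 0.0,
    }


oscillating = switching_stats([4.3, 1.2, 4.3, 1.2, 4.3, 1.2], segment_seconds=4.0)
steady = switching_stats([2.85] * 6, segment_seconds=4.0)
print(round(oscillating["switches_per_min"], 1),
      oscillating["up_switches"], oscillating["down_switches"])  # 12.5 2 3
print(steady["switches_per_min"], steady["max_abs_step"])        # 0.0 0.0
```

Six four-second segments alternating between 4.3 and 1.2 Mbps give 12.5 switches per minute with a 3.1 Mbps step each time, three of them downward; the steady stream reports zeros across the board.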

A single combined measure that captures both, and that the literature uses, is the instability metric from the FESTIVE study (Jiang et al., CoNEXT 2012). It is the weighted sum of the switch steps over a recent window divided by the weighted sum of the bitrates over the same window, with more recent switches weighted more heavily:

$$\text{instability} = \frac{\sum_{d} w(d)\,\lvert b(t-d) - b(t-d-1) \rvert}{\sum_{d} w(d)\, b(t-d)}$$

over the last k seconds, with the weight w(d) = k − d so the newest changes count most, and a default window of k = 20 seconds. Read it in plain language: it is "how big were the recent jumps, relative to the quality level we were at" — a unitless score that is zero for a perfectly stable stream and grows as switches get larger and more recent.

Walk through a clean example with equal weights to see the shape, keeping in mind that the real metric leans on recent changes. Take a four-segment window. A stable stream plays 2.85, 2.85, 2.85, 2.85 Mbps: every step is zero, so its instability is 0. An oscillating stream plays 4.3, 1.2, 4.3, 1.2 Mbps, nearly the same average at 2.75 Mbps, with three steps of 3.1 Mbps each:

numerator   = 3.1 + 3.1 + 3.1            = 9.3
denominator = 4.3 + 1.2 + 4.3 + 1.2      = 11.0
instability = 9.3 / 11.0                 ≈ 0.85

Two streams, the same average bitrate, and one scores 0 while the other scores 0.85. Average bitrate called them a tie; the instability metric correctly flags the second as the worse experience. That gap is exactly the information average bitrate throws away.
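
A sketch of the instability ratio in Python, assuming one bitrate sample per list entry ordered oldest to newest. With `k=None` every term gets equal weight, which reproduces the walk-through; passing `k` applies the linear weight w(d) = k − d, with d = 0 as the newest sample. The age indexing is an assumption made for illustration, not a line-by-line transcription of the FESTIVE paper.

```python
from typing import Optional


def instability(window: list[float], k: Optional[int] = None) -> float:
    """Instability ratio over one window of bitrate samples.

    window: bitrates ordered oldest -> newest (e.g. one sample per second).
    k: None gives every term equal weight (the simplified walk-through);
       an int applies w(d) = k - d, where d = 0 is the newest sample.
       Assumes len(window) <= k so that every weight stays positive.
    """
    newest_first = window[::-1]

    def w(d: int) -> float:
        return 1.0 if k is None else float(k - d)

    # Change at age d: between sample d and the one just before it in time.
    num = sum(w(d) * abs(newest_first[d] - newest_first[d + 1])
              for d in range(len(newest_first) - 1))
    den = sum(w(d) * b for d, b in enumerate(newest_first))
    return num / den if den else 0.0


print(instability([2.85] * 4))                      # 0.0
print(round(instability([4.3, 1.2, 4.3, 1.2]), 2))  # 0.85

# Same single drop, placed at the newest vs the oldest edge of a 20 s window.
late_drop = [2.85] * 19 + [0.3]
early_drop = [0.3] + [2.85] * 19
print(round(instability(late_drop, k=20), 3),
      round(instability(early_drop, k=20), 3))      # 0.093 0.009
```

With recency weighting, the identical 2.85-to-0.3 Mbps drop scores roughly ten times higher when it happened in the newest second than when it sits at the far edge of the window.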

Figure 2. Same average, different experience. A flat stable stream scores instability 0; an oscillating stream at nearly the same mean scores about 0.85. Perception adds the recency effect: at equal average, a stream that ends high beats one that starts high.

The trade-off, written as one equation

The cleanest way to see why all three quantities must be measured together is the standard streaming-QoE objective, the function modern ABR research optimizes against (Yin et al., SIGCOMM 2015; restated by Mao et al., SIGCOMM 2017). For a video of N segments it reads:

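$$\mathrm{QoE} = \underbrace{\sum_{n=1}^{N} q(R_n)}_{\text{quality}} \;-\; \underbrace{\mu \sum_{n=1}^{N} T_n}_{\text{rebuffer}} \;-\; \underbrace{\sum_{n=1}^{N-1} \bigl\lvert q(R_{n+1}) - q(R_n) \bigr\rvert}_{\text{switching penalty}}$$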

Read each term in order. q(R_n) maps the bitrate of segment n to a perceived quality you add — the picture-quality reward. The middle term subtracts a penalty μ for every second of rebuffering T_n — the stall cost. The last term subtracts the size of every quality change between consecutive segments — the switching cost. The trade-off is not a vague tension; it is three terms in one sum, one added and two subtracted, and pushing the first up tends to enlarge the other two.

The penalty weights are not decoration: they encode how much each harm matters relative to picture quality. In the linear version used by the original work, with bitrate measured in Mbps and q(R) = R, the rebuffering penalty is μ = 4.3 (Mao et al., 2017). That number carries a vivid meaning: one second of stalling costs 4.3 quality units, the entire quality contribution of one segment played at the top 4.3 Mbps rung. A single one-second rebuffer erases your best segment. This is why rebuffering is treated as the most damaging QoE event and why the buffer is the shock absorber the player protects (Huang et al., SIGCOMM 2014).

A worked example: stable beats switchy

Now make the equation concrete with a six-segment window and the real ladder from the reference study — {0.3, 0.75, 1.2, 1.85, 2.85, 4.3} Mbps, four-second segments (Mao et al., 2017). Assume the network can sustain a steady mid-rung but tempts an aggressive player toward the top. No rebuffering in either case, so the middle term is zero and we can watch quality fight switching directly.

Strategy A — chase the top rung. The player oscillates 4.3, 1.2, 4.3, 1.2, 4.3, 1.2 Mbps, grabbing the 4.3 Mbps rung whenever it can and dropping when it cannot:

quality   = 4.3 + 1.2 + 4.3 + 1.2 + 4.3 + 1.2          = 16.5
switching = |1.2−4.3| × 5 transitions = 3.1 × 5         = 15.5
QoE_A     = 16.5 − 0 − 15.5                             = 1.0

Strategy B — sit on the middle rung. The player holds 2.85 Mbps for all six segments:

quality   = 2.85 × 6                                    = 17.1
switching = 0  (no changes)                             = 0
QoE_B     = 17.1 − 0 − 0                                 = 17.1

Strategy B scores 17.1 against Strategy A's 1.0, even though Strategy A reached the top rung three times and Strategy B never did. The switching penalty consumed almost everything the high rung earned. And notice the averages: Strategy A's mean bitrate is 16.5 ÷ 6 = 2.75 Mbps, slightly below Strategy B's 2.85, so on average bitrate alone Strategy B wins only narrowly, while the QoE gap is enormous, because average bitrate cannot see the switching at all.

Push it further to kill the obvious objection. Suppose Strategy A′ plays 4.3, 4.3, 1.2, 4.3, 4.3, 1.2 Mbps — a higher average of 3.27 Mbps, beating Strategy B's 2.85. Its quality term is 19.6 and its switching penalty is 9.3 (three transitions of 3.1), so QoE = 19.6 − 9.3 = 10.3, still well under Strategy B's 17.1. A strategy with a higher average bitrate lost on perceived quality because it switched. That is the entire argument for measuring switching, in one line of arithmetic.
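
The three strategies can be checked with a short Python sketch of the linear objective, using q(R) = R in Mbps and μ = 4.3 as stated above. The helper name `qoe_lin` is hypothetical and is not an API from the Pensieve or MPC code.

```python
from typing import Optional, Sequence


def qoe_lin(rungs_mbps: Sequence[float],
            rebuffer_s: Optional[Sequence[float]] = None,
            mu: float = 4.3) -> float:
    """Linear QoE with q(R) = R in Mbps.

    QoE = sum q(R_n) - mu * sum T_n - sum |q(R_{n+1}) - q(R_n)|
    rebuffer_s holds the stall seconds T_n per segment (None means no stalls).
    """
    quality = sum(rungs_mbps)
    rebuffering = mu * sum(rebuffer_s or [])
    switching = sum(abs(b - a) for a, b in zip(rungs_mbps, rungs_mbps[1:]))
    return quality - rebuffering - switching


strategy_a = [4.3, 1.2, 4.3, 1.2, 4.3, 1.2]
strategy_b = [2.85] * 6
strategy_a_prime = [4.3, 4.3, 1.2, 4.3, 4.3, 1.2]
for name, rungs in [("A", strategy_a), ("B", strategy_b), ("A'", strategy_a_prime)]:
    print(name, round(qoe_lin(rungs), 2))
# A 1.0
# B 17.1
# A' 10.3

# One 4.3 Mbps segment followed by a one-second stall nets to zero.
print(qoe_lin([4.3], rebuffer_s=[1.0]))  # 0.0
```

The last call is the rebuffering arithmetic from the penalty discussion: with μ = 4.3, a single second of stall cancels exactly one top-rung segment.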

Figure 3. Two strategies over six segments. The oscillating strategy touches the top rung three times yet scores QoE 1.0; the stable mid-rung strategy never leaves 2.85 Mbps and scores 17.1. The difference is the switching penalty.

How ABR decisions show up in a quality score

The worked QoE number above is a research objective; the standardized, validated way to fold these decisions into a perceived score is ITU-T P.1203, the parametric model that estimates a session's mean opinion score on a 1-to-5 scale (ITU-T Rec. P.1203, 2017). It takes the per-segment quality, the startup delay, the stalls, and the quality switches and integrates them over time into one number. The standard states the point plainly: adaptation improves the chance of uninterrupted delivery, but it creates quality variations that can disturb the viewer — so switching is modeled as a degradation, not a free lunch.

Figure 4. Anatomy of the QoE objective. A quality reward (added) minus a rebuffering penalty and a switching penalty (both subtracted) produce the session score. Maximizing the first term tends to grow the two you subtract.

P.1203 also bakes in the temporal lesson from perception research: the model's integration step weights recent quality more heavily, mirroring the recency effect — viewers remember how the stream ended. The practical takeaway for an ABR design or a quality target is to end on a high rung and switch up rather than down where you have the choice, because the same set of rungs, reordered, produces a different remembered quality. Where the switch events themselves come from — the player instrumentation that timestamps every rung change — is the subject of player-side quality metrics.
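
To see why reordering matters, here is a deliberately simple recency-weighted pooling in Python: segment i gets weight i + 1, so the last segment counts most. This linear weight is an illustrative assumption and is not the temporal integration defined in ITU-T P.1203; it only reproduces the direction of the effect, using bitrate as a stand-in for per-segment quality.

```python
def recency_weighted_mean(quality: list[float]) -> float:
    """Pool per-segment quality with weight i + 1 for segment i (0 = first).

    Illustrative assumption only: a linear recency weight, NOT the temporal
    integration specified in ITU-T P.1203.
    """
    if not quality:
        raise ValueError("need at least one segment")
    weights = range(1, len(quality) + 1)
    return sum(w * q for w, q in zip(weights, quality)) / sum(weights)


end_high = [1.2, 1.2, 1.2, 4.3, 4.3, 4.3]
start_high = list(reversed(end_high))  # same rungs, reversed order
print(round(sum(end_high) / 6, 2), round(sum(start_high) / 6, 2))  # 2.75 2.75
print(round(recency_weighted_mean(end_high), 2))    # 3.41
print(round(recency_weighted_mean(start_high), 2))  # 2.09
```

Both orderings share a plain mean of 2.75 Mbps, yet the session that ends high pools to about 3.41 and the one that starts high to about 2.09.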

Common mistakes

The trade-off is forgiving of almost everything except measuring one corner and ignoring the other two. The pitfalls below are the ones that recur.

Optimizing average bitrate alone. A player tuned only to raise average bitrate will switch aggressively to grab high rungs, and the switching penalty can sink the perceived score below a calmer player's — as the worked example showed at a higher average. Always report switching next to average bitrate.

Counting switches without their magnitude. "Twelve switches" means nothing without their sizes and directions. Twelve one-rung steps up are nearly invisible; one top-to-bottom drop is jarring. Measure magnitude (VMAF or bitrate delta) and direction, not just a count.

Treating bitrate as quality. The same bitrate is sharp on simple content and soft on complex content. Map delivered bitrate to a per-segment VMAF score before you call it "quality", and report the pooling.

Mean-pooling that hides a valley. A session average can conceal a stretch of low rungs the viewer clearly saw. Report a low percentile alongside the mean, the same floor-not-average discipline that drives a trustworthy QC report.

Comparing ABR strategies on a handful of clips. Two or three hand-picked clips and a casual look can crown the wrong algorithm; on a large, realistic set the ranking often flips (Duanmu et al., 2021). Compare on many contents, networks, and devices, and validate against subjective scores.

Ignoring viewer heterogeneity and device. Switching that is invisible on a phone can be obvious on a TV, and viewers differ in what bothers them. A single QoE number is a population mean; note the spread and the device.

Where Fora Soft fits in

We build streaming, OTT, video conferencing, and e-learning platforms where the question is rarely "what is the peak bitrate" and almost always "does the experience hold steady across real networks". Measuring the ABR trade-off in full — average bitrate mapped to a perceptual score, plus switching frequency and magnitude, plus rebuffering, read together rather than one at a time — is how we keep a player from optimizing a number the viewer never feels. When we compare two adaptive strategies for a client, we score them with the full QoE objective on a content and network set that resembles their audience, not on a flattering demo clip. The same measurement-honest stance runs through our benchmark methodology, where every comparison carries its content, settings, and date.

References

  1. X. Yin, A. Jindal, V. Sekar, B. Sinopoli, "A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP" (MPC), ACM SIGCOMM, 2015. The defining streaming-QoE objective: quality utility minus a rebuffering penalty minus a smoothness (switching) penalty. Tier 5 (peer-reviewed). https://users.ece.cmu.edu/~vsekar/assets/pdf/sigcomm15_mpc.pdf
  2. H. Mao, R. Netravali, M. Alizadeh, "Neural Adaptive Video Streaming with Pensieve," ACM SIGCOMM, 2017. Restates the QoE objective (Eq. 6), the three conflicting goals (max bitrate, min rebuffering, max smoothness), the QoE_lin/log/hd variants, the μ = 4.3 rebuffering penalty, and the {0.3, 0.75, 1.2, 1.85, 2.85, 4.3} Mbps reference ladder. Tier 5. https://people.csail.mit.edu/hongzi/content/publications/Pensieve-Sigcomm17.pdf
  3. J. Jiang, V. Sekar, H. Zhang, "Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE," ACM CoNEXT, 2012 (extended in IEEE/ACM Trans. Networking, 2014). Defines the instability metric — the weighted sum of recent switch steps divided by the weighted sum of recent bitrates, with weight w(d) = k−d over a 20-second window — and shows viewers are sensitive to frequent and large switches. Tier 5. https://users.ece.cmu.edu/~vsekar/assets/pdf/conext12_festive.pdf
  4. Z. Duanmu, W. Liu, Z. Li, D. Chen, Z. Wang, Y. Wang, W. Gao, "Assessing the Quality-of-Experience of Adaptive Bitrate Video Streaming" (WaterlooSQoE-IV, 1,350 subject-rated streams), 2021. Source for the recency effect (≈60% of viewers recency-dominated; end-high beats start-high at equal average), positive-over-negative adaptation preference, ≈90% preferring a switch-up to staying low, user heterogeneity, and the finding that simpler ABR often wins on a realistic database. Tier 5. https://arxiv.org/abs/2008.08804
  5. Recommendation ITU-T P.1203: Parametric bitstream-based quality assessment of progressive download and adaptive audiovisual streaming services over reliable transport, ITU-T, 2017. Official standard. Integrates per-segment quality, initial loading delay, stalls, and quality switches into a 1–5 session MOS; models adaptation/switching explicitly as a degradation and weights recent quality more heavily. Tier 1. https://www.itu.int/rec/T-REC-P.1203
  6. Streaming Quality of Experience Events, Properties and Metrics (CTA-2066), Consumer Technology Association, 2020. Recommended-practice standard defining the streaming-QoE metrics — including average bitrate and switching-related metrics — and how each is computed for consistent cross-vendor reporting. Tier 1. https://shop.cta.tech/products/streaming-quality-of-experience-events-properties-and-metrics-cta-2066
  7. Web Application Video Ecosystem — Common Media Client Data (CMCD) (CTA-5004), Consumer Technology Association, 2020 (CMCDv2 / CTA-5004-A, 2026). Standard. The key-value fields a player attaches to each media request, enabling reconstruction of the bitrate and switch sequence at the backend. Tier 1. https://cdn.cta.tech/cta/media/media/resources/standards/pdfs/cta-5004-final.pdf
  8. Recommendation ITU-R BT.500-15: Methodologies for the subjective assessment of the quality of television pictures, ITU-R, 2023. Official standard. The subjective ground truth every QoE model and objective metric is validated against; when a model and a careful viewing disagree, the viewing wins. Tier 1. https://www.itu.int/rec/R-REC-BT.500
  9. T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell, M. Watson, "A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service" (BBA), ACM SIGCOMM, 2014. Frames the playback buffer as the shock absorber that lets a player chase higher bitrate without stalling — the mechanism behind the bitrate-vs-rebuffering half of the trade-off. Tier 5. https://yuba.stanford.edu/~nickm/papers/sigcomm2014-video.pdf
  10. F. Dobrian, V. Sekar, A. Awan, I. Stoica, D. Joseph, A. Ganjam, J. Zhan, H. Zhang, "Understanding the Impact of Video Quality on User Engagement," ACM SIGCOMM, 2011. Large-scale (Conviva) study placing average bitrate among the quality metrics that affect engagement, with the buffering ratio carrying the largest single effect. Tier 5. https://www.cs.cmu.edu/~hzhang/papers/sigcomm2011_QualityEngagement.pdf

Where lower-tier sources disagreed with the standards, the standards won: popular ABR write-ups frame "hit the highest bitrate" as the goal, while ITU-T P.1203 and the QoE objective (Yin 2015) both penalize the switching that chasing the top rung causes — so this article followed the standard and the peer-reviewed objective, not the marketing framing.