Why this matters

If you own a product that delivers video (a learning platform, a telemedicine service, an OTT app, a security camera grid), ABR is the difference between "video works for our users" and "users complain in the App Store about buffering". Marketing, product, and finance teams meet ABR through three of its consequences: the streaming bill, the rebuffer rate, and the start-up time. Engineers meet it through a long list of parameters with no obvious right value. Both audiences need the same mental model to make good decisions, and that model is what this article hands you.

What ABR is, in one sentence

The bitrate, the number of bits the player must download every second to keep video playing, is not a constant. ABR is the agreement that the player will choose, every few seconds, the bitrate that gives the highest quality the network can still deliver in time. Apple calls this HTTP Live Streaming with multi-variant playlists (RFC 8216, §4.3.4.2). The MPEG standard calls it Dynamic Adaptive Streaming over HTTP, abbreviated DASH (ISO/IEC 23009-1:2022). Different names, same idea: a single video is encoded ahead of time into multiple versions, the player downloads from one version at a time, and the player is allowed to switch versions on any segment boundary.

The thing that makes ABR different from "the user picks 720p" or "the server detects the device and serves one version" is the switching. The player is the decision-maker. Every two to six seconds, the player asks a single question — am I downloading fast enough? — and either holds its ladder rung or moves up or down one rung.

The four components of any ABR system

Every ABR system, regardless of protocol or vendor, has the same four moving parts. Recognise them and the rest of the article reads itself.

1. The bitrate ladder. A list of pre-encoded versions of the same content, each at a different bitrate and resolution. A typical ladder for a 1080p VoD title has 6–9 rungs: 235 kbps at 416×234, 375 kbps at 640×360, 560 kbps at 768×432, 750 kbps at 960×540, 1050 kbps at 1280×720, 1750 kbps at 1280×720, 2350 kbps at 1920×1080, 3000 kbps at 1920×1080, 4500 kbps at 1920×1080. These numbers come from Apple's HLS Authoring Specification (revision 2025-09, §2.7); the actual values are nudged per service.

2. The packager. The piece of software that chops each encoded version into short segments — typically 2, 4, or 6 seconds long — and writes a manifest that lists which segments belong to which rung. For HLS the manifest is a multi-variant .m3u8 playlist; for DASH it is a single .mpd file. Shaka Packager and Bento4 are the two open-source defaults; AWS MediaPackage, Unified Streaming, and Wowza ship commercial alternatives.

3. The manifest. The text file the player downloads first. It lists every rung, every codec, every audio and subtitle track, and the URLs of the segments. The manifest is small (a few kilobytes) but it is the contract: the player is allowed to use any combination the manifest advertises, and nothing else.

4. The ABR algorithm. The code inside the player that decides which rung to download next. This is the part that gets the academic papers and the patent wars. Production players use three families of algorithm, covered below.
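To make the manifest in item 3 concrete, here is a minimal Python sketch that reads the rungs out of an HLS multi-variant playlist. The playlist text, its URIs, and the `parse_rungs` helper are illustrative assumptions, not taken from any real service, and the parser only looks at the `BANDWIDTH` and `RESOLUTION` attributes of `EXT-X-STREAM-INF`.

```python
import re

# Hypothetical three-rung multi-variant playlist; the URIs are made up.
PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=750000,RESOLUTION=640x360
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
"""

def parse_rungs(text):
    """Return (bandwidth_bps, resolution, uri) tuples sorted by bandwidth."""
    rungs = []
    lines = text.strip().splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        # The lookbehind skips AVERAGE-BANDWIDTH if it is present.
        bandwidth = re.search(r"(?<![A-Z-])BANDWIDTH=(\d+)", line)
        resolution = re.search(r"RESOLUTION=(\d+x\d+)", line)
        uri = lines[i + 1].strip()  # the variant URI is the line after the tag
        rungs.append((
            int(bandwidth.group(1)),
            resolution.group(1) if resolution else None,
            uri,
        ))
    return sorted(rungs, key=lambda rung: rung[0])

for bandwidth_bps, resolution, uri in parse_rungs(PLAYLIST):
    print(bandwidth_bps // 1000, "kbps", resolution, uri)
```

A production parser would also read codecs, audio groups, and subtitle tracks; this sketch stops at the ladder itself.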

How a single ABR session unfolds

Walk through a real session, second by second, to make the four parts concrete.

The user taps Play. The player downloads the manifest (usually 5–15 kilobytes) and parses it. The manifest tells the player there are nine rungs available, from 235 kbps to 4,500 kbps (the ladder listed above), and that each segment is four seconds long. The player picks a starting rung. Most players start one or two rungs below the middle of the ladder; Apple's reference implementation starts at the rung whose bitrate is closest to the player's most recent measured throughput, with a small bias toward "lower". The HLS Authoring Specification (rev 2025-09, §4.7.5) recommends that the initial selection target a rung the device can sustain.

The player downloads segment 1 of the chosen rung. Say the segment is 4 seconds of video at 1,750 kbps, so the segment is roughly 875 kilobytes (1,750 kbps × 4 s ÷ 8 bits/byte). On a 6 Mbps connection the segment arrives in about 1.2 seconds — much faster than the four seconds of playback it contains. The download speed (6 Mbps) is comfortably above the rung's bitrate (1.75 Mbps), so the player moves up one rung for segment 2.
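The segment arithmetic above is easy to re-run for any rung. The following is a small Python sketch; the function names are illustrative, and it assumes 1 kilobyte = 1,000 bytes and a perfectly steady connection.

```python
def segment_kilobytes(bitrate_kbps, duration_s):
    """Size of one segment in kilobytes (8 kilobits per kilobyte)."""
    return bitrate_kbps * duration_s / 8

def download_seconds(bitrate_kbps, duration_s, throughput_kbps):
    """Time to fetch one segment at a steady throughput."""
    return bitrate_kbps * duration_s / throughput_kbps

print(segment_kilobytes(1750, 4))                 # 875.0
print(round(download_seconds(1750, 4, 6000), 2))  # 1.17
```

Whenever the download time is shorter than the segment duration, the buffer grows; whenever it is longer, the buffer drains.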

The player downloads segment 2 at 2,350 kbps. The download still finishes well before playback catches up; the buffer — the bank of pre-downloaded segments sitting ahead of the play-head — grows from 4 seconds to 8 seconds. The player moves up another rung for segment 3, and another for segment 4.

At segment 5 the player has reached the 4,500 kbps rung. The math now changes: each 4-second segment is 2,250 kilobytes. On the same 6 Mbps line, the download takes about 3 seconds — still less than 4, but the margin has shrunk. The player holds this rung.

Then the train pulls out of the station and the user's phone hands off from Wi-Fi to LTE. The measured throughput drops from 6 Mbps to 1.8 Mbps. The next segment at 4,500 kbps would take 10 seconds to download, which is more than the buffer's depth. The player sees this trend in the running average and steps down three rungs, past 3,000 and 2,350, to 1,750 kbps for the next segment. Quality dips for a few seconds; the buffer survives; the user keeps watching.
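The step-down decision can be sketched in a few lines of Python. This is a simplified illustration, not any player's actual logic: it uses the nine-rung ladder quoted earlier in the article, applies no safety margin, and `next_rung_after_drop` is a hypothetical helper name.

```python
LADDER_KBPS = [235, 375, 560, 750, 1050, 1750, 2350, 3000, 4500]
SEGMENT_S = 4

def next_rung_after_drop(throughput_kbps, ladder=LADDER_KBPS):
    """Highest rung that downloads at least as fast as it plays.

    Falls back to the lowest rung when nothing fits.
    """
    fitting = [rung for rung in ladder if rung <= throughput_kbps]
    return max(fitting) if fitting else min(ladder)

# One 4-second segment at 4,500 kbps over a 1.8 Mbps link:
print(4500 * SEGMENT_S / 1800)       # 10.0 seconds
print(next_rung_after_drop(1800))    # 1750
```

At 1,750 kbps on a 1.8 Mbps link the margin is thin, which is why production algorithms add a safety factor on top of this rule.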

That is ABR. Everything else in this article is detail on how the player makes the "throughput dropped — step down" decision well or badly.

[Figure: end-to-end ABR session diagram showing manifest fetch, segment selection by rung, buffer growth, and a switch-down on network drop]

Figure 1. One ABR session, second by second. The buffer is the player's safety margin against network shocks.

The bitrate ladder, in concrete numbers

The ladder is the menu the player chooses from. Three rules govern a good ladder.

Rule 1: Geometric spacing. Adjacent rungs are roughly 1.5× apart in bitrate, not equally spaced. Going from 400 kbps to 600 kbps is a 50% jump that the user can see; going from 4,000 kbps to 4,200 kbps is a 5% jump that mostly wastes encoder time. Apple's recommended ladder (HLS Authoring Specification rev 2025-09, §2.7) has rungs at 235 / 375 / 560 / 750 / 1,050 / 1,750 / 2,350 / 3,000 / 4,500 kbps. The ratios between adjacent rungs range from about 1.3× to 1.7× and average roughly 1.45×.
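A quick way to check a ladder's spacing is to compute the ratio between each pair of adjacent rungs. The Python sketch below does that for the nine-rung ladder quoted in this section, and also shows one way to generate an evenly spaced geometric ladder. Both helper names, and the 400 to 6,000 kbps endpoints in the usage line, are illustrative assumptions.

```python
ladder_kbps = [235, 375, 560, 750, 1050, 1750, 2350, 3000, 4500]

def adjacent_ratios(ladder):
    """Ratio of each rung to the rung directly below it."""
    return [high / low for low, high in zip(ladder, ladder[1:])]

def geometric_ladder(lowest_kbps, highest_kbps, rungs):
    """Ladder whose adjacent rungs all share one constant ratio."""
    step = (highest_kbps / lowest_kbps) ** (1 / (rungs - 1))
    return [round(lowest_kbps * step ** i) for i in range(rungs)]

ratios = adjacent_ratios(ladder_kbps)
print([round(r, 2) for r in ratios])
print(round((ladder_kbps[-1] / ladder_kbps[0]) ** (1 / len(ratios)), 2))  # mean ratio
print(geometric_ladder(400, 6000, 6))
```

Real ladders deviate from the pure geometric series because rungs are also pinned to resolution changes, which is the subject of the next rule.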

Rule 2 — Resolution moves with bitrate. A 400 kbps stream at 1920×1080 looks worse than a 400 kbps stream at 640×360. There is a "knee" above which adding pixels stops adding visible quality. Modern ladders cluster two or three bitrates at the same resolution near the top, then drop resolution at the bottom. Netflix popularised this discipline; their public engineering blog calls the technique per-title encoding because the exact knee depends on the content.

Rule 3: Maximum 1.5× steps when switching. If the player needs to drop, it should drop one rung at a time when possible. Skipping from 4,500 kbps straight to 750 kbps produces a visible jolt; dropping to 3,000, then 2,350 if needed, hides the change better. Some algorithms skip steps to prevent the buffer from emptying; that is the right move when the network has truly collapsed.
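One way to express Rule 3 in code is a small wrapper that limits whatever rung the algorithm asked for to a single step, unless the buffer is nearly empty. This is a hedged Python sketch: `limited_step` and the 4-second `panic_buffer_s` threshold are hypothetical choices, not values from any specification.

```python
def limited_step(ladder_kbps, current_kbps, target_kbps, buffer_s,
                 panic_buffer_s=4.0):
    """Move at most one rung toward the target, except in an emergency."""
    ladder = sorted(ladder_kbps)
    current = ladder.index(current_kbps)
    target = ladder.index(target_kbps)
    # Network collapse: the buffer is about to empty, so jump straight down.
    if target < current and buffer_s <= panic_buffer_s:
        return ladder[target]
    step = max(-1, min(1, target - current))
    return ladder[current + step]

ladder = [235, 375, 560, 750, 1050, 1750, 2350, 3000, 4500]
print(limited_step(ladder, 4500, 750, buffer_s=12))  # 3000
print(limited_step(ladder, 4500, 750, buffer_s=2))   # 750
```

The sketch applies the same one-rung limit to upward moves, which is a design choice made here for symmetry rather than something the rule requires.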

A first concrete ladder for a 1080p VoD title looks like this. Audio is in a separate track at 128 kbps.

Rung | Bitrate    | Resolution | Codec          | Notes
-----|------------|------------|----------------|--------------------------
1    | 400 kbps   | 480×270    | H.264 baseline | Mobile fallback
2    | 750 kbps   | 640×360    | H.264 main     | Slow Wi-Fi
3    | 1,200 kbps | 854×480    | H.264 main     | Good Wi-Fi, small device
4    | 2,000 kbps | 1280×720   | H.264 high     | HD on a phone
5    | 3,500 kbps | 1920×1080  | H.264 high     | HD on a laptop
6    | 5,000 kbps | 1920×1080  | H.264 high     | HD on a fast line
7    | 8,000 kbps | 1920×1080  | H.265          | 1080p archival quality

The exact numbers shift per title, per scene, and per codec, a topic the next article in this block, Building a Bitrate Ladder, covers in depth.

The three algorithm families

Every ABR algorithm in production today belongs to one of three families. They differ in what signal they trust most.

Family 1 — Throughput-based ABR

The simplest family. The player keeps a running estimate of recent download throughput — usually the harmonic mean of the last three to five segments. To pick the next rung, the player divides the estimate by a safety factor (typically 1.2 to 1.5) and selects the highest rung whose bitrate is below that ceiling.

The math, with numbers: the last three segments downloaded at 2.8, 3.1, and 2.5 Mbps. The harmonic mean is 3 ÷ (1/2.8 + 1/3.1 + 1/2.5) ≈ 2.78 Mbps. Divide by a safety factor of 1.25: 2.22 Mbps. In the seven-rung ladder above, the highest rung whose bitrate is at or below 2.22 Mbps is rung 4 at 2,000 kbps. The player downloads that.
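The same calculation as a minimal Python sketch. The function name is illustrative, the ladder is the seven-rung example from this article, and the fallback to the lowest rung when nothing fits is an assumption rather than a rule from any particular player.

```python
from statistics import harmonic_mean

def pick_rung_throughput(samples_mbps, ladder_kbps, safety=1.25):
    """Highest rung at or below (harmonic-mean throughput / safety factor)."""
    estimate_kbps = harmonic_mean(samples_mbps) * 1000
    ceiling_kbps = estimate_kbps / safety
    fitting = [rung for rung in ladder_kbps if rung <= ceiling_kbps]
    # Assumption: if even the lowest rung is too big, play it anyway.
    return max(fitting) if fitting else min(ladder_kbps)

ladder = [400, 750, 1200, 2000, 3500, 5000, 8000]
print(round(harmonic_mean([2.8, 3.1, 2.5]), 2))        # 2.78
print(pick_rung_throughput([2.8, 3.1, 2.5], ladder))   # 2000
```

The harmonic mean is used instead of the arithmetic mean because it is pulled toward the slowest samples, which makes the estimate more cautious after a single fast burst.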

Throughput-based ABR is the default in hls.js, in Apple's iOS native player, and in most "v1" implementations. It is fast to ship, easy to reason about, and behaves predictably under stable networks. It fails under bursty networks — Wi-Fi where the throughput swings every few seconds — and it under-uses the buffer.

Family 2 — Buffer-based ABR

A more conservative family. The player ignores throughput estimates and looks only at the current buffer level — how many seconds of pre-downloaded video sit ahead of the play-head. The rules are:

  • Buffer below a threshold T_low (e.g., 5 seconds) → drop to a low rung to refill the buffer.
  • Buffer between T_low and T_high (e.g., 5–25 seconds) → linearly interpolate between low and high rungs.
  • Buffer above T_high → take the highest rung the ladder offers.
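The three rules above can be sketched as a simple threshold map in Python. This is only the threshold-and-interpolate idea as described in the list; it is not BOLA's Lyapunov formulation. The function name, the choice of the lowest rung below T_low, and the round-down behaviour in the middle band are all assumptions made for the illustration.

```python
def pick_rung_buffer(buffer_s, ladder_kbps, t_low=5.0, t_high=25.0):
    """Map the current buffer level to a rung, ignoring throughput."""
    ladder = sorted(ladder_kbps)
    if buffer_s <= t_low:
        return ladder[0]          # refill the buffer as fast as possible
    if buffer_s >= t_high:
        return ladder[-1]         # plenty of safety margin: take the top
    # Linear interpolation between the lowest and highest rung index.
    fraction = (buffer_s - t_low) / (t_high - t_low)
    index = int(fraction * (len(ladder) - 1))  # round down to stay cautious
    return ladder[index]

ladder = [400, 750, 1200, 2000, 3500, 5000, 8000]
for level in (3, 15, 30):
    print(level, "s buffer ->", pick_rung_buffer(level, ladder), "kbps")
```

With these thresholds, a 15-second buffer sits exactly halfway through the middle band and lands on the middle rung of the ladder.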

The canonical paper is BOLA (Buffer Occupancy based Lyapunov Algorithm) by Spiteri, Urgaonkar, and Sitaraman, INFOCOM 2016, which proved a clean mathematical bound on how often a BOLA player rebuffers given a target buffer depth. BOLA ships in dash.js as the default since 2018 and in Shaka Player as one of two algorithm options.

Buffer-based ABR handles bursty networks well because it doesn't react to instantaneous throughput dips — the buffer absorbs them. It under-uses bandwidth when the buffer is full at the top rung, and it is slower to react when the network actually improves.

Family 3 — Hybrid ABR

The family that ships in most real-world top-tier players. It combines a throughput estimator and a buffer model and picks the rung that maximises a utility function — usually a weighted sum of "quality" (higher rung) and "stability" (no switches and no rebuffers).
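A toy version of such a utility function is sketched below in Python. It is not MPC, FESTIVE, or any vendor's algorithm: the log-bitrate quality term, the two penalty weights, the five-segment look-ahead, and the assumption that throughput stays constant over that horizon are all hypothetical choices made to keep the example short.

```python
from math import log

def predicted_stall_s(rung_kbps, throughput_kbps, buffer_s, segment_s, horizon):
    """Total stall if this rung is held for `horizon` segments."""
    stall, buffer_level = 0.0, buffer_s
    for _ in range(horizon):
        download_s = rung_kbps * segment_s / throughput_kbps
        if download_s > buffer_level:
            stall += download_s - buffer_level
            buffer_level = 0.0
        else:
            buffer_level -= download_s
        buffer_level += segment_s   # the finished segment joins the buffer
    return stall

def pick_rung_hybrid(ladder_kbps, current_kbps, throughput_kbps, buffer_s,
                     segment_s=4.0, horizon=5,
                     switch_penalty=0.3, rebuffer_penalty=4.0):
    """Pick the rung with the best quality-minus-penalties score."""
    ladder = sorted(ladder_kbps)
    current = ladder.index(current_kbps)

    def score(i):
        quality = log(ladder[i] / ladder[0])
        stall = predicted_stall_s(ladder[i], throughput_kbps, buffer_s,
                                  segment_s, horizon)
        return (quality
                - switch_penalty * abs(i - current)
                - rebuffer_penalty * stall)

    return ladder[max(range(len(ladder)), key=score)]

ladder = [400, 750, 1200, 2000, 3500, 5000, 8000]
print(pick_rung_hybrid(ladder, 2000, throughput_kbps=3000, buffer_s=12))  # 3500
print(pick_rung_hybrid(ladder, 2000, throughput_kbps=1000, buffer_s=2))   # 400
```

In the first call the player picks a rung slightly above the measured throughput because a 12-second buffer can absorb the difference over the look-ahead window; in the second, a thin buffer and a slow link push it to the bottom rung.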

The two most-cited hybrid algorithms are Model Predictive Control (MPC) — Yin, Jindal, Sekar, Sinopoli, SIGCOMM 2015 — and FESTIVE — Jiang, Sekar, Zhang, CoNEXT 2012. Production players that publish their algorithms cite these papers as ancestors. Netflix's own player uses a proprietary hybrid; YouTube's player likewise. Mux's player and LiveKit's web player both use Shaka Player's hybrid mode.

The trade-off, restated for non-technical readers: throughput-based is simple and fast but jittery on flaky networks. Buffer-based is smooth but slow to climb. Hybrid is slow to ship, hard to debug, best for users.

A fourth family — neural ABR — uses a learned policy network instead of a hand-tuned formula. Pensieve (Mao, Netravali, Alizadeh, SIGCOMM 2017) was the first widely cited example, with later work like Comyco and Kairos pushing the state of the art. Neural ABR is not yet a 2026 production default outside of two or three top streamers, but it is the direction of travel. The next-block article Neural and Learning-Based ABR covers this in depth.

[Figure: side-by-side comparison of throughput-based, buffer-based, and hybrid ABR families showing input signals and decision flow]

Figure 2. Three algorithm families, one decision per segment. Each reads a different signal first.

The numbers that matter to the business

Five quality-of-experience metrics, each tied to ABR behaviour, drive every streaming product's economics.

Start-up time — seconds from "user tapped Play" to "first frame on screen". Industry leaders sit at 1.0–1.5 s; the median streaming app sits at 3–5 s. ABR contributes by choosing the starting rung — too high and the first segment takes long to arrive; too low and the user sees pixelation in the first seconds.

Rebuffer ratio — fraction of viewing time spent showing the buffering spinner instead of video. Top-tier services target below 0.4%; Conviva's 2024 State of Streaming put the global median around 1.2%. ABR contributes by choosing rungs the network can actually deliver.

Average bitrate — mean bitrate of video that was actually played. Higher is better, up to the screen's diminishing-returns point. ABR contributes by not playing at the bottom rung when the network supports more.

Switches per minute — how many times per minute the player changed rung. Each switch is a small jolt. Top-tier services aim for fewer than one switch per two minutes.

Quality-adjusted bitrate — a derived metric that combines bitrate and switches, often weighted toward stability. This is what hybrid ABR algorithms optimise directly.
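Four of these metrics can be computed from a very small session log. The Python sketch below assumes a hypothetical log format (one played bitrate per segment, plus total stall and start-up seconds) and defines rebuffer ratio as stall time divided by stall plus play time, which is one common convention. Quality-adjusted bitrate is left out because its weighting differs from service to service.

```python
def qoe_summary(played_kbps, segment_s, stall_s, startup_s):
    """Summarise one session from per-segment bitrates and stall time."""
    play_s = len(played_kbps) * segment_s
    switches = sum(1 for a, b in zip(played_kbps, played_kbps[1:]) if a != b)
    return {
        "startup_s": startup_s,
        "rebuffer_ratio": stall_s / (play_s + stall_s),
        "avg_bitrate_kbps": sum(played_kbps) / len(played_kbps),
        "switches_per_min": switches / (play_s / 60),
    }

# Two minutes of playback with one dip to 750 kbps and 1.2 s of stalling.
session = [2500] * 10 + [750] * 5 + [2500] * 15
summary = qoe_summary(session, segment_s=4, stall_s=1.2, startup_s=1.4)
for name, value in summary.items():
    print(name, round(value, 4))
```

In a real pipeline these numbers are aggregated across many sessions and reported as percentiles, not as single-session values.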

These five sit at the heart of the Player Observability and Metrics article in Block 7.

A numeric example, end to end

Concrete arithmetic anchors every claim above. Consider a 10-minute VoD with a six-rung ladder (400, 750, 1,500, 2,500, 4,000, 6,000 kbps), 4-second segments, on a viewer whose network delivers 3.2 Mbps sustained with one 30-second dip to 1 Mbps in the middle.

Total segments = 600 ÷ 4 = 150 segments.

If the player held the 4,000 kbps rung the entire time, total bytes = 4,000 × 600 ÷ 8 = 300,000 kilobytes = 300 MB. The total download capacity at 3.2 Mbps over 600 s = 3,200 × 600 ÷ 8 = 240,000 kilobytes = 240 MB, even before the dip is counted. Fetching 300 MB at 3.2 Mbps takes 750 s, so the player would stall for at least 150 seconds while playing 600 seconds of video: a rebuffer ratio of about 20% of the session. Unacceptable.

If the player held the 2,500 kbps rung, total = 187.5 MB, well within 240 MB capacity. The dip to 1 Mbps for 30 s costs roughly (2,500 − 1,000) × 30 ÷ 8 = 5,625 kilobytes of buffer drain. If the buffer was at 30 s × 2,500 kbps ÷ 8 = 9,375 kilobytes when the dip started, the buffer survives.
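The arithmetic in this example can be re-run with a few helper functions. This Python sketch uses hypothetical function names, assumes 1 MB = 1,000 kilobytes, and estimates stall time as the extra download time beyond the title's duration at the sustained 3.2 Mbps rate, ignoring the dip.

```python
DURATION_S = 600

def total_megabytes(bitrate_kbps, seconds=DURATION_S):
    """Bytes moved at a constant bitrate, in megabytes."""
    return bitrate_kbps * seconds / 8 / 1000

def stall_seconds(rung_kbps, throughput_kbps, seconds=DURATION_S):
    """Extra wall-clock time needed to fetch the whole title."""
    download_s = rung_kbps * seconds / throughput_kbps
    return max(0.0, download_s - seconds)

def dip_drain_kilobytes(rung_kbps, dip_kbps, dip_s):
    """Buffer consumed while the network delivers less than the rung needs."""
    return (rung_kbps - dip_kbps) * dip_s / 8

print(total_megabytes(4000), total_megabytes(3200))   # 300.0 240.0
print(stall_seconds(4000, 3200))                      # 150.0
print(total_megabytes(2500))                          # 187.5
print(dip_drain_kilobytes(2500, 1000, 30))            # 5625.0
print(dip_drain_kilobytes(2500, 1000, 30) * 8 / 2500) # 18.0 seconds of video
```

Expressed in time, the dip drains 18 seconds of video from a 30-second buffer, leaving 12 seconds in reserve.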

A throughput-based player would have stayed at 2,500 kbps most of the time, dipped to 750 kbps during the network dip, and recovered to 2,500. Average bitrate ≈ 2,400 kbps. Switches ≈ 2. Rebuffer = 0.

A buffer-based player would have probed upward when the buffer grew, possibly touching 4,000 kbps for 20–30 s before the dip, then dropping to 1,500 kbps when the buffer drained. Average bitrate ≈ 2,600 kbps. Switches ≈ 3. Rebuffer = 0.

A hybrid player would have moved similarly to buffer-based but with smoother switching. Average bitrate ≈ 2,650 kbps. Switches ≈ 2. Rebuffer = 0.

The win is not large in this scenario — a stable network is forgiving. The win is largest on the unstable mobile and Wi-Fi networks that account for most real-world viewing.

Live vs VoD — what ABR has to do differently

ABR in video-on-demand has time. The whole asset is already encoded and packaged; the player can probe upward freely because no segment has a deadline. ABR in live streaming has no such luxury — the encoder is producing segments in real time, and the player has to stay close to the live edge or fall behind.

The differences that matter:

  • Buffer depth is smaller. A VoD player can hold 60 seconds of buffer; a low-latency live player aims for 3–6 seconds at most.
  • Switching is more conservative. Falling off the live edge looks worse than a quality dip.
  • The starting rung is lower. A live player would rather start at 720p and stabilise than start at 1080p and rebuffer.
  • CMAF chunked transfer changes the math. With Common Media Application Format (ISO/IEC 23000-19) and chunked CMAF, a segment can begin streaming to the player while it is still being encoded. The throughput estimator has to ignore the encoder's pacing — which arrives at exactly the bitrate of the rung — and measure only network throughput. This is one of the harder bugs in low-latency player engineering.
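The chunked-transfer problem in the last bullet can be illustrated with a short Python sketch. It assumes the player can timestamp the start and end of each chunk's arrival, which is a simplification, and both function names are hypothetical. Real low-latency players use more elaborate filtering than this.

```python
# Each tuple: (bytes, receive_start_s, receive_end_s).
# Hypothetical trace: a 2,000 kbps rung sent as 0.5-second chunks over an
# 8 Mbps link, so each chunk arrives in 0.125 s and the link then sits idle.
CHUNKS = [
    (125_000, 0.0, 0.125),
    (125_000, 0.5, 0.625),
    (125_000, 1.0, 1.125),
    (125_000, 1.5, 1.625),
]

def naive_kbps(chunks):
    """Bytes over wall-clock time: drifts toward the encoder's pace."""
    total_bits = sum(size for size, _, _ in chunks) * 8
    elapsed_s = chunks[-1][2] - chunks[0][1]
    return total_bits / elapsed_s / 1000

def active_kbps(chunks):
    """Bytes over time spent actually receiving: closer to the network rate."""
    total_bits = sum(size for size, _, _ in chunks) * 8
    active_s = sum(end - start for _, start, end in chunks)
    return total_bits / active_s / 1000

print(round(naive_kbps(CHUNKS)))   # close to the rung bitrate, not the link
print(active_kbps(CHUNKS))         # 8000.0
```

The naive estimate would keep the player pinned near its current rung no matter how fast the link is, because the idle gaps between chunks are counted as download time.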

The LL-HLS Deep Dive and LL-DASH and CMAF Chunked articles cover this in detail.

Common mistakes that ruin ABR

Five mistakes account for most of the rebuffer pain in production streaming services. Watch for them.

Pitfall 1 — Too few rungs. A three-rung ladder forces the player into ugly trade-offs. On the way up, each rung is a big visible jump; on the way down, the player either over-corrects or rebuffers. Six to nine rungs is the sweet spot for a 1080p title.
Pitfall 2 — A "vanity" top rung. A 10 Mbps rung that no one's network can sustain is a CDN cost trap: a few percent of viewers will reach it momentarily, generate a rebuffer, and force the player back down. If fewer than 5% of viewers can sustain a rung, retire it.
Pitfall 3 — Ignoring the start-up rung choice. Players that always start at the lowest rung look bad in the first three seconds — the period the viewer's brain uses to judge whether the service is "good". Most users decide to keep watching or close the app inside the first ten seconds.
Pitfall 4 — Trusting the bandwidth estimator on the first segment. No estimator works with zero data. Use a reasonable default (last-session throughput, or a conservative middle rung) until the player has three or four segments to average.
Pitfall 5 — Not measuring switches. Average bitrate looks fine if the player oscillates wildly; the user sees the oscillation, not the average. Switch count per minute is a first-class metric. Track it.
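Pitfall 4 translates into a small guard around the estimator. The Python sketch below is one possible shape for it: the function name, the 1,500 kbps default, and the three-sample minimum are assumptions chosen for illustration, not recommended values.

```python
from statistics import harmonic_mean

def throughput_estimate_kbps(samples_kbps, last_session_kbps=None,
                             default_kbps=1500, min_samples=3):
    """Trust measured samples only once there are enough of them."""
    if len(samples_kbps) >= min_samples:
        return harmonic_mean(samples_kbps)
    if last_session_kbps is not None:
        return last_session_kbps   # remembered from the previous session
    return default_kbps            # conservative cold-start guess

print(throughput_estimate_kbps([]))                      # 1500
print(throughput_estimate_kbps([5200], 4200))            # 4200
print(throughput_estimate_kbps([3000, 3000, 3000]))      # 3000
```

The second call shows the point of the guard: a single fast first segment is ignored in favour of the remembered value until more samples arrive.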

Where Fora Soft fits in

We've shipped 239+ streaming products since 2005. ABR matters in every vertical we touch — OTT and Internet-TV apps where the bitrate ladder defines the CDN bill; e-learning platforms where switches during a lecture pull a student's attention away from the speaker; telemedicine consultations where a quality drop on a clinician's screen during a diagnostic question is a clinical risk; security and surveillance grids where dozens of streams share a single uplink and ABR is the only thing that keeps them all alive. We tune ladders, pick algorithms, and rebuild player observability for each project — and the same five mistakes above are the ones we see most often when we inherit someone else's stack.

What to read next

  • Talk to a streaming engineer — book a 30-minute scoping call with our streaming team.
  • See our case studies — read how we built ABR for OTT, e-learning, telemedicine, and surveillance clients.
  • Download: The ABR Tuning Checklist, a one-page audit covering ladder design, algorithm choice, start-up rung, and observability.

References

  1. IETF RFC 8216: HTTP Live Streaming (R. Pantos and W. May, August 2017). Multi-variant playlists, §4.3.4.2. https://www.rfc-editor.org/rfc/rfc8216
  2. ISO/IEC 23009-1:2022 — Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. Fifth edition, the controlling DASH document.
  3. Apple — HTTP Live Streaming (HLS) Authoring Specification for Apple Devices, revision 2025-09. §2.7 (bitrate tiers) and §4.7.5 (initial selection). https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices
  4. ISO/IEC 23000-19:2024 — Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media. Fourth edition.
  5. K. Spiteri, R. Urgaonkar, R. K. Sitaraman — BOLA: Near-Optimal Bitrate Adaptation for Online Videos, IEEE INFOCOM 2016. https://arxiv.org/abs/1601.06748
  6. X. Yin, A. Jindal, V. Sekar, B. Sinopoli — A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP (MPC), ACM SIGCOMM 2015. https://dl.acm.org/doi/10.1145/2785956.2787486
  7. J. Jiang, V. Sekar, H. Zhang — Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE, ACM CoNEXT 2012. https://dl.acm.org/doi/10.1145/2413176.2413189
  8. H. Mao, R. Netravali, M. Alizadeh — Neural Adaptive Video Streaming with Pensieve, ACM SIGCOMM 2017. https://dl.acm.org/doi/10.1145/3098822.3098843
  9. Conviva — State of Streaming Q4 2024. Rebuffer-ratio benchmarks. https://www.conviva.com/state-of-streaming
  10. DASH Industry Forum — DASH-IF Implementation Guidelines: Restricted Timing Model (v5.x, 2024). https://dashif.org/guidelines
  11. Bitmovin — Video Developer Report 2024. Algorithm-family deployment numbers.
  12. dash.js GitHub — BOLA implementation, since v2.7 (2018). https://github.com/Dash-Industry-Forum/dash.js