In a pull protocol the consumer is in control. The player or downstream server opens a connection to a URL and asks for what it wants; the server replies with the requested manifest, segment, or stream and nothing more. Pull is the natural model for distribution to millions of players because each player requests data at its own pace and the server only has to wait for requests. HLS and DASH are pull at their core, even for live streams, where the player keeps refreshing the manifest and pulling new segments as they appear.
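The live refresh-and-pull cycle can be sketched as follows. This is a minimal illustration, not a real player: it parses only the media sequence number and segment URIs from a simplified HLS media playlist, and the names `parse_media_playlist`, `new_segments`, and `SAMPLE_PLAYLIST` are hypothetical. A real client would fetch the playlist over HTTP on each refresh and handle many more tags.

```python
# Simplified live media playlist (illustrative content, not from a real stream).
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:6.0,
seg100.ts
#EXTINF:6.0,
seg101.ts
#EXTINF:6.0,
seg102.ts
"""


def parse_media_playlist(text):
    """Return (first_sequence_number, [segment URIs]) from a media playlist."""
    media_sequence = 0  # HLS defaults to 0 when the tag is absent
    uris = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            media_sequence = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            # Any non-empty, non-tag line is a segment URI.
            uris.append(line)
    return media_sequence, uris


def new_segments(text, last_seen_seq):
    """List (sequence_number, uri) pairs the player has not pulled yet."""
    first_seq, uris = parse_media_playlist(text)
    return [
        (first_seq + i, uri)
        for i, uri in enumerate(uris)
        if first_seq + i > last_seen_seq
    ]


# The player already has segment 100, so it pulls only 101 and 102.
print(new_segments(SAMPLE_PLAYLIST, 100))
```

On every refresh the player repeats this comparison. The server never decides what to send next; it only answers the requests that result.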

Pull has two big operational advantages over push at delivery scale. First, every HTTP cache on the path can answer the pull, so a CDN fits in front of the origin without any protocol changes. Second, the server keeps no per-viewer state: it never tracks which client got which byte, because each request is self-contained. Push pipelines have to track sessions, which is harder to scale and harder to fail over between regions.
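A toy simulation makes the caching advantage concrete. The `Origin` and `EdgeCache` classes below are hypothetical stand-ins for an origin server and a CDN edge. The sketch ignores expiry, cache headers, and concurrency, which a real cache must handle.

```python
class Origin:
    """Stand-in for the origin server; counts how often it is asked."""

    def __init__(self):
        self.hits = 0

    def get(self, url):
        self.hits += 1
        return f"bytes-of:{url}".encode()


class EdgeCache:
    """Stand-in for a CDN edge: answers from memory when it can."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, url):
        # Requests are self-contained, so the URL alone is the cache key.
        # No viewer identity or session state is needed.
        if url not in self.store:
            self.store[url] = self.origin.get(url)
        return self.store[url]


origin = Origin()
edge = EdgeCache(origin)

# 1000 viewers pull the same segment through the same edge.
for _viewer in range(1000):
    edge.get("/live/seg101.ts")

print(origin.hits)  # 1
```

Because the request carries everything needed to answer it, any cache holding the object can serve it, and the origin's load depends on the number of distinct objects rather than the number of viewers.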

Pull's weakness is latency. Each segment must be published before the player can request it, and every fetch adds a round-trip. LL-HLS and LL-DASH attack this by exposing media before a full segment is finished: LL-DASH delivers CMAF chunks over HTTP chunked transfer encoding, and LL-HLS publishes partial segments and uses blocking playlist reload. Even so, pull will always lose to push at the very lowest latencies, which is why WebRTC and WHEP, although technically driven by an HTTP handshake, behave like push-style streaming once the SRTP media session is up.
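With blocking playlist reload, the player asks for a playlist version that does not exist yet, and the server holds the response until that version is published. In LL-HLS the request names the awaited media sequence number, and optionally the part index, through the `_HLS_msn` and `_HLS_part` query parameters. The helper below is a minimal sketch of building such a URL; the function name `blocking_reload_url` and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def blocking_reload_url(playlist_url, next_msn, next_part=None):
    """Add LL-HLS blocking-reload directives to a playlist URL.

    next_msn:  media sequence number the player is waiting for
    next_part: optional part index within that segment
    """
    parts = urlsplit(playlist_url)
    # Keep existing query parameters (such as auth tokens), but drop any
    # stale directives left over from a previous request.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("_HLS_msn", "_HLS_part")
    ]
    query.append(("_HLS_msn", str(next_msn)))
    if next_part is not None:
        query.append(("_HLS_part", str(next_part)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(blocking_reload_url("https://example.com/live/video.m3u8?token=abc", 273, 2))
# https://example.com/live/video.m3u8?token=abc&_HLS_msn=273&_HLS_part=2
```

The request is still a pull, but the player no longer polls blindly: one outstanding request replaces repeated refreshes, which removes most of the polling delay without adding per-viewer session state on the server.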