Why This Matters

The TCP-versus-UDP question is the first architectural decision in any streaming product, and it propagates everywhere: it sets the floor of latency you can ever hit, it decides whether a CDN can cache your traffic, it changes how you handle firewalls and corporate networks, and it dictates how much money you spend on TURN bandwidth for WebRTC. A product team that does not understand the difference will pick the wrong protocol for the use case — usually HLS for a real-time auction or WebRTC for a 200,000-viewer concert — and discover the mistake three months in, when the architecture cannot stretch to fit. The same misunderstanding shows up in vendor RFPs that ask for "low-latency streaming over HTTP" without acknowledging that the laws of physics on TCP put a floor under how low that latency can go. This article gives you the four facts that resolve the question for every streaming protocol you will ever evaluate: what each transport actually guarantees, which streaming use cases each one fits, the head-of-line blocking trap, and the way QUIC is starting to dissolve the old TCP-versus-UDP boundary in 2026.


What a transport protocol actually does

Before TCP and UDP, the layer below them. The Internet Protocol (IP) hands a packet to a destination address and shrugs — it does not guarantee delivery, ordering, or that the packet you sent is the packet that arrives. A transport protocol is the layer that sits on top of IP and decides what to do about that.

Every transport protocol answers six questions about every byte you hand it: does the byte have to arrive at all (reliability), in what order do bytes have to arrive (ordering), how do we react when the network is congested (congestion control), how do we tell the sender to slow down when the receiver cannot keep up (flow control), how do we identify which application gets which packet (port multiplexing), and how do we know two endpoints are really talking to each other (connection state). TCP answers all six. UDP answers exactly one — port multiplexing — and leaves the other five to the application.

That gap between "answers six" and "answers one" is the whole story.

[Figure 1: three-layer stack diagram. The Internet Protocol layer at the bottom delivers best-effort packets; the transport layer in the middle is split into a TCP column with six guarantees and a UDP column with only port multiplexing; the application layer on top shows streaming protocols mapped to whichever transport they sit on.]

Figure 1. The transport-layer split. TCP answers six questions; UDP answers one. Every streaming protocol picks one column or the other, and modern protocols increasingly use QUIC to re-bundle TCP's guarantees on top of UDP.

TCP, in one paragraph

The Transmission Control Protocol, defined in IETF RFC 793 (September 1981) and modernised in RFC 9293 (August 2022, which formally obsoletes RFC 793), is a connection-oriented, reliable, in-order byte-stream transport. Before any data flows, the two endpoints exchange three messages (SYN, SYN-ACK, ACK) to agree they are connected and to synchronise initial sequence numbers: the three-way handshake. Once connected, the sender hands TCP a stream of bytes; TCP cuts the stream into segments, numbers each byte, sends them in IP packets, and waits for the receiver to acknowledge (ACK) each contiguous range. If an ACK does not come back within the retransmission timeout (RTO), which TCP calculates from measured round-trip times, TCP retransmits. The receiver delivers bytes to the application strictly in order: even if byte 1000 arrives before byte 500, the application sees nothing from byte 500 onward until byte 500 arrives. TCP also runs a congestion-control algorithm (CUBIC, BBR, and Reno are the deployed ones in 2026) that probes the network's available bandwidth and shrinks the sender's window after a loss. The result, from the application's point of view, is a perfect ordered pipe: every byte you send arrives once, in order, eventually.

The "eventually" is where streaming hurts.

UDP, in one paragraph

The User Datagram Protocol, defined in IETF RFC 768 (August 1980), is a connectionless, unreliable, message-oriented transport. There is no handshake; you hand UDP a datagram (a single message of up to 65,507 bytes after IP and UDP headers), it slaps an 8-byte UDP header on it, hands it to IP, and walks away. There is no ACK, no retransmission, no ordering guarantee, no congestion control, no flow control. If the datagram is lost, you do not find out from UDP — your application must detect the loss and decide what to do. If two datagrams arrive in reverse order, they arrive in reverse order. If the network is congested and the receiver is drowning, UDP will keep firing at the same rate. UDP gives the application four things: a destination port, a source port, a length, and a checksum. Everything else is the application's problem. RFC 8085 (UDP Usage Guidelines, March 2017) is a long document whose entire purpose is to explain to application designers what TCP does for free and how to build it on top of UDP without melting the network.
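As a rough illustration of how little UDP adds, the sketch below packs the four header fields into their 8 bytes. The function name `build_udp_header` and the port numbers are hypothetical, and the checksum is left at 0 for brevity (over IPv4, RFC 768 treats an all-zero checksum as "not computed"); a real stack computes it for you.

```python
import struct

# Four 16-bit big-endian fields: source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

# Largest payload that fits in one IPv4 datagram: 65,535 minus a 20-byte
# IPv4 header minus the 8-byte UDP header.
MAX_UDP_PAYLOAD = 65535 - 20 - UDP_HEADER.size


def build_udp_header(source_port, destination_port, payload, checksum=0):
    """Return the 8-byte UDP header for the given payload (illustrative only)."""
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError("payload does not fit in a single UDP datagram")
    length = UDP_HEADER.size + len(payload)  # the length field covers header + payload
    return UDP_HEADER.pack(source_port, destination_port, length, checksum)


header = build_udp_header(50000, 5004, b"x" * 1200)
fields = UDP_HEADER.unpack(header)
print(len(header), fields)
```

Nothing in those 8 bytes carries a sequence number, a timestamp, or an acknowledgement, which is why loss detection and ordering have to live in the application protocol.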

That sounds catastrophic for video. It is, in fact, perfect for video — once you understand head-of-line blocking.

Head-of-line blocking: the one fact that decides everything

If you read only one paragraph of this article, read this one. TCP delivers bytes to the application strictly in order. That guarantee is implemented by buffering. When TCP receives bytes 1–499 and 501–1000 but not byte 500, it delivers bytes 1–499 and then stops: it holds bytes 501–1000 in a kernel buffer and waits for a retransmission of byte 500. Until that retransmission arrives, the application sees nothing new. This delay is called head-of-line blocking. It is fundamental to TCP and you cannot turn it off.
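A toy model can make the buffering concrete. The class below is a minimal sketch of an in-order receive buffer, not the kernel's implementation; the name `InOrderReceiver` is hypothetical, and it assumes segments never partially overlap. Contiguous bytes are released at once, and anything behind a gap is held until the gap is filled.

```python
class InOrderReceiver:
    """Toy model of a TCP-style receiver: releases only contiguous bytes."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # number of the next byte the application is owed
        self.held = {}             # out-of-order segments keyed by their first byte number

    def on_segment(self, seq, data):
        """Accept a segment and return the bytes that can now be delivered in order."""
        if seq + len(data) <= self.next_seq:
            return b""             # duplicate of data that was already delivered
        self.held[seq] = data
        delivered = bytearray()
        while self.next_seq in self.held:
            chunk = self.held.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        return bytes(delivered)


rx = InOrderReceiver()
first = rx.on_segment(1, b"a" * 499)      # bytes 1-499 arrive: delivered immediately
blocked = rx.on_segment(501, b"c" * 500)  # bytes 501-1000 arrive: held, byte 500 is missing
released = rx.on_segment(500, b"b")       # retransmitted byte 500 arrives: backlog released
print(len(first), len(blocked), len(released))
```

The second call returns nothing even though 500 bytes have physically arrived; that empty return is the head-of-line block.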

For a web page, head-of-line blocking is a non-event. The application wants every byte anyway, and a 200-millisecond delay while TCP retransmits a lost packet is invisible against the 2-second time-to-first-paint of a typical page. For video on a 5-second latency target, head-of-line blocking is also tolerable — players carry a buffer measured in seconds, and a retransmission completes inside that buffer.

For video on a sub-second latency target, head-of-line blocking is the entire problem. Consider a video conference at 30 frames per second: each frame is 33 milliseconds long. If a single packet carrying part of frame 100 is lost on the network, TCP will not deliver frame 101, frame 102, frame 103 or anything after it to the receiver until frame 100 is retransmitted. A typical retransmission round-trip on a transcontinental link is 80–200 milliseconds — three to six frames stalled to recover one. The receiver freezes for a tenth of a second, then unfreezes with a perfect ordered sequence of frames it can no longer use because the conversation has moved on.
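The "three to six frames" figure is plain arithmetic, sketched below under the assumption that the stall lasts exactly one retransmission round-trip. The helper name `frames_stalled` is hypothetical.

```python
import math


def frames_stalled(retransmit_ms, fps=30):
    """Number of frame intervals that elapse while one retransmission is in flight."""
    return math.ceil(retransmit_ms * fps / 1000)


low = frames_stalled(80)    # fast end of the transcontinental range
high = frames_stalled(200)  # slow end of the transcontinental range
print(low, high)
```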

UDP just delivers frame 101 the moment it arrives, even if frame 100 is missing. The video pipeline says "I lost a packet of frame 100, I will conceal it by interpolating from frame 99, the next I-frame will fix any compounding artifact within 2 seconds anyway" and keeps playing. The decoder's loss-concealment is far less disruptive than a freeze.

This is why every real-time streaming protocol sits on UDP. The protocol's job is to make UDP less bad — to add the right kind of loss recovery, the kind that does not block the head of the line.

[Figure 2: two-track timeline diagram showing the same five-frame burst delivered over TCP on the top track and over UDP on the bottom track, with frame 100 lost in transit. The TCP track shows the receiver stalled at frame 99 while TCP retransmits frame 100, then a delayed catch-up of frames 100 to 103. The UDP track shows the receiver playing frame 101 the moment it arrives, with frame 100 concealed by error concealment.]

Figure 2. Head-of-line blocking in one picture. TCP holds frames 101–103 in a kernel buffer until the retransmission of the lost frame 100 completes, and the player freezes. UDP delivers frames as they arrive and the video decoder conceals the missing frame; playback continues.
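The two tracks in Figure 2 can be approximated with a short simulation. This is a simplified sketch with made-up arrival times (one frame per packet, a 40 ms one-way delay, a 160 ms recovery); the function name `availability` and all numbers are illustrative assumptions, not measurements.

```python
def availability(arrivals_ms, lost_frame, recovery_ms, in_order):
    """Return the time at which each frame becomes available to the decoder.

    arrivals_ms maps frame number to the arrival time of its first transmission.
    A value of None means the frame is never waited for and must be concealed.
    """
    available = {}
    gate = 0  # earliest time an in-order transport may release the next frame
    for frame in sorted(arrivals_ms):
        arrival = arrivals_ms[frame]
        if frame == lost_frame:
            if not in_order:
                available[frame] = None  # unordered transport: decoder conceals it
                continue
            arrival += recovery_ms       # retransmitted copy lands one recovery later
        if in_order:
            gate = max(gate, arrival)    # nothing may overtake an earlier frame
            available[frame] = gate
        else:
            available[frame] = arrival
    return available


arrivals = {99: 40, 100: 73, 101: 106, 102: 139, 103: 172}
tcp_like = availability(arrivals, lost_frame=100, recovery_ms=160, in_order=True)
udp_like = availability(arrivals, lost_frame=100, recovery_ms=160, in_order=False)
print(tcp_like)
print(udp_like)
```

In the in-order run, frames 100 to 103 all become available at the same late instant; in the unordered run, frames 101 to 103 keep their original arrival times.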

TCP versus UDP — the comparison table you can hand to your team

Eight dimensions, two columns. Memorise the table; almost every streaming protocol-choice debate fits inside it.

| Dimension | TCP | UDP |
| --- | --- | --- |
| Connection setup | 3-way handshake (1 round-trip), plus TLS handshake on top (1–2 more round-trips) | None. First datagram carries the first byte of payload. |
| Reliability | Guaranteed delivery via ACK + retransmit | Best-effort. Application handles loss. |
| Ordering | Strict in-order delivery (head-of-line blocking) | None. Datagrams may arrive in any order. |
| Congestion control | Mandatory. CUBIC, BBR, Reno deployed in 2026. | None. Application must implement (RFC 8085). |
| Flow control | Receive window prevents receiver overflow | None. Application's problem. |
| Header overhead | 20 bytes minimum (often 32+ with options) | 8 bytes |
| Latency floor under loss | 1 round-trip per lost packet, blocks the stream | 0: the receiver gets the next packet immediately |
| Typical streaming use | HLS, DASH, RTMP delivery (now legacy), file downloads | WebRTC, SRT, RIST, RTP, MoQ over QUIC |

The dimension that decides streaming architecture is the bottom-but-one: latency floor under loss. On a network with even 1% packet loss, TCP's effective throughput drops by an order of magnitude, because every loss both stalls the stream and shrinks the congestion window. Mathis et al.'s 1997 formula, still the back-of-envelope reference, gives throughput ≈ (MSS / RTT) × C / √loss, with C ≈ 1.22. Plug in MSS = 1460 bytes, RTT = 80 ms, loss = 1%: throughput caps at roughly 1.8 Mbps. That is not enough for 1080p. UDP, with the application doing its own loss recovery, is not bound by that formula.
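The Mathis estimate is easy to reproduce. The sketch below assumes the constant C = √(3/2) ≈ 1.22 that the paper derives for periodic loss with an ACK for every segment; the function name `mathis_throughput_bps` is hypothetical, and the model describes Reno-style loss-based congestion control rather than every deployed algorithm.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate ceiling on steady-state TCP throughput, in bits per second."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


estimate = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.080, loss_rate=0.01)
print(f"{estimate / 1e6:.2f} Mbps")

# Halving the loss rate only buys a factor of sqrt(2), not a factor of 2.
improved = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.080, loss_rate=0.005)
print(f"{improved / 1e6:.2f} Mbps")
```

Because loss sits under a square root and RTT in the denominator, shortening the round-trip helps more than an equal proportional reduction in loss.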

Why HLS and DASH live happily on TCP

If TCP is so bad for low latency, why do the two most popular streaming protocols on Earth — HTTP Live Streaming and Dynamic Adaptive Streaming over HTTP — both sit on TCP?

The answer is that HLS and DASH are not real-time protocols. They are file-shaped protocols. The encoder cuts the live stream into segments of 2–10 seconds each, writes those segments as files (.ts, .mp4, or .m4s) into an origin, and lets the player fetch them over plain HTTP. HTTP runs on TCP. The player keeps a buffer of, typically, 3–4 segments — at 6-second segments, that is an 18–24 second buffer. A retransmission stall of 200 milliseconds inside that buffer is invisible.

That buffer is exactly the latency you trade for TCP's reliability and HTTP's caching. The buffer is what lets a CDN cache every segment, deliver it from the edge, and serve a million concurrent viewers without melting the origin. Without that buffer, the CDN cannot cache; without caching, you cannot scale; without scale, you cannot do OTT.

Low-Latency HLS and Low-Latency DASH push the buffer down by publishing partial segments of 200–400 ms each — Apple's HLS Authoring Specification (revision 2025-09, §2.10 onwards) describes the LL-HLS extensions in detail. The result is sub-3-second glass-to-glass on a well-tuned pipeline, still on TCP. Below 3 seconds, TCP starts to lose; below 1 second, TCP and UDP diverge sharply.

The takeaway is not "TCP is bad for streaming". The takeaway is "TCP is bad for sub-3-second streaming". For 5-second VOD and live OTT, TCP is the right call — its reliability and HTTP's cacheability outweigh its latency penalty. For 100-ms-class interactive video, TCP cannot meet the budget.

Why WebRTC, SRT, and RIST live on UDP

The three big real-time protocol families all sit on UDP, and they each implement a different flavour of "make UDP less bad".

WebRTC (W3C Recommendation plus RFC 8825–8866 and the RTP family, RFC 3550 onwards) carries media as RTP packets over UDP. Each RTP packet has a sequence number and a timestamp. The receiver uses the sequence number to detect loss and the timestamp to schedule playback. When loss is detected, WebRTC has several recovery mechanisms — NACK (negative acknowledgement of specific lost packets), RTX (retransmission as a separate RTP stream with a different payload type), FEC (forward error correction that adds redundancy ahead of time), and the codec's own error concealment. None of these block the head of the line; if frame 100 is unrecoverable in time, the decoder conceals it and plays frame 101. Glass-to-glass latency in WebRTC is typically 100–500 milliseconds.
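Loss detection from sequence numbers can be sketched in a few lines. RTP sequence numbers are 16 bits wide and wrap around, so the gap has to be computed modulo 65,536. The function name `missing_sequence_numbers` is hypothetical, and this is a simplified illustration rather than the logic of any particular WebRTC stack.

```python
def missing_sequence_numbers(highest_seen, received):
    """Return the 16-bit RTP sequence numbers skipped between two packets."""
    gap = (received - highest_seen) & 0xFFFF  # forward distance modulo 2**16
    if gap == 0 or gap >= 0x8000:
        return []                             # duplicate, or a late/reordered packet
    return [(highest_seen + i) & 0xFFFF for i in range(1, gap)]


simple_gap = missing_sequence_numbers(99, 101)    # packet 100 never showed up
wrapped_gap = missing_sequence_numbers(65534, 1)  # gap that spans the wrap-around
reordered = missing_sequence_numbers(101, 100)    # an old packet arriving late
print(simple_gap, wrapped_gap, reordered)
```

The returned list is what a receiver would put in a NACK; whether it actually sends one depends on whether a retransmission can still arrive before the frame is due.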

SRT (Secure Reliable Transport, defined in the Internet-Draft draft-sharabayko-srt-01, March 2024 — note the draft status; SRT can change before final RFC) wraps UDP with a tunable mix of ARQ retransmission, optional FEC, AES encryption, and timestamp-based delivery. The clever part is the configurable latency buffer: you tell SRT "give me 200 milliseconds of jitter buffer", and it uses that budget to recover lost packets by retransmission if a round-trip will fit, and gives up otherwise. SRT is the de-facto standard for professional public-internet contribution links — the camera-to-encoder-to-cloud hop. Its head-of-line behaviour is configurable: inside the budget, it waits and retransmits; past the budget, it drops and moves on.
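The "inside the budget, wait; past the budget, drop" rule reduces to one comparison. The sketch below is a deliberately simplified model of that decision, not SRT's actual algorithm; the function name `should_request_retransmit` and the timing values are hypothetical.

```python
def should_request_retransmit(now_ms, play_deadline_ms, rtt_ms):
    """True if a retransmitted packet can still arrive before it is due for playout."""
    return now_ms + rtt_ms <= play_deadline_ms


LATENCY_BUDGET_MS = 200      # the configured latency buffer
expected_arrival_ms = 0      # when the lost packet should have arrived
play_deadline_ms = expected_arrival_ms + LATENCY_BUDGET_MS

# Loss noticed 20 ms late on a 160 ms round-trip link: recovery fits the budget.
fits = should_request_retransmit(now_ms=20, play_deadline_ms=play_deadline_ms, rtt_ms=160)

# Same loss on a 240 ms round-trip link: recovery cannot arrive in time, so drop.
too_late = should_request_retransmit(now_ms=20, play_deadline_ms=play_deadline_ms, rtt_ms=240)
print(fits, too_late)
```

This is why the latency setting is usually chosen as a multiple of the link's round-trip time: a budget shorter than one round-trip leaves no room for any retransmission.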

RIST (Reliable Internet Stream Transport, VSF TR-06-1, TR-06-2, TR-06-3) is the broadcast industry's answer to the same problem. RIST sits on UDP carrying RTP, with its own NACK-based retransmission scheme that operates on RTP sequence numbers. Where SRT was driven by Haivision and Wowza, RIST was driven by the Video Services Forum and a coalition of broadcast vendors. The two solve the same problem with different governance.

RTMP and its descendants are the only major exception in the contribution-side family: RTMP runs on TCP because it was designed in the early 2000s to run over the same TCP socket as Flash's other traffic. RTMP is dying as a contribution protocol; SRT and WHIP are eating its lunch precisely because TCP is the wrong choice for a contribution link with any meaningful packet loss.

A worked example: the same lost packet on TCP and on UDP

Numbers in. A camera is encoding 1080p at 30 fps with an average frame size of 8 KB and a peak I-frame of 80 KB. The link to the cloud has 80 ms one-way latency, 160 ms round-trip, and an average packet-loss rate of 0.5%. We follow one frame that is sent as ten UDP datagrams (a round number for the arithmetic; an average 8 KB frame needs about six), each sized to fit a 1500-byte MTU, which leaves 1472 bytes of payload after the 20-byte IPv4 header and the 8-byte UDP header.

On TCP (RTMP contribution, the legacy case): the encoder pushes the ten packets into the TCP socket. One packet is lost. The receiver's kernel delivers whatever arrived ahead of the gap and buffers the rest. The encoder discovers the loss when it does not see an ACK within the retransmission timeout (160 ms baseline plus jitter), and retransmits the missing packet. The receiver waits at least 160 ms before the rest of the frame, and everything queued behind it, becomes available to the application. At 30 fps, that is 4.8 frames of stall. Then the next frame arrives, and on the way another packet is lost: another 160 ms stall. At 0.5% loss across ten packets per frame, the expected number of lost-packet events is about one every twenty frames, or one per 0.67 seconds. The user sees an obvious freeze and skip roughly every two-thirds of a second.

On UDP (WebRTC contribution, or SRT): the same encoder packetises the same ten datagrams and fires them. One is lost. The receiver gets nine, notices the gap via the RTP sequence number, NACKs the missing one. The NACK and retransmission take 160 ms — the same as TCP — but in the meantime the decoder has the nine packets it already received, conceals the missing region from the previous frame, and plays the frame at the correct presentation time. If the retransmission arrives in time it gets stitched in; if not, the loss is concealed and the next frame plays normally. There is no visible freeze. The next I-frame fully resyncs the picture within 2 seconds.

This is the entire engineering case for UDP in interactive streaming. Same network, same loss rate, same retransmission round-trip — completely different user experience.
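The figures in the worked example can be checked with a few lines of arithmetic. The constant names below are hypothetical, and the calculation assumes packet losses are independent, which real networks only approximate because losses tend to arrive in bursts.

```python
FPS = 30
PACKETS_PER_FRAME = 10
LOSS_RATE = 0.005   # 0.5% average packet loss
RTT_MS = 160

# Expected number of lost packets in one frame.
loss_events_per_frame = PACKETS_PER_FRAME * LOSS_RATE

# Average spacing between loss events, in frames and in seconds.
frames_between_losses = 1 / loss_events_per_frame
seconds_between_losses = frames_between_losses / FPS

# Length of one in-order retransmission stall, measured in frame intervals.
stall_in_frames = RTT_MS * FPS / 1000

# Exact probability that a given frame loses at least one of its packets.
p_frame_hit = 1 - (1 - LOSS_RATE) ** PACKETS_PER_FRAME

print(f"{frames_between_losses:.0f} frames, {seconds_between_losses:.2f} s, "
      f"{stall_in_frames:.1f} frames stalled, {p_frame_hit:.3f} per-frame hit rate")
```

The same loss frequency applies to both transports; the difference is only whether each event costs a multi-frame stall or a concealed patch in one frame.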

QUIC and Media over QUIC: the new map

The TCP-versus-UDP dichotomy is starting to dissolve in 2026 because of QUIC.

QUIC, defined in IETF RFC 9000 (May 2021), is a transport protocol that runs on top of UDP but reimplements most of what TCP does: reliability, ordering, congestion control, flow control, and encryption (TLS 1.3 is baked into the handshake, not bolted on top). The kernel sees UDP datagrams; the application sees an ordered, reliable, encrypted byte stream. From the application's point of view, QUIC looks a lot like TCP+TLS.

The crucial difference is that QUIC supports multiple independent streams over the same connection, and a lost packet on one stream does not block any of the others. In TCP, every byte is in one giant ordered stream; head-of-line blocking applies across the whole connection. In QUIC, you can have stream 0 (video chunk 1), stream 1 (video chunk 2), stream 2 (audio chunk 1), stream 3 (subtitles) all in flight at once, and a loss on stream 0 does not stall streams 1, 2, or 3. The head-of-line block becomes per-stream instead of per-connection. For HTTP/3 delivery of chunked CMAF segments, that removes the cross-stream stall behind the LL-HLS-on-TCP latency floor, although a loss still blocks the stream it occurred on.
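Per-stream isolation can be modelled by giving every stream its own reassembly state. The class below is a toy sketch, not a QUIC implementation; the name `StreamReassembler` is hypothetical, and it relies only on the fact that QUIC stream data is identified by a stream ID and a byte offset.

```python
from collections import defaultdict


class StreamReassembler:
    """Toy model of QUIC-style delivery: ordering is enforced per stream only."""

    def __init__(self):
        self.next_offset = defaultdict(int)  # per stream: next byte offset owed
        self.held = defaultdict(dict)        # per stream: out-of-order chunks by offset

    def on_frame(self, stream_id, offset, data):
        """Accept stream data and return what that stream can now deliver in order."""
        held = self.held[stream_id]
        held[offset] = data
        delivered = bytearray()
        while self.next_offset[stream_id] in held:
            chunk = held.pop(self.next_offset[stream_id])
            delivered += chunk
            self.next_offset[stream_id] += len(chunk)
        return bytes(delivered)


rx = StreamReassembler()
video_blocked = rx.on_frame(0, 4, b"BBBB")     # stream 0: first four bytes were lost
audio = rx.on_frame(2, 0, b"audio")            # stream 2: unaffected, delivered at once
video_recovered = rx.on_frame(0, 0, b"AAAA")   # stream 0: retransmission fills the gap
print(video_blocked, audio, video_recovered)
```

The audio data is delivered while the video stream is still waiting, which is the behaviour a single TCP connection cannot offer.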

HTTP/3 (RFC 9114) is HTTP over QUIC. HLS and DASH delivered over HTTP/3 inherit QUIC's per-stream isolation without changing the protocols above. Media over QUIC, draft-ietf-moq-transport-17 (January 2026; this is an Internet-Draft and subject to change before publication as an RFC), goes further — it treats the stream as a tree of named objects rather than a list of segments, with first-class support for relay, multi-CDN, and live distribution at sub-second latency. MoQ is the most credible candidate for the "next default" delivery protocol after LL-HLS. We cover it in Media over QUIC in depth.

The practical upshot for 2026: TCP versus UDP is still the right mental model for understanding any existing protocol, but when you build a new one you should reach for QUIC. Cloudflare, Meta, Google, and YouTube already deliver substantial fractions of their traffic over HTTP/3 in 2026, and QUIC's share of total internet traffic crossed 30% in 2025 per Sandvine's Global Internet Phenomena report.

Common mistake — assuming TCP is "more reliable" in any useful sense

A common product-team confusion: "We need our streaming to be reliable, so we should use TCP." This conflates two different meanings of reliability.

TCP's reliability is byte-level: every byte you send arrives, eventually, in order. That is a property of the byte stream, not of the user experience. From a viewer's point of view, a frozen video is not "reliable" — it is broken. UDP's per-packet unreliability, combined with a real-time application's ability to discard, conceal, and move on, produces a much more useful reliability at the user-experience layer: the picture keeps moving.

The right question is not "TCP or UDP, which is more reliable?" but "what does my application define as 'arrived in time'?" For a 24-second OTT buffer, TCP's eventual delivery is in time. For a 100-ms conferencing budget, anything that does not arrive within 100 ms is gone — and TCP's retransmission cannot meet that budget, so its "reliability" is moot.

Treat reliability as a budget, not a property. UDP plus your own loss-recovery is more reliable inside a tight latency budget than TCP. TCP is more reliable outside any budget. Pick the one that matches your budget.

Where Fora Soft fits in

We have shipped video streaming, video conferencing, OTT, surveillance, telemedicine, e-learning, and AR/VR products since 2005, and the TCP/UDP decision is the first whiteboard sketch on every project. Telemedicine consults need 200 ms WebRTC over UDP; e-learning lectures recorded for later viewing happily ride HLS over TCP; OTT live linear needs LL-HLS partial segments on TCP with HTTP/3 where the player supports it; surveillance archive ingest from cameras runs SRT over UDP because cellular uplinks lose packets. The same pipeline often carries traffic on both transports at different stages — UDP-based WebRTC ingest from the camera, transcoded and re-packaged into TCP-based HLS for the long-tail viewer audience. Knowing which transport sits underneath each hop is the first sanity check we run when an existing system feels slow.

What to read next

Talk to us / See our work / Download

  • Talk to a streaming engineer — book a 30-minute scoping call to map your TCP / UDP / QUIC decisions against your latency budget.
  • See our case studies at www.forasoft.com/case-studies for telemedicine, e-learning, OTT, and surveillance builds.
  • Download the TCP vs UDP for Streaming Cheat Sheet (PDF) — one page, the six-dimension comparison, the head-of-line blocking diagram, and the protocol-to-transport map.

References

  1. IETF RFC 9293, Transmission Control Protocol (TCP), August 2022. https://datatracker.ietf.org/doc/html/rfc9293 — the modernised TCP specification that formally obsoletes RFC 793 (1981). §3.5 covers connection establishment; §3.8 covers data communication, including the retransmission timeout. The current authoritative source for TCP behaviour.
  2. IETF RFC 793, Transmission Control Protocol, September 1981. https://datatracker.ietf.org/doc/html/rfc793 — the original TCP specification, now obsoleted by RFC 9293 but still cited everywhere; the three-way handshake diagram in §3.4 is the canonical one.
  3. IETF RFC 768, User Datagram Protocol, August 1980. https://datatracker.ietf.org/doc/html/rfc768 — the entire UDP specification in three pages. The shortness is the point.
  4. IETF RFC 8085, UDP Usage Guidelines, March 2017. https://datatracker.ietf.org/doc/html/rfc8085 — what an application has to do on top of UDP to be a good network citizen. The de-facto reading list for any team building a UDP-based protocol.
  5. IETF RFC 9000, QUIC: A UDP-Based Multiplexed and Secure Transport, May 2021. https://datatracker.ietf.org/doc/html/rfc9000 — the QUIC specification. §2 covers stream multiplexing, which is where the head-of-line blocking improvement lives.
  6. IETF RFC 9114, HTTP/3, June 2022. https://datatracker.ietf.org/doc/html/rfc9114 — HTTP over QUIC.
  7. IETF RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003. https://datatracker.ietf.org/doc/html/rfc3550 — RTP, the application-layer protocol carried over UDP that does sequencing, timestamping, and the framing WebRTC and RIST both build on.
  8. IETF RFC 8825, Overview: Real-Time Protocols for Browser-Based Applications, January 2021. https://datatracker.ietf.org/doc/html/rfc8825 — the WebRTC architectural overview that names every other RFC in the WebRTC family. Confirms WebRTC's UDP-first design.
  9. IETF draft-sharabayko-srt-01, The SRT Protocol, March 2024. https://datatracker.ietf.org/doc/html/draft-sharabayko-srt-01 — current Internet-Draft for SRT. Note: Internet-Drafts can change before RFC publication; cite the draft number AND the date.
  10. VSF TR-06-1:2020, Reliable Internet Stream Transport (RIST) — Simple Profile. https://ieeexplore.ieee.org/document/8956048 — the RIST baseline transport over UDP/RTP.
  11. IETF RFC 8216, HTTP Live Streaming, August 2017. https://datatracker.ietf.org/doc/html/rfc8216 — confirms HLS's design as an HTTP-over-TCP protocol delivering segmented files.
  12. Apple HLS Authoring Specification, revision 2025-09, Apple Inc. https://developer.apple.com/documentation/http-live-streaming/hls-authoring-specification-for-apple-devices — describes the LL-HLS extensions in §2.10 onwards; living document, check the revision date.
  13. ISO/IEC 23009-1:2022, Dynamic adaptive streaming over HTTP (DASH) — Media presentation description and segment formats, fifth edition. https://www.iso.org/standard/83314.html — DASH protocol; the segment fetch over HTTP is implied in §5.
  14. IETF RFC 9725, WHIP — WebRTC-HTTP Ingestion Protocol, March 2025. https://datatracker.ietf.org/doc/html/rfc9725 — WHIP's signalling rides HTTP/TCP but the media plane is WebRTC/UDP; a clean illustration of how the two transports cooperate in modern stacks.
  15. draft-ietf-moq-transport-17, Media over QUIC Transport, January 2026. https://datatracker.ietf.org/doc/html/draft-ietf-moq-transport-17 — Internet-Draft, subject to revision before final RFC. Cited for the per-stream multiplexing model that succeeds CMAF segments.
  16. Mathis, M., Semke, J., Mahdavi, J., Ott, T., The macroscopic behavior of the TCP congestion avoidance algorithm, ACM SIGCOMM CCR, July 1997. https://dl.acm.org/doi/10.1145/263932.264023 — the original throughput ≈ (MSS / RTT) × C / √loss derivation. Used in the comparison-table discussion to justify why TCP collapses on lossy networks.
  17. Sandvine, Global Internet Phenomena Report, 2025 edition. https://www.sandvine.com/global-internet-phenomena-report — QUIC share of total internet traffic in 2025.