Why This Matters

If you are streaming live video from a venue, a stadium, a remote crew, a satellite truck, or a mobile encoder, the single biggest operational risk is the contribution path: the leg between the encoder and the first server you control. That leg is where the public internet's jitter, packet loss, and congestion eat your stream alive, and it is the one place where RTMP — the default for two decades — actually breaks under realistic field conditions. SRT exists because Haivision needed a contribution protocol that survives a 5 percent burst loss on a venue Wi-Fi link without dropping out, and the rest of the industry adopted it because the same answer works for satellite trucks, cellular bonded uplinks, and cloud-to-cloud feeds.

This article is the canonical Block 3 reference on SRT: what the protocol actually is, what problem it solves that RTMP cannot, where the 4× round-trip-time latency rule comes from, how the caller / listener / rendezvous connection modes work, what AES-256 encryption costs you, where SRTLA — the multi-link bonded variant — fits in, and how SRT compares to RIST and WHIP in the 2026 contribution decision. By the end you will be able to defend a "we ship SRT" decision in a planning meeting and configure an encoder for it without guessing.

What SRT Is — In One Page

Secure Reliable Transport, abbreviated SRT, is an application-layer protocol that rides on UDP and gives the application three properties the underlying UDP does not: reliable delivery, encryption, and a tight upper bound on end-to-end latency. Haivision built the original implementation between 2013 and 2017 as the contribution-side transport for its KB Encoder/Decoder product line and released both the protocol and the reference implementation as open source at the 2017 NAB Show. Maxim Sharabayko and a small group of co-authors submitted the specification to the IETF as draft-sharabayko-srt (versions 00 and 01, between 2020 and 2022). Those drafts have since expired, so the controlling reference today is the combination of the latest expired IETF draft, the Haivision reference implementation on GitHub (currently at v1.5.4), and the SRT Alliance Deployment Guide.

Mechanically, SRT runs on UDP on whatever port the operator configures: port 9000 is a very common convention, but unlike RTMP's port 1935, no single port is canonical. A session begins with a handshake between a caller and a listener (or between two peers in rendezvous mode), an optional key exchange if encryption is enabled, and a stream-establishment phase where the two sides agree on latency budget, bandwidth, and operating mode. The data plane then carries video, audio, and metadata as a sequence of numbered UDP datagrams. When the receiver notices a missing sequence number, it issues a negative acknowledgement, abbreviated NAK, asking the sender to retransmit just that packet. If the retransmission arrives inside the negotiated latency window, the receiver slots it into place and hands the reassembled stream to the application; if the retransmission misses the window, the receiver records a one-packet hole and moves on.
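
The receiver-side logic described above can be sketched in a few lines. This is a toy model, not the reference implementation: the function names are hypothetical, sequence-number wraparound is ignored, and "delay" simply means how late the retransmitted packet is relative to when the original should have arrived.

```python
def missing_sequences(last_seen: int, just_arrived: int) -> list[int]:
    """Sequence numbers skipped between the previous packet and the new one."""
    return list(range(last_seen + 1, just_arrived))


def on_retransmission(delay_ms: float, latency_budget_ms: float) -> str:
    """Classify a retransmitted packet by how late it is relative to the budget."""
    return "recovered" if delay_ms <= latency_budget_ms else "dropped"


# Packet 46 arrives, then 48: the receiver would NAK sequence 47.
print(missing_sequences(46, 48))      # [47]
# With a 360 ms budget, a retransmission that is 100 ms late is slotted back in.
print(on_retransmission(100, 360))    # recovered
# One that is 500 ms late has missed the window and is discarded.
print(on_retransmission(500, 360))    # dropped
```

The point of the sketch is that the receiver asks only for the gap, and that the latency budget is the single number deciding whether a repair counts.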

The protocol carries three modes — live, file-buffer, and file-message — that differ in how aggressive the latency budget is and what guarantees apply at the boundary between packets. Live mode is the one almost everyone uses for video: a single-packet payload, a tight latency budget, and the explicit contract that late packets get dropped rather than block the stream. File modes are for moving a finite blob of bytes across an unreliable network where end-to-end correctness matters more than timeliness. Unless you are explicitly building a file transfer on top of SRT, you can treat "SRT" and "SRT live mode" as synonyms.

The wire format is binary, the timing is microsecond-resolution, and the protocol carries application timestamps end-to-end so the receiver can reconstruct the sender's original cadence. SRT can optionally encrypt the payload with AES-128 or AES-256 in CTR mode, using a shared passphrase or a pre-shared key. That is the architecture in 200 words.

A timeline diagram showing the SRT session, end to end, between a sender and a receiver. The top of the canvas labels the sender on the left and the receiver on the right, with a vertical UDP-on-port-9000 lifeline connecting them. Phase 1 (handshake) shows the four-step caller-listener exchange: induction request, induction response, conclusion request, and conclusion response, each labelled with its handshake-extension contents. Phase 2 (key exchange) shows the Key Material Exchange and the wrapped Stream Encrypting Key crossing once. Phase 3 (steady-state data plane) shows a continuous flow of numbered UDP datagrams, with a gap where datagram 47 is missing, a NAK message from receiver to sender asking for 47, and the retransmitted 47 arriving inside the latency budget. A note at the bottom states that everything rides on UDP and that packets older than the latency budget are dropped rather than retransmitted.

Figure 1. The SRT session, end to end. Four-step handshake on UDP, optional key exchange if encryption is on, then a continuous data plane where missing packets trigger a NAK and the sender retransmits only the missing datagram. Packets older than the latency budget are dropped, not retransmitted.

The Short Version of How SRT Got Here

The history matters because it explains what SRT optimises for. Haivision is a Canadian company that has built broadcast-grade contribution hardware since the early 2000s — the KB and Makito encoder/decoder families that satellite trucks, sports broadcasters, and news organisations bolt into the back of their racks. Around 2012 the company kept hitting a wall: every customer wanted to push live video over the public internet (cheaper than a dedicated fibre line, faster to deploy than a satellite uplink), but the available protocols all failed under realistic field conditions. RTMP stalled on 2 percent packet loss. UDP without retransmission produced visible artefacts. RTP over a private VPN worked, but only inside a controlled network. Haivision's engineers — including Maxim Sharabayko, the protocol's primary author — built a new transport that combined UDP's "skip the late stuff" behaviour with selective retransmission and an explicit latency budget, and shipped it as the default transport in the KB product line.

By 2017 Haivision had enough proof points (the NFL using SRT for venue feeds, news organisations using it for remote contribution) to make a strategic bet: open-source the protocol, hand it to the industry, and turn what was a hardware-only feature into a standard everyone could implement. At NAB 2017 Haivision released the protocol specification and a C-language reference implementation under the Mozilla Public License 2.0, formed the SRT Alliance with Wowza as the first co-founding member, and invited the rest of the industry to adopt it. Microsoft joined in 2018, AWS Elemental in 2019, NVIDIA in 2023, and Cloudflare, Paramount, Dolby, Mux, JW Player, THEO Technologies, EVS, and Chyron between 2023 and 2025. The SRT Alliance crossed the 600-member mark in 2024 and the 700-member mark in 2025 — making it the largest open-protocol alliance in the streaming industry.

The 2024 Haivision Broadcast Transformation Report — the industry's annual survey of contribution-protocol usage in production broadcast environments — found that 68 percent of broadcasters now use SRT for live video transport. That is the most widely adopted modern contribution protocol in the broadcast industry, ahead of RTMP, RIST, and Zixi. For the developer-platform tier (AWS Elemental MediaConnect, Cloudflare Stream, Dolby Millicast, Mux Live) the figure is closer to 100 percent: SRT ingest is universally supported, often alongside RTMPS and WHIP, on a single product surface.

A useful way to hold this in your head: SRT did not start as a standards committee project. It started as a working hardware product, was good enough to displace RTMP for professional contribution, and was then handed to the industry as open source. That order — ship first, standardise later — is why SRT works in the field and why the protocol's documentation lives partly in an expired IETF draft and partly in the reference implementation on GitHub.

The One Problem SRT Solves That RTMP Cannot

The single sentence that captures SRT's value, the one to write on a sticky note and stick to your monitor, is this: SRT lets you push live video across a lossy public-internet path without the stream collapsing on packet loss. Everything else about the protocol — the modes, the handshake, the encryption — exists to support that one property.

To understand why this is hard, you have to understand what packet loss does to RTMP. RTMP rides on TCP, which gives the application three guarantees: every byte arrives, every byte arrives in order, and the sender slows down if the network is congested. Those guarantees are perfect for web pages and file downloads. They are catastrophic for live video, because in-order delivery means that if packet 47 is lost on the wire, packets 48, 49, 50, and every packet after that sit in the receiver's kernel buffer waiting for 47 to be retransmitted. That waiting time is at least one round-trip time, typically 30 to 200 milliseconds depending on the path. During that wait, the receiver delivers nothing to the application. This is called head-of-line blocking, and it is the dominant failure mode of any TCP-based contribution protocol on a lossy path. The visible effect for the viewer is a stall: the playback freezes for half a second to several seconds, then resumes.

The other half of TCP's behaviour is congestion control. When TCP detects packet loss, it interprets that as congestion and cuts the send rate — typically in half. For a 6 Mbps live stream, a single loss event halves the effective send rate to 3 Mbps for several seconds while TCP cautiously probes back up. The encoder is producing 6 Mbps and the connection now carries 3 Mbps; the encoder's buffer fills, frames drop, and viewers see degraded picture quality. The protocol's response to "I lost a packet" is "I will send less data" — which is the wrong answer when your job is to keep a live stream flowing.

SRT addresses both behaviours by giving up TCP entirely and rebuilding the reliability layer on top of UDP, with selective retransmission and an explicit latency budget. When SRT loses packet 47, the receiver sends a single NAK asking the sender to retransmit just packet 47. The sender retransmits packet 47. Meanwhile the receiver keeps accepting packets 48, 49, 50 as they arrive, and the gap at 47 can never hold them back for longer than the latency budget. If packet 47's retransmission arrives inside the configured latency window, the receiver slots it back into the stream at the right position. If it misses the window, the receiver records a single missing sample, lets the decoder do error concealment (or drops a single frame), and keeps the rest of the stream flowing.

The arithmetic, out loud, with realistic numbers. Suppose your contribution path has 90 ms round-trip time and 2 percent average packet loss with occasional 5 percent bursts. RTMP on this path: 2 percent loss means roughly 1 in 50 packets is lost; each loss stalls delivery for roughly 180 ms (about two round-trips to detect the loss and retransmit) and halves the send rate; the encoder sends 6 Mbps, the connection delivers 3 Mbps, the encoder buffers, frames drop, and viewers see a stall every 30 seconds. SRT on the same path with a 360 ms latency budget (4× the 90 ms RTT, which is the recommended setting): the receiver issues a NAK for each lost packet, the sender retransmits within one RTT, and the receiver slots the retransmitted packet back into the stream. The send rate stays at 6 Mbps. The viewer sees no stall. The same loss event, two completely different outcomes.

A side-by-side comparison diagram of RTMP and SRT handling a packet loss event on a 90-millisecond-RTT path with 2 percent packet loss. The top panel shows RTMP: encoder sends a continuous 6 Mbps stream, chunk 47 is lost, the receiver buffer freezes for 180 milliseconds waiting for retransmission, TCP halves the send rate from 6 Mbps to 3 Mbps, the encoder buffer fills, and viewers see a multi-second stall. The bottom panel shows SRT: encoder sends a continuous 6 Mbps stream, packet 47 is lost, the receiver issues a NAK requesting just packet 47, the sender retransmits 47 within one RTT, the retransmitted packet arrives inside the 360-millisecond latency budget, the send rate stays at 6 Mbps, and viewers see no glitch. A summary line at the bottom states: same loss event, two different outcomes; SRT is the protocol that finishes the stream.

Figure 2. The single most important diagram in this article. RTMP's TCP-based design produces head-of-line blocking and congestion-control rate halving on loss; SRT's selective retransmission inside a latency window recovers only the missing packet and keeps the send rate steady.

Where The 4× RTT Latency Rule Comes From

If you remember nothing else about configuring SRT, remember this: the latency setting must be at least four times the round-trip time of the contribution path. The rule is folklore-grade common — every Haivision configuration guide, every SRT Alliance deployment document, every vendor tutorial repeats it — and the math behind it is worth understanding because it tells you how to size the budget for non-standard paths.

A NAK-based recovery has three timing components. First, the receiver has to notice that a packet is missing — typically when it sees a gap in the sequence numbers (packet 46 arrives, then 48, with 47 absent). The receiver waits a small amount of time before issuing the NAK, in case 47 is just delayed; the default is one packet pair time. Second, the NAK has to travel from receiver to sender, which takes half a round-trip. Third, the sender has to retransmit packet 47, which takes another half round-trip in the same direction as the original data. That is one full round-trip per recovery, plus a small inspection delay.

If a single retransmission fails — the retransmitted packet is also lost — the receiver issues another NAK, and another round-trip elapses. Two retransmissions in a row take two round-trips. Three retransmissions take three. The 4× RTT rule is what lets the receiver tolerate roughly three retransmission rounds before the late packet is dropped. On a path with 1 percent average loss and 5 percent burst loss, two retransmissions in a row are uncommon but not rare; three retransmissions in a row are rare. The rule is calibrated to handle realistic packet loss patterns without dropping packets that could have been recovered with a slightly longer wait.
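
To put rough numbers on "uncommon but not rare", here is a back-of-the-envelope sketch. It assumes every transmission of a packet is lost independently with the same probability, which real bursty paths violate, so read the output as an optimistic lower bound. The function name is ours, not part of any SRT API.

```python
def residual_loss(loss_rate: float, retransmissions: int) -> float:
    """Chance a packet is still missing after the original send plus N retransmissions.

    Assumes independent losses: every attempt fails with probability loss_rate.
    """
    return loss_rate ** (retransmissions + 1)


# Compare average (1-2 percent) and burst (5 percent) loss for one to three rounds.
for loss in (0.01, 0.02, 0.05):
    for rounds in (1, 2, 3):
        print(f"loss={loss:.0%} rounds={rounds} residual={residual_loss(loss, rounds):.2e}")
```

Under that assumption, three retransmission rounds at 5 percent loss leave about 6 packets in a million unrecovered, which is why a budget that fits roughly three rounds is usually enough.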

The arithmetic, out loud. Suppose the path has 30 ms RTT (a typical wired same-country connection). The latency setting should be at least 4 × 30 = 120 ms, with 200 ms a safe default that gives margin for jitter. Suppose the path has 200 ms RTT (a long intercontinental route). The setting should be at least 4 × 200 = 800 ms, with 1,000 ms a safe default. Suppose the path is a 4G mobile uplink with 150 ms RTT and high jitter. The setting should be at least 4 × 150 = 600 ms, but mobile paths benefit from a larger margin: 1,500 to 4,000 ms is a typical operational range for cellular contribution because the round-trip time itself fluctuates by hundreds of milliseconds. Suppose the path is a geostationary satellite link with 600 ms RTT. The setting must be at least 4 × 600 = 2,400 ms; production satellite contribution often runs at 4,000 to 8,000 ms.

The setting you choose is a direct trade between glass-to-glass latency and resilience. Lower latency means SRT has less time to recover lost packets, so more packets get dropped, so the viewer sees more glitches. Higher latency means more packets can be recovered, but the viewer sees the live event later. The 4× RTT rule is the floor; the operational setting is the rule plus margin for the worst-case jitter you have observed on the path.
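
The floor-plus-margin rule is simple enough to keep as a helper next to your deployment scripts. The sketch below is only the arithmetic from this section; the function names are hypothetical, and the jitter margin is whatever you measured on your own path.

```python
def latency_floor_ms(rtt_ms: float, multiplier: float = 4.0) -> float:
    """The 4x RTT floor for the SRT latency setting."""
    return multiplier * rtt_ms


def operational_latency_ms(rtt_ms: float, worst_jitter_ms: float = 0.0) -> float:
    """Floor plus a margin for the worst jitter observed on the path."""
    return latency_floor_ms(rtt_ms) + worst_jitter_ms


# The four example paths from the text: wired, long-haul, 4G, geostationary.
for rtt in (30, 200, 150, 600):
    print(f"RTT {rtt} ms -> floor {latency_floor_ms(rtt):.0f} ms")
```

For example, `operational_latency_ms(150, worst_jitter_ms=900)` gives 1,500 ms, the low end of the cellular range quoted above.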

| Path scenario | RTT | 4× RTT (floor) | Recommended setting |
| --- | --- | --- | --- |
| Wired LAN, same city | 5–15 ms | 60 ms | 120 ms |
| Wired internet, same country | 20–40 ms | 160 ms | 200–500 ms |
| Wired internet, transatlantic | 80–120 ms | 480 ms | 800–1,200 ms |
| 4G mobile uplink | 80–200 ms | 800 ms | 1,500–2,500 ms |
| 5G mobile uplink | 30–80 ms | 320 ms | 800–1,500 ms |
| Geostationary satellite | 500–700 ms | 2,800 ms | 4,000–8,000 ms |
| Low-earth-orbit satellite (Starlink) | 25–60 ms | 240 ms | 500–1,000 ms |

The takeaway: SRT's latency floor is configurable, not fixed. A wired contribution path inside a city can hit sub-200 ms glass-to-glass with SRT; a satellite path will be many seconds. The right setting is the smallest value that holds up under your worst observed RTT plus jitter — which you find by measurement, not by guessing.

Caller, Listener, Rendezvous — Connection Modes

SRT defines three connection modes. Most operators only ever configure two of them; the third matters for one specific NAT-traversal case. Understanding which mode the encoder uses and which mode the server uses is the single most common stumbling block when bringing up a new SRT contribution link.

Caller mode. The encoder initiates the connection by sending UDP packets to the server's address. This is the analogue of "I am a client connecting to a server". Use caller mode on the encoder side when the server has a fixed public address and the encoder is behind a NAT or in a position where it cannot accept inbound connections.

Listener mode. The encoder (or the server) opens a UDP socket and waits for incoming connection requests. This is the analogue of "I am a server accepting clients". Use listener mode on the server side in a standard contribution architecture: the ingest server listens on srt://0.0.0.0:9000, the encoder dials in as a caller. Less commonly, an encoder running in a cloud VM with a public IP can be the listener and the receiver can dial in as the caller — useful when the receiver is roaming and the sender is fixed.

Rendezvous mode. Both endpoints simultaneously try to establish the connection by sending UDP packets to each other's public address. This is the analogue of "we both initiate, the protocol picks one of us as the master". Rendezvous mode exists to traverse symmetric NATs, where neither side can listen on a public port but both can punch through their respective NATs by sending outbound packets to a well-known address pair. In practice, rendezvous is used in peer-to-peer-style contribution between two encoders or between two field locations; almost no commercial ingest service offers a rendezvous endpoint.

The matching rule is mechanical: a caller must connect to a listener, or both endpoints must use rendezvous. Caller-to-caller is invalid; listener-to-listener is invalid; rendezvous on one side only is invalid. Most production deployments are caller (encoder) to listener (server) with the server on a public address, which is the configuration every major cloud platform expects.
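
The matching rule is small enough to encode as a pre-flight check in a provisioning script. This is a minimal sketch with hypothetical names; it only models which pairs of modes can form a session, not the handshake itself.

```python
# The only two valid pairings: caller with listener, or rendezvous on both ends.
VALID_PAIRS = {
    frozenset({"caller", "listener"}),
    frozenset({"rendezvous"}),
}


def modes_compatible(local_mode: str, remote_mode: str) -> bool:
    """True if the two endpoints' connection modes can establish a session."""
    return frozenset({local_mode, remote_mode}) in VALID_PAIRS


print(modes_compatible("caller", "listener"))       # True
print(modes_compatible("rendezvous", "rendezvous")) # True
print(modes_compatible("caller", "caller"))         # False
print(modes_compatible("listener", "listener"))     # False
```

Running a check like this before pushing configuration to an encoder catches the most common bring-up mistake without touching the network.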

A subtlety worth flagging: SRT's connection mode is independent of which side is sending the media. The direction is carried in the stream ID (the m key in the streamid access-control syntax, where m=publish means the caller sends and m=request, the default, means the caller receives). So a caller can either push video to a listener or pull video from a listener; the connection topology is decoupled from the media direction. Most contribution paths are caller-publish (encoder pushes to listener-server), but caller-request (receiver pulls from a listener-encoder) is a legitimate and useful pattern for remote production.

Encryption — AES-128 vs AES-256, And Why You Should Use It

SRT can optionally encrypt the payload with AES-128 or AES-256 in CTR mode. Encryption is not on by default — the protocol works without it — but every production deployment on the public internet should enable it. The cost is negligible (a few percent of CPU on a modern encoder; bandwidth overhead is one byte per packet for the encryption header) and the benefit is that the stream cannot be intercepted, redirected, or replayed by an attacker who can sniff UDP packets on the contribution path.

The mechanism is straightforward. Both sides agree on a shared passphrase (a string between 10 and 79 characters, configured by the operator on each side). During the conclusion phase of the handshake, the listener generates a random salt, sends it to the caller, and both sides derive a Key Encrypting Key (KEK) from the passphrase and salt using PBKDF2. The listener then generates a random Stream Encrypting Key (SEK), wraps it with the KEK using the AES key-wrap algorithm, and sends the wrapped SEK to the caller. The caller unwraps the SEK with its KEK. Both sides now share a fresh SEK that nobody on the wire can derive from the handshake alone — the passphrase never crosses the wire. The data plane then encrypts each payload with the SEK in AES-CTR mode, using the packet sequence number plus a random IV component as the counter input.
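
The passphrase-to-KEK step can be sketched with the Python standard library alone. The specific parameters below (PBKDF2 with HMAC-SHA1, 2,048 iterations, salted with the last 8 bytes of a 16-byte salt) follow our reading of the expired IETF draft; treat them as assumptions to verify against the reference implementation. The AES key-wrap of the SEK is left out because it needs a third-party library.

```python
import hashlib
import os


def derive_kek(passphrase: str, salt: bytes, key_len: int = 16) -> bytes:
    """Derive a Key Encrypting Key from the shared passphrase and the handshake salt."""
    if not 10 <= len(passphrase) <= 79:
        raise ValueError("SRT passphrases are 10 to 79 characters long")
    if len(salt) != 16:
        raise ValueError("expected a 16-byte salt")
    # Assumed parameters: HMAC-SHA1, 2048 iterations, last 8 bytes of the salt.
    return hashlib.pbkdf2_hmac(
        "sha1", passphrase.encode("utf-8"), salt[-8:], 2048, dklen=key_len
    )


salt = os.urandom(16)  # generated once per session and sent in the clear
kek_side_a = derive_kek("example-passphrase-not-for-production", salt)
kek_side_b = derive_kek("example-passphrase-not-for-production", salt)
print(kek_side_a == kek_side_b)  # True: same passphrase and salt, same KEK
```

Both sides arrive at the same KEK without the passphrase ever crossing the wire. That is also why an attacker who captures the salt can test passphrase guesses offline.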

AES-128 versus AES-256 is a choice between two strengths of the same algorithm. AES-128 uses a 128-bit key and runs 10 rounds per block; AES-256 uses a 256-bit key and runs 14 rounds. The computational overhead of AES-256 is therefore roughly 40 percent higher than AES-128, a difference that is invisible on modern CPUs with AES-NI hardware acceleration (effectively zero CPU cost) and meaningful only on small embedded encoders without hardware acceleration. The cryptographic strength difference is academic for live video: AES-128 is currently estimated to take roughly 2^128 operations to brute-force, which is well beyond any feasible attack. Use AES-128 if you care about CPU on a small encoder; use AES-256 if your compliance regime mandates it (some broadcast-industry standards specify AES-256 for content protection). The protocol negotiates the key size during the handshake; both sides must agree, and if they disagree the connection fails.

The single most common encryption-setup mistake is using a weak passphrase ("test1234" or "password") on a production link. SRT's PBKDF2 derivation slows brute-force attacks, but even a 10-character passphrase drawn at random from letters and digits has only about 60 bits of entropy, and a human-chosen one has far less, which can make an offline attack tractable if an attacker captures the handshake. Use a passphrase generator and treat the passphrase like a TLS private key: 32+ random characters, rotated periodically, stored in a secrets manager. The protocol's encryption is only as strong as the passphrase you give it.
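
The entropy arithmetic, and one way to generate a passphrase that clears the bar, can be sketched with the standard library. The 62-character alphabet (letters and digits) is our choice for the example; any alphabet your encoder's configuration field accepts will do.

```python
import math
import secrets
import string


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a passphrase whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def generate_passphrase(length: int = 32) -> str:
    """Random passphrase from letters and digits, within SRT's 10-79 character limit."""
    if not 10 <= length <= 79:
        raise ValueError("SRT passphrases are 10 to 79 characters long")
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(f"10 chars: {entropy_bits(10, 62):.1f} bits")
print(f"32 chars: {entropy_bits(32, 62):.1f} bits")
print(generate_passphrase())
```

A 32-character random passphrase carries more entropy than an AES-128 key, so the passphrase stops being the weakest link.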

The Three Modes — Live, File-Buffer, File-Message

SRT carries three transmission modes that differ in how strict the latency contract is and what guarantees the receiver gives the application at packet boundaries. For 99 percent of video contribution use cases, you want live mode. The other two exist for completeness and for the file-transfer use cases SRT was extended to support.

Live mode is what video uses. The sender writes one application-level packet (typically a chunk of MPEG-TS or a Tag of FLV) into one UDP datagram. The receiver reads one datagram, gets one application packet. The latency budget is enforced strictly: packets older than the budget are dropped. The protocol is allowed to deliver packets out of order if a later packet arrived before a retransmission of an earlier one, but in practice MPEG-TS-over-SRT carries timestamps in the elementary stream so the application reorders inside the decoder. Live mode is the only mode that matters for streaming video, and unless you have a specific reason to choose otherwise, your encoder and server should both be in live mode.

File-buffer mode is for moving bulk data where you want byte-stream semantics like TCP but with SRT's selective retransmission. The sender writes a stream of bytes; the receiver reads a stream of bytes. SRT does not preserve any application-level packet boundaries: it behaves like a TCP byte stream, so the receiver may see the data in chunks at arbitrary boundaries. This mode exists for moving large media files (e.g., a finished 4K master from a remote production location to a central archive) where latency does not matter but reliability does.

File-message mode preserves application-level message boundaries while still using the selective-retransmission machinery. The sender writes one message of up to 64 KB; the receiver reads exactly that same message. This mode is similar in spirit to SCTP and is useful for moving discrete units of data (e.g., subtitle blocks, metadata files) across the link. It is rarely used for video.

In practice, every modern encoder defaults to live mode for SRT, every modern ingest server defaults to live mode for SRT, and the file modes appear only in specialised file-transfer products. If you are configuring SRT and the mode question comes up, the answer is live.

SRTLA — Bonding Multiple Connections For Mobile And IRL Streaming

The single weak point of SRT, like every other contribution protocol, is that it can only use as much bandwidth as the underlying link provides. A single 4G cellular uplink might give you 8 Mbps when conditions are good and 1 Mbps when the cell is congested or the signal is weak. For a moving encoder — a reporter in a vehicle, a sports videographer on a sideline, an IRL streamer on a city street — no single cellular connection is reliable enough for broadcast-quality video.

SRTLA — SRT Link Aggregation — is the answer. It is a UDP-level proxy that sits between the SRT sender and the network, takes the stream of SRT packets, and distributes them across two or more network interfaces (typically two to six cellular modems on different carriers, plus an optional Wi-Fi connection for backup). On the receiving side, a corresponding SRTLA receiver reassembles the packets back into a single SRT stream that the SRT server consumes as if it had come from a single link. The aggregate bandwidth is the sum of all the individual links, the failure mode of any single carrier is absorbed by the others, and the bonded link is more resilient than any single cellular connection.

SRTLA was developed by the BELABOX project — an open-source IRL streaming initiative that grew out of the live-streaming-from-the-street culture (Twitch, Kick, YouTube Live) — and is now the de facto standard for mobile cellular contribution. The BELABOX hardware (a small Linux box with USB ports for cellular modems) is the most widely used SRTLA sender; corresponding SRTLA receiver software runs on cloud servers or on the contribution-side hardware. SRTLA is now also implemented in several commercial encoder products: Haivision's Pro 460 mobile encoder, LiveU's LU800 (with proprietary bonding alongside SRTLA), and several smaller vendors aimed at the news and sports markets.

The protocol details matter for capacity planning. SRTLA does not split the stream by frame — it splits at the SRT packet level, which means a single video frame may have its constituent packets distributed across two or three modems. On the receiver side, SRTLA reassembles the packets by sequence number before handing the stream to the SRT server. The latency added by SRTLA is small (typically 20 to 100 ms beyond the underlying SRT latency budget) but the protocol does require that the SRT latency setting be raised to accommodate the worst-case round-trip variance across all bonded links — because a packet sent on a fast 5G modem may arrive before a packet sent on a slow 4G modem from earlier in the sequence, and the SRT receiver needs enough budget to wait for the slow one. A typical SRTLA configuration uses a 4,000 to 8,000 ms SRT latency budget for a 4G/5G bonded uplink — much higher than a single wired link, but the trade is justified by the bandwidth and resilience gains.

For a 2026 IRL or remote-production deployment, the standard stack is BELABOX hardware on the sender side, SRTLA bonding across three to six cellular modems, an SRT receiver in the cloud (often Cloudflare Stream, AWS Elemental MediaConnect, or a self-hosted Linux VM running the SRT reference implementation), and an SRT-to-HLS or SRT-to-RTMP packager downstream for distribution to viewers. We have shipped this architecture for live news and field-production deployments and it works reliably across cellular, fixed wireless, and mixed-connectivity field locations.

What Platforms Actually Support SRT In 2026

A useful exercise: list the platforms that accept SRT ingest in 2026 and note where SRT replaces RTMPS and where it lives alongside it.

| Platform | RTMPS | SRT | Notes |
| --- | --- | --- | --- |
| YouTube Live | Yes | No | Social platforms remain RTMPS-only for ingest. |
| Twitch | Yes | No (closed beta) | Limited SRT experiment in 2023 never reached general availability. |
| Facebook Live | Yes | No | RTMPS-only. |
| Kick | Yes | No | RTMPS-only. |
| AWS Elemental MediaConnect | No | Yes | SRT-native; MediaLive accepts RTMPS and SRT both. |
| Cloudflare Stream | Yes | Yes | Both protocols on one ingest endpoint. |
| Dolby Millicast | Yes | Yes | WebRTC-first product; RTMPS and SRT are compatibility bridges. |
| Mux Live | Yes | Yes | Three protocols on one endpoint (RTMPS, SRT, WHIP). |
| Wowza Streaming Engine | Yes | Yes | Self-hosted; both ingest protocols built in. |
| Nimble Streamer | Yes | Yes | Self-hosted; both ingest protocols built in. |
| Haivision Hub | No | Yes | SRT-native cloud service from the protocol's authors. |
| Vimeo Livestream | Yes | Yes | Both ingest protocols accepted on the same product surface. |

The pattern is consistent: the consumer social platforms (YouTube, Twitch, Facebook, TikTok, Kick) are still RTMPS-only for ingest in 2026, while the developer-platform and broadcast tier (AWS MediaConnect, Cloudflare Stream, Dolby Millicast, Mux Live, Wowza, Nimble, Haivision Hub, Vimeo) all accept SRT, usually alongside RTMPS and WHIP on a single endpoint. The split is by audience: if you are publishing to consumer social, you ship RTMPS; if you are doing professional contribution, remote production, or building a B2B live video product, SRT is the right tool.

Common Pitfalls — The Mistakes That Break SRT In Production

A short list of the failures we have seen most often when bringing up a new SRT contribution link. Most of them are not bugs in the protocol; they are configuration mismatches that the protocol's error messages do not surface clearly.

Pitfall 1: latency mismatch between sender and receiver. SRT's latency setting is negotiated during the handshake: both sides propose a value and the maximum of the two becomes the effective budget. If the sender wants 200 ms and the receiver wants 5,000 ms, the budget is 5,000 ms, which is the receiver's value. Because the larger value wins, the failure appears when neither side asks for enough: most production failures show up when the receiver was left at the default 120 ms budget, the sender was never raised either, and the path actually needs 800 ms. The receiver then discards "late" packets that a longer budget would have recovered, and the stream looks broken even though the recovery would have worked. The fix is to configure both sides to the same budget, sized for the worst-case path RTT.
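
The negotiation rule and the undersized-budget check fit in a few lines. This is a sketch of the rule as described in this article, with hypothetical function names; it is not a call into any SRT library.

```python
def effective_latency_ms(sender_ms: int, receiver_ms: int) -> int:
    """The handshake settles on the larger of the two proposed budgets."""
    return max(sender_ms, receiver_ms)


def is_undersized(budget_ms: int, worst_rtt_ms: int) -> bool:
    """True if the negotiated budget is below the 4x RTT floor for the path."""
    return budget_ms < 4 * worst_rtt_ms


# Sender asks for 200 ms, receiver for 5,000 ms: the receiver's value wins.
print(effective_latency_ms(200, 5000))            # 5000

# Both sides left at a 120 ms default on a path with 200 ms RTT.
budget = effective_latency_ms(120, 120)
print(budget, is_undersized(budget, 200))         # 120 True
```

The second case is the one that bites in production: nothing mismatches, the handshake succeeds, and the link still drops recoverable packets.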

Pitfall 2: rendezvous mode used where caller-listener would work. Operators sometimes default to rendezvous because "both sides connect" sounds simpler, but rendezvous is meaningfully harder to debug (both sides have to be reachable at the moment of connection; firewalls have to allow outbound to the peer's IP and port; many cloud providers do not allow incoming UDP at all). Use caller-listener whenever one side has a stable public address; reserve rendezvous for the cases where neither side does.

Pitfall 3: weak or default passphrase. Treat the passphrase like a TLS private key. 32+ random characters, rotated periodically, stored in a secrets manager.

Pitfall 4: UDP firewall not opened. SRT runs on UDP, and many corporate firewalls treat outbound UDP differently from outbound TCP. A common failure pattern is that the encoder cannot establish the handshake at all because the firewall blocks the chosen port. The fix is to ensure outbound UDP on the chosen port (and inbound on the listener side) is permitted; most production deployments standardise on UDP 9000 or 9999 because operators recognise those ports.

Pitfall 5: MTU not respected. SRT defaults to a 1,316-byte payload, which fits inside the standard 1,500-byte Ethernet MTU with margin for IP and UDP headers. On a path with a smaller MTU (most commonly a VPN tunnel or an IPv6-in-IPv4 path), large SRT packets are fragmented at the IP layer, and IPv4 path-MTU discovery sometimes fails silently, producing a stream that establishes the handshake but cannot send media. The fix is to configure SRT's mss parameter to the path's actual MTU; the protocol then sizes packets to fit.
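The arithmetic behind the 1,316-byte default is worth seeing once. The sketch below assumes IPv4 without options (20 bytes), an 8-byte UDP header, a 16-byte SRT data-packet header, and an MPEG-TS payload that should stay aligned to whole 188-byte TS packets; the function name is hypothetical, and any tunnel overhead is assumed to be already reflected in the path MTU you pass in.

```python
TS_PACKET_BYTES = 188
IPV4_HEADER_BYTES = 20       # assumption: IPv4 with no options
UDP_HEADER_BYTES = 8
SRT_DATA_HEADER_BYTES = 16
MAX_TS_PER_PACKET = 7        # 7 * 188 = 1316, the live-mode default payload


def ts_aligned_payload(path_mtu: int) -> int:
    """Largest payload of whole TS packets that fits in one unfragmented datagram."""
    room = path_mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES - SRT_DATA_HEADER_BYTES
    ts_packets = min(MAX_TS_PER_PACKET, room // TS_PACKET_BYTES)
    if ts_packets < 1:
        raise ValueError(f"path MTU {path_mtu} cannot carry a single TS packet")
    return ts_packets * TS_PACKET_BYTES


if __name__ == "__main__":
    for mtu in (1500, 1400, 1280):
        print(mtu, ts_aligned_payload(mtu))
    # 1500 -> 1316, 1400 -> 1316, 1280 -> 1128
```

On a 1,280-byte path only six TS packets fit, so the sender's payload size has to drop to 1,128 bytes alongside the smaller mss.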

Pitfall 6: no monitoring of retransmission rate. SRT publishes a rich set of statistics — packets sent, retransmitted, lost, dropped, NAKs sent, NAKs received — but most operators never read them. A healthy SRT link should have a retransmission rate below 0.5 percent of total packets. A retransmission rate above 5 percent means the path is significantly lossy and the link will drop frames during bursts. Monitor the stats; alert on retransmission rate; investigate before the viewers complain.
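A minimal sketch of the alerting rule above, assuming you already export two counters per interval (packets sent and packets retransmitted) from your SRT statistics. The function names and the "watch" label for the band between the two thresholds are illustrative choices, not part of any SRT API.

```python
HEALTHY_BELOW = 0.005   # 0.5 percent, the healthy ceiling from the text
LOSSY_ABOVE = 0.05      # 5 percent, the significantly lossy threshold


def retransmission_rate(pkts_sent: int, pkts_retransmitted: int) -> float:
    """Retransmitted packets as a fraction of packets sent in the interval."""
    if pkts_sent <= 0:
        return 0.0
    return pkts_retransmitted / pkts_sent


def classify_link(rate: float) -> str:
    """Map a retransmission rate to a coarse health label."""
    if rate < HEALTHY_BELOW:
        return "healthy"
    if rate > LOSSY_ABOVE:
        return "lossy"
    return "watch"


if __name__ == "__main__":
    print(classify_link(retransmission_rate(10_000, 30)))    # healthy (0.3 percent)
    print(classify_link(retransmission_rate(10_000, 180)))   # watch (1.8 percent)
    print(classify_link(retransmission_rate(10_000, 600)))   # lossy (6 percent)
```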

A Worked Example — SRT On The Wire, Numbers Out Loud

Let us trace one real configuration end to end with numbers. We are pushing a 1080p60 HEVC stream at 8 Mbps from a venue encoder to a cloud ingest service that fronts an HLS packager. The contribution path crosses 90 ms of public internet from a stadium in Madrid to AWS eu-west-1 in Dublin. The encoder is OBS Studio 32 running on a Mac mini with a wired Ethernet uplink that has measured 2 percent average packet loss with occasional 5 percent bursts during evening peak.

The SRT URI on the encoder side: srt://ingest.example.com:9000?mode=caller&latency=400&passphrase=<32-char-secret>&pbkeylen=16&streamid=#!::r=eu-west-1,m=publish,u=stadium-cam-1. The receiver side is an AWS Elemental MediaConnect flow configured as an SRT listener on port 9000 with the matching passphrase and a 400 ms latency budget. The handshake completes in roughly 180 ms (two UDP round-trips, taking the 90 ms figure as the path RTT). The encoder establishes a single SRT connection over UDP and begins streaming HEVC packets carried inside MPEG-TS, each group of seven 188-byte TS packets written to one SRT live-mode packet, each SRT packet placed in one UDP datagram with a 1,316-byte payload.
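The URI above can be assembled and checked programmatically. The sketch below uses only Python's standard library; the helper names are hypothetical. It percent-encodes the stream ID because a literal `#` would otherwise be read as the start of a URI fragment. Whether a given encoder expects the stream ID encoded or raw is an assumption to verify against that encoder's documentation.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_srt_uri(host: str, port: int, **params) -> str:
    """Build an srt:// URI with percent-encoded query values."""
    query = urlencode(params, quote_via=quote)
    return f"srt://{host}:{port}?{query}"


def parse_stream_id(stream_id: str) -> dict:
    """Split a '#!::key=value,key=value' stream ID into a dict."""
    prefix = "#!::"
    if not stream_id.startswith(prefix):
        raise ValueError("stream ID does not use the '#!::' key-value syntax")
    pairs = (item.split("=", 1) for item in stream_id[len(prefix):].split(","))
    return {key: value for key, value in pairs}


if __name__ == "__main__":
    uri = build_srt_uri(
        "ingest.example.com", 9000,
        mode="caller", latency=400,
        passphrase="REPLACE-WITH-32-CHAR-SECRET",  # placeholder, not a real secret
        pbkeylen=16,
        streamid="#!::r=eu-west-1,m=publish,u=stadium-cam-1",
    )
    print(uri)
    decoded = parse_qs(urlsplit(uri).query)["streamid"][0]
    print(parse_stream_id(decoded))
```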

During steady state the encoder produces 8 Mbps of video, the SRT layer adds approximately 4 percent overhead for sequence numbers, headers, and the AES-128 encryption tag — total wire bandwidth roughly 8.32 Mbps. The receiver issues NAKs at a measured rate of 1.8 percent of total packets during the evening peak; the sender retransmits, the retransmissions land inside the 400 ms budget, the stream delivers cleanly to MediaConnect. The MediaConnect flow forwards the stream to an Elemental MediaPackage origin that produces 4-second HLS segments. The viewer sees the stream at roughly 8 seconds of glass-to-glass latency — 1 second of encoder + GOP, 0.4 seconds of SRT budget, 0.1 seconds of MediaConnect transit, 4 seconds of HLS packaging, and 2.5 seconds of player buffer at the viewer side.
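The two sums in this paragraph are easy to re-run with your own numbers. The sketch below simply restates the figures from the worked example; the variable and function names are illustrative.

```python
VIDEO_MBPS = 8.0
SRT_OVERHEAD = 0.04  # approximately 4 percent, from the worked example

# Glass-to-glass components in seconds, as listed in the worked example.
LATENCY_COMPONENTS_S = {
    "encoder + GOP": 1.0,
    "SRT budget": 0.4,
    "MediaConnect transit": 0.1,
    "HLS packaging": 4.0,
    "player buffer": 2.5,
}


def wire_bandwidth_mbps(video_mbps: float, overhead: float) -> float:
    """Bandwidth on the wire once protocol overhead is added."""
    return video_mbps * (1 + overhead)


def glass_to_glass_s(components: dict) -> float:
    """Total end-to-end latency as the sum of its stages."""
    return sum(components.values())


if __name__ == "__main__":
    print(round(wire_bandwidth_mbps(VIDEO_MBPS, SRT_OVERHEAD), 2))  # 8.32
    print(round(glass_to_glass_s(LATENCY_COMPONENTS_S), 1))         # 8.0
```

The breakdown makes the point of the example visible: the SRT budget is 0.4 of the 8 seconds, while HLS packaging and the player buffer account for 6.5.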

Compare the same path running over RTMPS. At 2 percent loss, roughly one packet in 50 needs a TCP retransmission, and each loss event can halve the send rate from 8 Mbps to 4 Mbps for several seconds at a time during the evening peak. The encoder's local buffer would fill, frames would drop, and viewers would see stalls roughly every 30 seconds. The workflow has the same shape (push contribution from venue to cloud), but the failure mode is different. SRT keeps the stream up; RTMPS drops frames.

Where Fora Soft Fits In

We have shipped SRT contribution in production for OTT and remote-production deployments, sports and event live streaming, telemedicine field cases where a clinical camera needs to push to a central server across a hospital Wi-Fi, e-learning recording rigs that contribute from classrooms over commodity broadband, and surveillance camera installations that aggregate field encoder feeds to a central recording origin. The pattern that has held up across every vertical is the same: when the contribution path is anything other than a private fibre line, SRT outperforms RTMPS and is worth the configuration work to get right. When we build a contribution-side stack for a client, the default we ship is SRT with RTMPS as a fallback for legacy encoders.

SRT Versus RIST Versus WHIP — How To Choose

SRT is not the only modern contribution protocol; it is the most widely deployed one. For a complete picture, compare it briefly against its two main competitors. The full comparison lives in Picking an ingest protocol in 2026, but the short version follows.

RIST (Reliable Internet Stream Transport) is the broadcast industry's standards-body answer to the same problem SRT solves. The Video Services Forum (VSF) technical recommendations TR-06-1, TR-06-2, and TR-06-3 specify it. RIST profiles are Simple, Main, and Advanced; Simple is roughly equivalent to early SRT (UDP plus selective retransmission), Main adds encryption and authentication, Advanced adds tunnelling and link bonding. RIST's value is that it is an open specification published by an industry forum (the VSF) rather than a vendor-led project, which matters to broadcast organisations whose compliance regimes prefer standards-body documents. RIST adoption is real but smaller than SRT's; in the 2024 Haivision Broadcast Transformation Report, RIST was used by 22 percent of broadcasters compared to SRT's 68 percent. Choose RIST over SRT if your compliance regime mandates an open, standards-body specification or if you are integrating with a broadcast-industry pipeline that is already RIST-native.

WHIP (WebRTC-HTTP Ingestion Protocol, RFC 9725) is the new-school answer aimed at sub-second contribution. WHIP is WebRTC ingest dressed up as a simple HTTP POST, which means the wire transport is WebRTC's media machinery (ICE for connectivity, DTLS for key exchange, SRTP over UDP for the media itself) and the latency budget is the WebRTC budget (typically 200 ms to 1 second of glass-to-glass). Choose WHIP over SRT when you need sub-second contribution and you are willing to accept WebRTC's CPU cost and the developer-platform complexity. Stay with SRT when the contribution-side hardware is a traditional broadcast encoder that does not speak WebRTC, or when the contribution path needs to traverse satellite or extreme-latency conditions where the WebRTC jitter buffer cannot stretch wide enough.

The decision in one sentence: SRT for professional contribution at 200 ms to several seconds of latency on a lossy public-internet path; RIST when standards-body compliance is the constraint; WHIP when sub-second latency on a clean path is the constraint.
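The one-sentence rule can be written down as a small function for a planning checklist. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the order in which the questions are asked (compliance first, then latency, then encoder capability) is one reasonable reading of the rule above rather than something the rule itself fixes.

```python
def choose_contribution_protocol(
    requires_standards_body_spec: bool,
    needs_sub_second_latency: bool,
    path_is_clean: bool,
    encoder_speaks_webrtc: bool,
) -> str:
    """Return 'RIST', 'WHIP', or 'SRT' for a contribution path."""
    # A compliance mandate or RIST-native pipeline overrides the other questions.
    if requires_standards_body_spec:
        return "RIST"
    # WHIP only when sub-second latency is required and the path and
    # encoder can actually support WebRTC.
    if needs_sub_second_latency and path_is_clean and encoder_speaks_webrtc:
        return "WHIP"
    # Everything else, including lossy public-internet paths, defaults to SRT.
    return "SRT"


if __name__ == "__main__":
    # Venue encoder on a lossy uplink, no compliance mandate.
    print(choose_contribution_protocol(False, False, False, False))  # SRT
```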

Figure 3. A decision flow diagram for choosing between SRT, RIST, and WHIP in 2026 for a contribution path: the three-question decision tree most teams should walk through before shipping or migrating a contribution protocol. Where sub-second latency or strict compliance is not required, SRT is the default answer.

References

  1. draft-sharabayko-srt-01 — The SRT Protocol, IETF Internet-Draft, September 2021 (expired March 2022). The most complete public specification of the SRT protocol, covering the handshake, encryption, NAK-based retransmission, and three operating modes. (tier 1, official spec — note that the draft expired without progressing to RFC; the controlling reference is now the union of this draft and the Haivision reference implementation).
  2. SRT Reference Implementation, Haivision/srt v1.5.4, GitHub source repository and documentation, retrieved May 2026. The canonical implementation; the documentation (docs/) is the de-facto specification for behaviour not explicitly covered in the IETF draft. (tier 2, reference implementation; per §4.3.2, ranks below the spec but ahead of vendor blogs).
  3. SRT Alliance Deployment Guide, v1.1, SRT Alliance, retrieved May 2026. The Alliance's consolidated guidance on the 4× RTT latency rule, the caller/listener/rendezvous modes, and the AES encryption configuration. (tier 3, first-party engineering documentation from the spec's authors).
  4. draft-sharabayko-srt-over-quic-00 — Tunnelling SRT over QUIC, IETF Internet-Draft, July 2022. The proposal to carry SRT inside QUIC datagrams; cited as the protocol-evolution direction even though it has not yet shipped widely. (tier 1, draft specification — flagged as subject to revision).
  5. VSF TR-06-1, TR-06-2, TR-06-3 — Reliable Internet Stream Transport (RIST) Protocol Specification, Video Services Forum Technical Recommendations, 2018–2022. The specification series used to compare SRT against the broadcast-industry alternative. (tier 1, standards-body specification).
  6. Haivision 2024 Broadcast Transformation Report, Haivision Inc., 2024. The industry survey that produced the 68 percent SRT adoption figure cited in the introduction. (tier 4, vendor-produced industry survey).
  7. SRT Alliance Surpasses 600 Members and Welcomes Paramount, Cloudflare, Dolby.io, Chyron, JW Player, THEO Technologies, and EVS, Haivision press release, 2024. The membership data point cited in the history section. (tier 4, vendor press).
  8. OBS Studio Knowledge Base — SRT Protocol Streaming Guide, OBS Project, retrieved May 2026. The encoder-side configuration reference cited in the platform-support table and the worked-example URI structure. (tier 4, reference-implementation documentation).
  9. BELABOX/srtla — SRT Transport Proxy with Link Aggregation for Connection Bonding, BELABOX project, GitHub source repository and documentation, retrieved May 2026. The canonical SRTLA implementation cited in the SRTLA section. (tier 4, open-source reference implementation for the SRTLA extension).
  10. AWS Elemental MediaConnect — Source Setup with SRT Listener, AWS documentation, retrieved May 2026. The cloud-side platform reference for the worked example. (tier 4, vendor documentation for a tier-1 cloud platform).
  11. Examining SRT Streaming over 4G Networks, Maxim Sharabayko, Innovation Labs Blog (Medium), retrieved May 2026. The protocol author's own analysis of latency tuning and overhead bandwidth on cellular paths. (tier 3, engineering blog from the spec's author).
  12. SRT Cookbook — Projects and Applications with SRT Support, Haivision SRT Lab, retrieved May 2026. The ecosystem inventory cited in the platform-support table. (tier 4, reference-implementation companion).