An MCU performs the decode-mix-encode pipeline that an SFU avoids. Each participant uploads their stream to the MCU; the MCU decodes all streams, composes them into a single mixed video (typically a grid layout) plus a mixed audio track, encodes the result, and sends that one encoded stream to each participant. From the participant's perspective the experience resembles receiving a single TV broadcast: the client decodes one stream and needs no client-side composition.
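
To make the composition step concrete, here is a minimal sketch of how a mixer might assign each participant a tile in a grid. The function name `grid_layout`, the `Tile` type, the 1280×720 canvas, and the near-square row-by-row policy are all assumptions for illustration; real MCUs use their own layout rules (active-speaker emphasis, aspect-ratio preservation, and so on).

```python
import math
from typing import List, NamedTuple


class Tile(NamedTuple):
    """Position and size of one participant's video on the composed canvas."""
    x: int
    y: int
    width: int
    height: int


def grid_layout(n: int, canvas_w: int = 1280, canvas_h: int = 720) -> List[Tile]:
    """Return one tile per participant, filling a near-square grid row by row.

    Hypothetical layout policy: columns = ceil(sqrt(n)), rows = ceil(n / columns).
    """
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    tile_w = canvas_w // cols
    tile_h = canvas_h // rows
    return [
        Tile((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(n)
    ]


if __name__ == "__main__":
    for tile in grid_layout(4):
        print(tile)
```

In a real mixer, each decoded frame would be scaled into its tile before the canvas is handed to the encoder; the sketch covers only the geometry.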

The advantage of an MCU is bandwidth efficiency on the receiver side: each participant downloads one stream regardless of how many others are in the call. This mattered most on the narrowband connections of the early 2000s, where MCUs originated. The disadvantage is server CPU cost: the MCU must decode every incoming stream and encode the composite in real time, so a call with 30 HD streams at 30 fps needs substantial compute, and total cost grows roughly linearly with participant-minutes (participants × time).
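
The trade-off can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes an SFU forwards every other participant's stream to each receiver (an upper bound, since real SFUs often forward a subset), that the MCU encodes one composite per layout, and that "HD" means 1280×720. The names `Load`, `sfu_load`, `mcu_load`, and `decode_pixels_per_second` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    downloads_per_participant: int  # streams each client receives
    server_decodes: int             # streams the server must decode
    server_encodes: int             # streams the server must encode


def sfu_load(n: int) -> Load:
    """SFU forwards packets without transcoding (assumes n >= 1)."""
    return Load(downloads_per_participant=n - 1, server_decodes=0, server_encodes=0)


def mcu_load(n: int, layouts: int = 1) -> Load:
    """MCU decodes every upload and encodes one composite per layout (assumes n >= 1)."""
    return Load(downloads_per_participant=1, server_decodes=n, server_encodes=layouts)


def decode_pixels_per_second(n: int, width: int = 1280, height: int = 720, fps: int = 30) -> int:
    """Raw pixel throughput the MCU's decoders must sustain; 720p is an assumed meaning of 'HD'."""
    return n * width * height * fps


if __name__ == "__main__":
    n = 30
    print("SFU:", sfu_load(n))
    print("MCU:", mcu_load(n))
    print("MCU decode pixels/s:", decode_pixels_per_second(n))
```

Under these assumptions a 30-person call costs each receiver 29 streams on an SFU and 1 on an MCU, while the MCU has to sustain 30 decodes plus at least one encode for the whole duration of the call.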

In 2026, MCUs are rare in pure WebRTC deployments. SFUs dominate because they use far less server CPU and let receivers compose layouts locally. MCUs persist in three places: legacy SIP/H.323 video-conferencing systems; services that record a single composed stream (one MP4 file with the grid); and services that stream a video conference out to non-WebRTC platforms, where HLS/DASH receivers need a single composed stream. Some platforms offer an MCU as a recording-only adjunct to their main SFU.