Published 2026-06-01 · 18 min read · By Nikolay Sapunov, CEO at Fora Soft

Why this matters

If you just want a tidy background on your next call, the first section answers that in under a minute, for every app you are likely to use. If you build software as a product manager, founder, or engineer adding video to an app, the rest of the article explains the technology well enough that you can scope the feature, talk to your engineers about it, and avoid the traps that make a demo look broken. Background blur is one of the most-requested camera features in video calling, it runs entirely on the user's own device, and it is a clean introduction to the real-time AI that powers everything from virtual backgrounds to live captions. Understanding it once means understanding a whole family of features.

The fast answer: blur your background right now

Here is where the control lives in each app. None of these uploads your video anywhere — the blur happens on your own device, which matters for both speed and privacy, and we explain why later.

Zoom (desktop). Open the Zoom app, click your profile picture, then Settings (the gear icon). Choose Background & Effects in the left sidebar, then click Blur under Virtual Backgrounds. To do it mid-call, find the Stop Video button at the bottom-left, click the small ^ arrow beside it, and pick Blur My Background. Background blur is free on every Zoom plan, and needs Zoom version 5.5.0 or newer.

Zoom (phone or tablet). In a meeting, tap More (the three dots), then Background & Effects on iPhone or iPad — or Virtual Background on Android — and tap the Blur icon. It applies instantly.

Microsoft Teams. On the screen that previews your camera before you join, click the Background filters button (a person with a small scene behind them) under the preview, then choose Blur. Teams offers two strengths: Standard blur and Portrait blur. Already in the meeting? Click More (…), then Effects and avatars, then Blur, then Apply. On Windows you can toggle it with Ctrl+Shift+P; on Mac, Cmd+Shift+P.

iPhone and iPad (FaceTime and any video app). During a FaceTime call, tap your own video tile, then tap Portrait. To turn it on for any video app (Zoom, Teams, Webex, a browser), open Control Center, tap Video Effects, and turn on Portrait; it stays on across apps until you switch it off. This works on iPhone and iPad models with Apple's A12 Bionic chip or newer (on iPhone, that means iPhone XS, iPhone XR, and later), running iOS 15 or iPadOS 15 or later.

iPhone as a Mac webcam (Continuity Camera). When your iPhone is acting as your Mac's camera, open Control Center on the Mac, click Video Effects, and choose Portrait. This is a useful trick: native Portrait blur on the Mac's own camera needs an Apple Silicon Mac, but routing it through an iPhone brings the same blur to older Intel Macs too.

Google Meet. In a call, click More (⋮), then Apply visual effects, and pick Blur — slight or full.

[Image: quick-reference matrix showing where the background-blur control lives in Zoom, Microsoft Teams, FaceTime on iPhone, Continuity Camera on Mac, and Google Meet, with the menu path, the keyboard shortcut where one exists, and the minimum hardware requirement for each.]
Figure 1. Where the blur toggle lives in each app, and what each one needs to run. Save this; it is the whole how-to on one screen.

That is the whole how-to. The rest of this article explains what actually happens when you tap that button — and how to build the same thing yourself.

What "blurring the background" really means

It sounds like the app is smudging part of the picture, and visually that is the result. But the hard part is not the blur. The hard part is the question that comes first: which pixels are the background, and which pixels are you?

Picture your video frame as a grid of tiny colored squares — the brightness-and-color dots called pixels that make up every digital image. To blur only the background, the software has to label every single one of those squares as either person or not person, on every frame, thirty times a second. That labeling job is called segmentation — splitting the image into meaningful regions. The specific kind used here is person segmentation: separate the human from everything else.

The output of segmentation is a second, simpler image the same size as your video, called a mask (engineers also call it a matte). In the mask, every pixel that belongs to you is marked white, and every pixel that belongs to the room is marked black. Think of it as a stencil cut out in the exact shape of your body and hair. Once the software has that stencil, the rest is easy: keep the original sharp pixels wherever the stencil is white, and paint blurred pixels wherever it is black.
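The stencil idea above can be sketched in a few lines. This is a minimal illustration, not any app's actual code: it assumes the frame and its blurred copy are RGBA byte arrays and the mask holds one byte per pixel (0 for room, 255 for person). The function name compositeWithMask is made up for this example.

```javascript
// Sketch: keep sharp pixels where the mask is white, blurred pixels where it is black.
// sharp and blurred are RGBA byte arrays (4 bytes per pixel).
// mask holds one value per pixel: 0 = background, 255 = person.
function compositeWithMask(sharp, blurred, mask) {
  if (sharp.length !== blurred.length || sharp.length !== mask.length * 4) {
    throw new Error("frame and mask sizes do not match");
  }
  const out = new Uint8ClampedArray(sharp.length);
  for (let p = 0; p < mask.length; p++) {
    const a = mask[p] / 255;        // 1 = use the sharp pixel, 0 = use the blurred pixel
    const i = p * 4;
    for (let c = 0; c < 3; c++) {   // blend red, green, blue
      out[i + c] = a * sharp[i + c] + (1 - a) * blurred[i + c];
    }
    out[i + 3] = 255;               // output is fully opaque
  }
  return out;
}
```

Values between 0 and 255 in the mask produce a partial blend, which is what makes soft edges around hair possible.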

So background blur is really two steps wearing one button: first find the person (the AI step), then blur everything else (the easy step). A green screen does the same job with a colored cloth and a lot of studio lights. Background blur throws away the cloth and asks a neural network to find your outline instead.

[Image: three-stage pipeline diagram. Stage one shows the raw camera frame of a person in a room. Stage two shows a person-segmentation model producing a black-and-white mask in the shape of the person. Stage three shows the final composite where the person stays sharp and the background is blurred, with an arrow noting that only the black mask region receives blur.]
Figure 2. Background blur in three stages: capture the frame, segment the person into a mask, then blur only the masked-out background and composite you back on top.

How the app finds you: the segmentation model

The "find the person" step is done by a small neural network — a piece of software trained on hundreds of thousands of labeled photos until it learned the visual pattern of "human in front of a camera." You do not program the rules ("arms are cylindrical, hair is fuzzy"); you show the network enough examples and it learns the rules itself.

The most widely used model for this in browsers and apps is Google's MediaPipe Selfie Segmentation. It is deliberately tiny. The model has roughly 106,000 internal numbers (engineers call them parameters) and the whole file is about 454 kilobytes — smaller than a single photo. It is built on a compact image-recognition design called MobileNetV3, chosen because it was made to run fast on phones rather than on data-center hardware. The model shrinks your video frame down to a small square — 256 pixels by 256 pixels — does its work there, and scales the resulting mask back up. A smaller input means less math, and less math means it finishes in time.

Why 256 by 256 and not the full frame? Because the mask does not need to be razor-sharp to look good once it is blurred — the blur itself hides small errors at the edge. Running on a shrunk-down frame is the trade that makes real-time blur possible on a phone.
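To put a rough number on that trade, the sketch below compares pixel counts and shows the simplest possible way to scale a small mask back up. The 1280 by 720 frame size is an assumption chosen for illustration, and nearest-neighbour scaling is the crudest option; real pipelines typically use smoother interpolation and edge refinement.

```javascript
// How many times fewer pixels does the model see than the full frame?
function pixelRatio(frameW, frameH, modelSize = 256) {
  return (frameW * frameH) / (modelSize * modelSize);
}

// Nearest-neighbour upscale of a square mask (size x size) to the frame size.
function upscaleMask(mask, size, outW, outH) {
  const out = new Uint8Array(outW * outH);
  for (let y = 0; y < outH; y++) {
    const sy = Math.min(size - 1, Math.floor((y * size) / outH));
    for (let x = 0; x < outW; x++) {
      const sx = Math.min(size - 1, Math.floor((x * size) / outW));
      out[y * outW + x] = mask[sy * size + sx];  // copy the nearest mask value
    }
  }
  return out;
}

console.log(pixelRatio(1280, 720)); // 14.0625
```

Under that assumption the model touches about fourteen times fewer pixels than it would at full resolution.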

The 33-millisecond rule that governs everything

Here is the constraint that shapes every design decision in this feature. Video plays at about 30 frames per second — 30 still images shown in quick succession, the way a flipbook makes drawings move. If you do the division out loud:

1 second ÷ 30 frames = 0.0333 seconds per frame
0.0333 seconds × 1000 = 33.3 milliseconds per frame

So a new frame arrives roughly every 33 milliseconds (a millisecond is one-thousandth of a second). Both steps — find the person, then blur the background — have to finish inside that window. If the work takes longer than 33 milliseconds, the app cannot keep up: it either drops frames, making your video stutter, or it falls behind, making you look laggy.

This is why the chip doing the work matters so much. The same MediaPipe model finishes a frame in under 3 milliseconds when it runs on a GPU — the graphics processing unit, the chip originally built to draw game graphics, which does thousands of small calculations at the same time. Run the identical model on the general-purpose CPU instead and it can take 90 to 120 milliseconds — three to four times the entire frame budget. Same model, same picture; the only difference is which chip does the arithmetic. That gap is the whole reason these features feel smooth on a modern device and choppy on an old one.

A worked example makes the budget concrete. Suppose your segmentation step takes 4 milliseconds and your blur step takes 6 milliseconds:

4 ms (segment) + 6 ms (blur) = 10 ms total
33 ms budget − 10 ms used = 23 ms to spare  → smooth

Now suppose the same work runs on a weak CPU and segmentation alone takes 95 milliseconds:

95 ms (segment) > 33 ms budget  → each processed frame spans almost three frame intervals, so roughly two of every three frames are dropped

The math is unforgiving, and it is the same math whether you are using Zoom or building your own app. To learn how this fits into the broader timing of a live call, see our sub-100ms real-time latency budget breakdown.
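The same budget arithmetic can be written as a small helper. This is a sketch under a simplifying assumption: frames are processed one at a time, and a new frame can only start on a camera tick. The function names are invented for this example.

```javascript
// Time available per frame at a given frame rate, in milliseconds.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Frames per second that actually get through when each frame takes processingMs.
// Assumes one frame is processed at a time and work starts on a camera tick.
function deliveredFps(processingMs, cameraFps = 30) {
  const budget = frameBudgetMs(cameraFps);
  // Number of camera intervals one processed frame occupies (at least one).
  const intervals = Math.max(1, Math.ceil(processingMs / budget));
  return cameraFps / intervals;
}

console.log(deliveredFps(10)); // 30
console.log(deliveredFps(95)); // 10
```

With 10 ms of work every frame gets through; with 95 ms only about 10 frames per second survive under these assumptions.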

Two ways to know what is "background": flat vs. depth

Now the interesting part — and the reason your iPhone's blurred background usually looks better than your laptop's. There are two fundamentally different ways to decide what counts as background, and the major apps split between them.

The first way is segmentation only, working from a single flat image. The model looks at colors, shapes, and edges and guesses where you end and the room begins. It has no idea how far away anything is; it only knows "this looks like a person, that looks like a wall." Zoom, Microsoft Teams, and Google Meet all work this way, because they have to run on any ordinary webcam, which captures a flat picture with no distance information.

The second way adds depth. Your iPhone's Portrait effect does not just recognize your shape — it measures how far away each part of the scene is, then blurs things by distance, the way a real camera lens does. The phone gets this distance information from having more than one camera lens (the slight difference between two viewpoints reveals depth, the same way your two eyes do) and, on Pro models, from a LiDAR sensor that actively measures distance. Apple combines that depth map with a person-segmentation matte computed on its Neural Engine — a chip inside the phone dedicated to AI math — to produce a blur that gets gradually stronger as the background recedes.

That gradual falloff is why Portrait blur looks "cinematic." A real lens does not blur the background uniformly; objects just behind you are slightly soft, and the far wall is very soft. Photographers call the size of each out-of-focus point the circle of confusion — the farther a point is from the focus plane, the bigger and softer its blurred disc. Depth-based blur reproduces that. Flat segmentation blur cannot: with no distance data, it blurs the entire background by the same fixed amount, which is why it can look like a sticker of you pasted on frosted glass.

| Aspect | Segmentation-only blur | Depth-based blur |
| --- | --- | --- |
| Used by | Zoom, Teams, Google Meet, most webcams | iPhone / iPad Portrait, Continuity Camera |
| How it decides | ML guesses person vs. background from a flat image | Measures distance per pixel, then blurs by distance |
| Hardware it needs | Any single camera | Multiple lenses or a LiDAR depth sensor |
| Blur character | Uniform; whole background equally soft | Graduated; nearer objects sharper, far ones softer |
| Typical weak spot | Halos and smearing at hair and glasses | Needs Apple hardware; less portable |

Table 1. The two engineering approaches to background blur, and why iPhone Portrait usually looks more natural than webcam blur.
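The difference between the two columns can be sketched with the standard thin-lens circle-of-confusion formula. This is an illustration of the optics being imitated, not Apple's implementation; the lens values (50 mm, f/2) and the pxPerMm scale factor are arbitrary assumptions, and the function names are made up for this example.

```javascript
// Thin-lens circle of confusion; all distances in millimetres.
function circleOfConfusionMm(focalMm, fNumber, focusMm, objectMm) {
  const apertureMm = focalMm / fNumber;                      // aperture diameter
  const defocus = Math.abs(objectMm - focusMm) / objectMm;   // 0 when the object is in focus
  const magnification = focalMm / (focusMm - focalMm);
  return apertureMm * defocus * magnification;
}

// Flat segmentation has no depth: every background pixel gets one fixed radius.
function flatBlurRadius(isPerson, fixedRadiusPx) {
  return isPerson ? 0 : fixedRadiusPx;
}

// Depth-based blur: the radius grows as the object moves away from the focus plane.
function depthBlurRadius(objectMm, focusMm, pxPerMm, lens = { focalMm: 50, fNumber: 2 }) {
  return circleOfConfusionMm(lens.focalMm, lens.fNumber, focusMm, objectMm) * pxPerMm;
}
```

With the person in focus at 1 metre, a shelf at 1.5 metres gets a smaller radius than a wall at 4 metres, while the flat version gives both the same fixed blur.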

What each app actually does under the hood

The apps differ in interesting ways once you look past the button.

Google Meet is the most openly documented because it runs inside a web browser, where nothing is hidden. Google's engineers run the segmentation model on the CPU — not the GPU — on purpose, using a library called XNNPACK that squeezes maximum speed out of the processor through WebAssembly (a way to run near-native-speed code in a browser). They chose the CPU for the widest device coverage and lowest battery drain. The model produces a low-resolution mask, refines its edges to line up with the real image, and then hands the blur itself to the GPU through WebGL2 (a browser graphics interface). Their blur shader deliberately imitates a lens: it varies the blur strength pixel by pixel in proportion to the mask, and weights pixels so the sharp foreground does not "bleed" a halo into the soft background.
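The two ideas in that shader description, blur strength that follows the mask and weights that stop the foreground bleeding into the background, can be illustrated on a single row of grayscale pixels. This is a simplified CPU sketch of the idea, not Google's shader; the function name and the 0-to-1 mask convention (1 = person) are assumptions for the example.

```javascript
// One row of grayscale pixels; personMask is 1 for person, 0 for background.
function maskWeightedBlurRow(pixels, personMask, maxRadius) {
  const out = new Float32Array(pixels.length);
  for (let x = 0; x < pixels.length; x++) {
    // Blur strength varies per pixel: none on the person, full on the background.
    const radius = Math.round(maxRadius * (1 - personMask[x]));
    let sum = 0;
    let weightSum = 0;
    const start = Math.max(0, x - radius);
    const end = Math.min(pixels.length - 1, x + radius);
    for (let s = start; s <= end; s++) {
      const w = 1 - personMask[s];   // person pixels contribute nothing to the blur
      sum += w * pixels[s];
      weightSum += w;
    }
    out[x] = weightSum > 0 ? sum / weightSum : pixels[x];
  }
  return out;
}
```

For a bright person next to a dark wall, a plain box blur would drag the bright values into the wall and create a halo; with the weights above, the wall pixels next to the person stay dark.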

Apple leans entirely on dedicated hardware. The Neural Engine produces a high-quality person matte frame by frame, fast enough to hold a smooth 60 frames per second during a live call. Because the work runs on a chip purpose-built for it rather than on a general GPU shader, it drains less battery — which is why Portrait blur barely dents an iPhone's runtime. Apple has been building this matte technology since 2019, when it first exposed segmentation mattes that could separate not just a person but their hair, skin, and teeth.

Zoom and Microsoft Teams ship their own native segmentation models tuned for desktops and phones. The clearest engineering fingerprint they leave is their hardware requirement: Teams needs a processor that supports AVX2, a set of CPU instructions that accelerate the kind of bulk math segmentation needs. That requirement is exactly why background blur silently fails to appear on some older or virtualized machines — the chip cannot do the math fast enough, so the feature hides itself rather than stutter.

Common pitfall: the "is my background being uploaded?" worry — and its opposite. Every blur described here runs on your own device; your raw camera frames are not sent to a server to be processed. That is good for privacy. But it creates a trap for builders: blur is cosmetic, not secure. The unblurred frame still exists in memory on the device, and a poorly built app can leak a sharp frame for a split second when the camera turns on before the model loads. If your product blurs for privacy reasons — telemedicine, legal, HR — you must hold the first frames back until the mask is ready, not assume the blur is instant.
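One way to express the hold-back rule is a gate that fails closed: if there is no mask for a frame, the raw frame never leaves the function. The sketch below is a hypothetical helper, not part of any SDK; gateFrame, placeholder, and applyBlur are names invented for illustration.

```javascript
// Fail-closed gate: without a mask, send a placeholder instead of the raw frame.
// placeholder is whatever safe image the product shows (for example a black frame).
// applyBlur is the caller's own blur-and-composite function.
function gateFrame(rawFrame, mask, placeholder, applyBlur) {
  if (mask == null) {
    return placeholder;              // model not ready yet: never leak the sharp frame
  }
  return applyBlur(rawFrame, mask);  // mask available: send the blurred composite
}
```

The same check covers the case where the model fails mid-call and stops producing masks.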

Build it into your own product

If you are adding background blur to your own web-based video product, the browser now gives you every piece you need, and the architecture mirrors the three stages from Figure 2. We cover the model choice in depth in our MediaPipe Selfie Segmentation with WebGPU lesson; here is the shape of the whole pipeline.

You tap into the camera stream at the raw-frame point: the place in the pipeline where each frame is still an uncompressed image with real pixels. Chromium-based browsers expose this through two objects: MediaStreamTrackProcessor, which hands you each incoming camera frame, and MediaStreamTrackGenerator, which lets you push your modified frame back into the call. (The current W3C draft names the generator VideoTrackGenerator, so check which form your target browsers support.) Between them sits your transform: run the frame through a segmentation model to get the mask, then composite a blurred copy with the sharp original using the mask as the stencil.

// Pull each camera frame, segment, blur the background, push it back.
// segmenter, blur, and composite are placeholders for your own model wrapper
// and GPU helpers; composite must return a new VideoFrame.
const processor = new MediaStreamTrackProcessor({ track: cameraTrack });
const generator = new MediaStreamTrackGenerator({ kind: "video" });

const transformer = new TransformStream({
  async transform(frame, controller) {
    try {
      const mask = await segmenter.segment(frame);        // person vs. background
      const output = composite(frame, blur(frame), mask); // sharp you, soft room
      controller.enqueue(output);
    } finally {
      frame.close();  // release memory every frame, even if a step throws
    }
  },
});

processor.readable.pipeThrough(transformer).pipeTo(generator.writable);
const blurredStream = new MediaStream([generator]);  // send this into the call

For the segmentation model, Google's MediaPipe Image Segmenter ships a ready-to-use JavaScript build with the same Selfie Segmentation model the big apps use. For the blur-and-composite step, do it on the GPU: either WebGL2, as Google Meet does, or the newer WebGPU, which became available by default across Chrome, Edge, Firefox, and Safari in late 2025. Doing the blur on the CPU is the single most common reason a homemade version stutters.
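Choosing between those GPU paths at startup might look like the sketch below. It relies on two real browser entry points, navigator.gpu and canvas.getContext("webgl2"); the function name and the "cpu" fallback label are assumptions for this example. Note that the presence of navigator.gpu is only a first check: requesting an adapter can still come back empty on some machines.

```javascript
// Pick a path for the blur-and-composite step.
// nav is the browser's navigator object; canvas is any canvas element or OffscreenCanvas.
function pickBlurBackend(nav, canvas) {
  if (nav && nav.gpu) {
    return "webgpu";   // WebGPU entry point exists; adapter request may still fail later
  }
  if (canvas && canvas.getContext("webgl2")) {
    return "webgl2";   // fall back to a WebGL2 shader
  }
  return "cpu";        // last resort; expect stutter at full resolution
}
```

In a page this would be called as pickBlurBackend(navigator, document.createElement("canvas")).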

Three mistakes wreck most first attempts. Edge halos: the mask is never pixel-perfect at hair and glasses, so a hard cutoff leaves a glowing fringe — feather the mask edge by a few pixels so the transition is gradual. Temporal flicker: because the model runs fresh on every frame, the mask edge wobbles frame to frame and the outline shimmers — blend each mask slightly with the previous one to steady it. Memory leaks: each VideoFrame holds real memory and must be explicitly closed every frame, or the tab crashes within minutes. Get those three right and a browser-based blur is genuinely production-grade.
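The first two fixes are small amounts of math on the mask. The sketch below shows one simple way to do each on the CPU, assuming masks are arrays of values from 0 (room) to 1 (person); in production the same operations would normally run in a shader. The function names and the default blend factor of 0.6 are illustrative assumptions, not recommended settings.

```javascript
// Temporal smoothing: blend the new mask with the previous one to steady the edge.
function smoothMask(previous, current, keep = 0.6) {
  if (!previous) return Float32Array.from(current);  // first frame: nothing to blend with
  const out = new Float32Array(current.length);
  for (let i = 0; i < current.length; i++) {
    out[i] = keep * previous[i] + (1 - keep) * current[i];
  }
  return out;
}

// Feathering: box-blur the mask so a hard 0/1 edge becomes a gradual ramp.
function featherMask(mask, width, height, radius) {
  const pass = (src, stepX, stepY) => {
    const dst = new Float32Array(src.length);
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        let sum = 0;
        let count = 0;
        for (let k = -radius; k <= radius; k++) {
          const sx = x + k * stepX;
          const sy = y + k * stepY;
          if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
            sum += src[sy * width + sx];
            count++;
          }
        }
        dst[y * width + x] = sum / count;  // average of the in-bounds neighbours
      }
    }
    return dst;
  };
  return pass(pass(mask, 1, 0), 0, 1);     // horizontal pass, then vertical pass
}
```

A higher keep value steadies the outline more but makes the mask lag behind fast movement, so it is a value to tune rather than fix.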

[Image: browser data-flow diagram for building background blur. The camera track enters a MediaStreamTrackProcessor, which feeds frames to a transform stage running a MediaPipe segmentation model and a WebGPU blur-and-composite step, which outputs through a MediaStreamTrackGenerator back into the WebRTC call. Annotations mark the three common failure points: edge halos, temporal flicker, and unclosed frames.]
Figure 3. The browser pipeline for building your own blur: camera in, segment, blur on the GPU, composite, stream out, with the three mistakes that break it flagged at the points where they happen.

Where the blur sits in the bigger picture

Background blur is the gentlest member of a large family of real-time camera-AI features that all share its two-step shape: understand the frame, then change it. Virtual backgrounds use the same mask but paint an image where the blur would go. Beauty filters and gaze correction reshape the foreground instead of the background. Live captions run a different model on the audio rather than the video. Every one of them lives at the same raw-frame point in the pipeline and answers to the same 33-millisecond clock. Learn the blur and you have learned the pattern. The video SDK you build on decides whether you can reach that raw-frame point at all — a choice we unpack in our WebRTC AI APIs and video SDK comparison.

There is also a regulatory edge worth knowing. Because blur relies on detecting a person, it is technically a face-and-body detector, and where it shades into recognition — identifying who the person is — European rules begin to apply. Blur itself does not identify anyone, so it stays clear, but teams adding more analysis on top should read our note on face detection under the EU AI Act before they ship.

Where Fora Soft fits in

We have built real-time video products since 2005 (video conferencing, WebRTC apps, e-learning, telemedicine, and live surveillance), and on-device camera effects like background blur are among the features our clients request most. The work is rarely the blur itself; it is making the blur hold steady across a thousand different cameras, phones, and lighting conditions without draining the battery or leaking a sharp frame at the wrong moment. In telemedicine especially, where a blurred background protects a patient's home, the difference between a cosmetic blur and a privacy-grade one is an engineering decision made early. We help teams pick the model, the chip, and the pipeline that fit the product, so the feature that looks simple in a demo still works on a five-year-old phone in a poorly lit room.

What to read next


  • Talk to a video engineer — bring us the camera-AI feature you want to ship and we will help you choose the model, chip, and pipeline: /services/webrtc-development
  • See our work — real-time video products we have shipped since 2005: /portfolio
  • Download the Background Blur Quick-Start & Engineering Cheat Sheet — one page with the per-app steps, the hardware requirements, the segmentation-vs-depth comparison, the 33 ms budget, and the build-it-yourself checklist: Download the cheat sheet

References

  1. W3C, "MediaStreamTrack Insertable Media Processing using Streams" (Working Draft) — the raw-frame insertion point (MediaStreamTrackProcessor / MediaStreamTrackGenerator) that exposes each camera frame as a VideoFrame for on-device processing. https://www.w3.org/TR/mediacapture-transform/
  2. W3C, "Media Capture and Streams" (Recommendation) — getUserMedia and the MediaStreamTrack model that camera-effect pipelines build on. https://www.w3.org/TR/mediacapture-streams/
  3. W3C, "WebGPU" (Working Draft, GPU for the Web WG) — modern browser GPU compute and graphics; reached default availability across Chrome, Edge, Firefox, and Safari 26 in late 2025; the fast path for the blur-and-composite step. https://www.w3.org/TR/webgpu/
  4. W3C, "WebCodecs" (Working Draft) — low-level frame encode/decode access that pairs with the raw-frame API. https://www.w3.org/TR/webcodecs/
  5. W3C, "WebRTC: Real-Time Communication in Browsers" (Recommendation, 13 March 2025) — the live-call platform the blurred stream is sent over. https://www.w3.org/TR/webrtc/
  6. Google Research, "Background Features in Google Meet, Powered by Web ML" (2020) — first-party account: MediaPipe + XNNPACK CPU inference, WebGL2 rendering, and a circle-of-confusion blur shader that weights pixels to prevent foreground bleed. https://research.google/blog/background-features-in-google-meet-powered-by-web-ml/
  7. Google Research, "High-Definition Segmentation in Google Meet" (2022) — the higher-resolution successor model and edge-refinement approach. https://research.google/blog/high-definition-segmentation-in-google-meet/
  8. Google for Developers, "MediaPipe Selfie Segmentation" / "Image Segmenter" docs — MobileNetV3-based model, 256×256 and 144×256 variants, ~106K parameters, ~454 KB; web build with sub-3 ms GPU-delegate inference. https://ai.google.dev/edge/mediapipe/solutions/vision/image_segmenter/web_js
  9. Apple Developer, WWDC19 Session 260, "Introducing Photo Segmentation Mattes" — on-device person, hair, skin, and teeth mattes from AVCapture and Core Image; the matte technology behind Portrait effects. https://developer.apple.com/videos/play/wwdc2019/260/
  10. Apple Support, "Change FaceTime video settings on iPhone" — how to enable Portrait mode in FaceTime and via Control Center Video Effects; A12 Bionic (iPhone XS) or later, iOS 15+. https://support.apple.com/guide/iphone/change-the-video-settings-iphfb3d2a12b/ios
  11. Apple Support, "Use your iPhone as a webcam for Mac" (Continuity Camera) — Portrait via Mac Control Center Video Effects; requirements (iPhone XR+ for Continuity Camera; Apple Silicon for native Mac Portrait). https://support.apple.com/en-us/102546
  12. Microsoft Support, "Change your background in Microsoft Teams meetings" — Background filters → Blur (Standard / Portrait); Ctrl+Shift+P / Cmd+Shift+P; AVX2 CPU requirement; not supported on Linux or VDI. https://support.microsoft.com/en-us/teams/meetings/change-your-background-in-microsoft-teams-meetings
  13. Zoom Support, "Blur your background" — Settings → Background & Effects → Blur, and in-meeting toggle; Zoom 5.5.0+; free on all plans; mobile minimums. https://support.zoom.com/hc/en/article?id=zm_kb&sysparm_article=KB0060387