
Key takeaways
• Picture-in-Picture (PiP) is a retention feature, not a gimmick. Products that ship PiP see 8–15% higher average session duration and a measurable lift in D7 retention on mobile-first video apps.
• The web PiP story is now two APIs. The classic Picture-in-Picture API (video element only) is universally supported; the newer Document PiP API (arbitrary DOM in a PiP window) is shipping in Chromium-based browsers and changing what is possible for meetings, editors and games.
• iOS PiP is the hardest surface. App Store review, background audio capabilities, AVPlayerLayer vs AVPictureInPictureController, and the WebRTC PiP story for real-time video each have their own traps.
• Android is the cleanest of the three. The PiP Activity mode plus PictureInPictureParams covers most cases in 30–60 engineering hours. ExoPlayer integrates natively.
• Your PiP scope has three tiers. Tier 1: passive video playback PiP (2–3 weeks). Tier 2: PiP for WebRTC calls and live streams (3–5 weeks per platform). Tier 3: document PiP / custom-canvas PiP with controls and overlays (4–6 weeks on web only).
Why Fora Soft wrote this playbook
Fora Soft has spent 21 years building video products (telehealth, live streaming, video conferencing, OTT, education, surveillance, courtroom, fitness) and has shipped PiP across them. Our engineers have hit every iOS App Store rejection, every Android OEM quirk, every Safari incompatibility, and every WebRTC-PiP dance on the major browsers. This playbook distills what we tell clients in Week 1 of any PiP-heavy project: which APIs to use, what it costs, where it breaks, and when to say no.
For the iOS-specific deep dive see our companion article, Picture-in-Picture on iOS: implementation and peculiarities. For the broader product context see our video streaming development services.
Need PiP on web, iOS and Android in one sprint?
A 30-minute call with a Fora Soft engineer will scope your PiP work on all three platforms, including WebRTC, in a single pass.
What PiP actually is, in 2026
Picture-in-Picture lets a video detach from its host window, app or browser tab and float in a small always-on-top window while the user does something else. It first shipped on iPad with iOS 9 (2015), followed by Safari 10 on macOS (2016) and Android 8.0 Oreo (2017), and is now supported across every major consumer platform. The 2026 state of play:
| Platform | Video element PiP | Document / DOM PiP | Notes |
|---|---|---|---|
| Chrome / Edge / Opera | Yes, since 70. | Yes, Document PiP stable. | Full controls, auto-PiP on tab switch. |
| Safari macOS | Yes, since Safari 10. | Partial (behind flag). | Vendor-prefixed API (webkitSetPresentationMode) historically; standard API since Safari 13.1. |
| Firefox | Via Firefox's own PiP toggle only. | No. | No requestPictureInPicture(); pages cannot trigger PiP through a Web API. |
| Android | Yes, via Activity PiP (O+). | n/a (native) | ExoPlayer integrates trivially; OEM quirks exist. |
| iOS | Yes, via AVPictureInPictureController (iPad since iOS 9, iPhone since iOS 14). | n/a (native) | Background mode “audio” is required; review-prone. |
| iPadOS | Yes, native system chrome (iPadOS 14+). | n/a (native) | Slide Over and Stage Manager interplay on iPad. |
Reach for classic video-element PiP when: you are shipping VOD or simple live playback. Reach for Document PiP (Chromium) when you need custom controls, overlays, chat, whiteboards or editor UI inside the PiP window.
Web implementation: video-element PiP in 10 lines
For a standard HTML5 video, PiP is a one-method API on the video element. This is the baseline every OTT product should ship.
```javascript
const video = document.getElementById('player');
const pipBtn = document.getElementById('pip-button'); // your PiP toggle button

// Feature detection first: hide the toggle where PiP is unavailable.
if (document.pictureInPictureEnabled && !video.disablePictureInPicture) {
  pipBtn.addEventListener('click', async () => {
    try {
      if (document.pictureInPictureElement) {
        await document.exitPictureInPicture();
      } else {
        await video.requestPictureInPicture();
      }
    } catch (err) {
      console.warn('PiP failed:', err);
    }
  });

  // React to state changes.
  video.addEventListener('enterpictureinpicture', () => {
    // Shrink full UI, hide the main player, track the event.
  });
  video.addEventListener('leavepictureinpicture', () => {
    // Restore full UI.
  });
} else {
  pipBtn.hidden = true;
}
```
Two features you should also wire up. (1) navigator.mediaSession so pause/seek/next show up in the PiP chrome and on the OS media-control surfaces. (2) disablePictureInPicture on ads or DRM-forbidden frames where you do not want PiP.
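A minimal sketch of point (1), assuming a plain <video> element and no particular player library. The helper name wireMediaSession and its seekStep/onNext options are illustrative, not part of any API; setActionHandler and the action names ('play', 'pause', 'seekbackward', 'seekforward', 'seekto', 'nexttrack') come from the standard Media Session API. The session object is passed in rather than read from navigator only so the logic can be exercised outside a browser.

```javascript
// Sketch: route PiP-window and OS media controls to a <video>-like object.
// In a browser, pass navigator.mediaSession as `session`.
function wireMediaSession(session, video, { seekStep = 10, onNext } = {}) {
  // Keep seeks inside [0, duration]; live streams may report a non-finite duration.
  const clamp = t =>
    Math.max(0, Number.isFinite(video.duration) ? Math.min(video.duration, t) : t);

  const handlers = {
    play: () => video.play(),
    pause: () => video.pause(),
    seekbackward: d => { video.currentTime = clamp(video.currentTime - (d.seekOffset ?? seekStep)); },
    seekforward: d => { video.currentTime = clamp(video.currentTime + (d.seekOffset ?? seekStep)); },
    seekto: d => { video.currentTime = clamp(d.seekTime); },
  };
  if (onNext) handlers.nexttrack = () => onNext();

  const registered = [];
  for (const [action, handler] of Object.entries(handlers)) {
    try {
      session.setActionHandler(action, handler);
      registered.push(action);
    } catch {
      // setActionHandler throws for actions this browser does not support; skip them.
    }
  }
  return registered;
}
```

In a browser you would call it once after the player is created, for example wireMediaSession(navigator.mediaSession, video, { onNext: playNextEpisode }), where playNextEpisode stands in for your own queue logic.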
Document PiP: the game-changer nobody is using yet
Classic PiP ships a bare video element. Document PiP (aka Document Picture-in-Picture API, stable in Chromium since 2023) ships any DOM you want — the player, a chat sidebar, a whiteboard, controls, an overlay. This is what turns PiP from “watch while doing email” into “miniaturized app instance”.
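```javascript
const callPipBtn = document.getElementById('call-pip-button'); // your toggle

callPipBtn.addEventListener('click', async () => {
  // requestWindow() needs a user gesture, so call it from the click handler.
  if (!('documentPictureInPicture' in window)) return; // fall back to classic PiP
  const call = document.getElementById('live-call');
  const home = call.parentElement; // remember where to put it back
  const pipWindow = await documentPictureInPicture.requestWindow({
    width: 320, height: 180
  });
  // Copy stylesheets so the PiP window looks like the host.
  [...document.styleSheets].forEach(sheet => {
    try {
      const css = [...sheet.cssRules].map(r => r.cssText).join('');
      const style = pipWindow.document.createElement('style');
      style.textContent = css;
      pipWindow.document.head.appendChild(style);
    } catch {
      // Cross-origin sheet: cssRules is unreadable, so link it instead.
      const link = pipWindow.document.createElement('link');
      link.rel = 'stylesheet';
      link.href = sheet.href;
      pipWindow.document.head.appendChild(link);
    }
  });
  // Move the live player into the PiP window.
  pipWindow.document.body.append(call);
  pipWindow.addEventListener('pagehide', () => {
    // The node now lives in the PiP document; return it to its original parent.
    home.append(call);
  });
});
```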
Where this earns its keep. Video meetings (Zoom-style gallery, chat visible in the PiP), live commerce (product panel + stream), collaborative editors (document + presence indicator), coding (tests running alongside the editor). On the web, Document PiP is the only way to keep a WebRTC grid view interactive after the user switches away from the tab.
Reach for Document PiP when: you need controls, chat, overlays or any interactive UI inside the PiP window and Chromium is acceptable. Fall back to classic PiP on Firefox and older Safari.
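One way to implement that fallback is to detect capabilities once and branch on the result. This is a sketch: the function name and the 'document' / 'video' / 'none' labels are our own, and win and doc are parameters only so the check can run outside a browser.

```javascript
// Sketch: pick the richest PiP mode this browser supports.
// `win` and `doc` default to the browser globals.
function choosePipMode(win = globalThis, doc = globalThis.document) {
  if (win && 'documentPictureInPicture' in win) return 'document'; // Chromium
  if (doc && doc.pictureInPictureEnabled) return 'video';          // classic API
  return 'none'; // e.g. Firefox: only its built-in PiP toggle is available
}
```

A click handler can then switch on the result: move the call UI for 'document', call video.requestPictureInPicture() for 'video', and hide the button for 'none'.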
Android PiP: Activity mode + PictureInPictureParams
Android PiP is an Activity-level mode available since Oreo (API 26). You declare support, expose aspect ratio and optional remote actions (play/pause/next), then call enterPictureInPictureMode(). ExoPlayer needs no special handling.
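```xml
<!-- AndroidManifest.xml -->
<activity android:name=".PlayerActivity"
    android:supportsPictureInPicture="true"
    android:configChanges="screenSize|smallestScreenSize|screenLayout|orientation"
    android:launchMode="singleTask" />
```

```kotlin
// PlayerActivity.kt (assumes minSdk 26; imports: android.app.PictureInPictureParams,
// android.content.pm.PackageManager, android.os.Build, android.util.Rational)
private fun pipParams(): PictureInPictureParams {
    val builder = PictureInPictureParams.Builder()
        .setAspectRatio(Rational(16, 9))
        .setActions(buildPipActions()) // your play/pause/next RemoteActions
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
        builder.setAutoEnterEnabled(player.isPlaying) // Android 12+: system enters PiP on Home
            .setSeamlessResizeEnabled(true)
    }
    return builder.build()
}

// Call on every play/pause so Android 12+ auto-enter follows playback state.
private fun updatePipParams() = setPictureInPictureParams(pipParams())

override fun onUserLeaveHint() {
    super.onUserLeaveHint()
    // Android 8–11 fallback: enter PiP manually when the user leaves the app.
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.S &&
        packageManager.hasSystemFeature(PackageManager.FEATURE_PICTURE_IN_PICTURE) &&
        player.isPlaying
    ) {
        enterPictureInPictureMode(pipParams())
    }
}
```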
Gotchas. Activity must be singleTask. Handle onPictureInPictureModeChanged to swap in a minimal UI. On some OEM skins (Xiaomi, Huawei), users have to enable “PiP permission” per app — surface a helper on first use. For deeper Android video performance tips see our 10 ways to optimize Android apps for video streaming.
iOS PiP: AVPictureInPictureController, audio capability, review traps
iOS PiP is built on AVPictureInPictureController. Wiring it up is 15 lines; shipping it through App Review without surprises is the hard part.
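```swift
// Info.plist: UIBackgroundModes = ["audio"]
// AVPlayer + AVPlayerLayer is the normal path. For custom content (e.g. WebRTC) use
// AVPictureInPictureController.ContentSource with an AVSampleBufferDisplayLayer (iOS 15+).
import AVKit
import UIKit

final class PlayerViewController: UIViewController, AVPictureInPictureControllerDelegate {
    let player = AVPlayer()
    private var pipController: AVPictureInPictureController? // must be retained

    override func viewDidLoad() {
        super.viewDidLoad()
        // Activate a playback audio session so audio continues in PiP.
        do {
            try AVAudioSession.sharedInstance().setCategory(.playback, mode: .moviePlayback)
            try AVAudioSession.sharedInstance().setActive(true)
        } catch {
            print("Audio session setup failed: \(error)")
        }
        let playerLayer = AVPlayerLayer(player: player)
        playerLayer.frame = view.bounds
        playerLayer.videoGravity = .resizeAspect
        view.layer.addSublayer(playerLayer)
        guard AVPictureInPictureController.isPictureInPictureSupported(),
              let controller = AVPictureInPictureController(playerLayer: playerLayer) else { return }
        controller.delegate = self
        controller.canStartPictureInPictureAutomaticallyFromInline = true // iOS 14.2+
        pipController = controller
    }
}
```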
The four traps we repeatedly debug. (1) Apps that enable the “audio” background mode but do not actually play audio in the background get rejected — you must legitimately continue audio for the user. (2) Returning from PiP requires restoring the player into your view hierarchy; forgetting to do this leaves the PiP window alive but disconnected. (3) For WebRTC calls, use AVPictureInPictureVideoCallViewController (iOS 15+); AVPlayerLayer cannot render RTCVideoTrack directly. (4) On iPad, Slide Over and Stage Manager behaviour can confuse first-time users — document it.
For more iOS-specific detail see Picture-in-Picture on iOS: implementation and peculiarities.
PiP for WebRTC calls: the hard mode
The most common PiP request in 2026 is “put the active speaker of a video meeting in PiP when the user opens another tab/app”. Every platform has a different answer.
Web. Either attach the remote video MediaStreamTrack to a <video> element (wrapped in a MediaStream and set as srcObject) and use classic PiP, or use Document PiP and move the entire call UI into a mini-window. The latter is strictly better for meetings because you keep controls.
Android. ExoPlayer-style PiP only works for RTP media if you render the frames into a custom SurfaceView yourself. The cleaner path is running the call UI in a single Activity and flipping to PiP mode with setAutoEnterEnabled(true) (Android 12+); the WebRTC SurfaceViewRenderer renders inside PiP just fine.
iOS. Use AVPictureInPictureVideoCallViewController (iOS 15+). Convert frames from the WebRTC video track into CMSampleBuffers, enqueue them on an AVSampleBufferDisplayLayer hosted in that view controller, and create the AVPictureInPictureController from an AVPictureInPictureController.ContentSource. Budget 25–45 hours for a production-grade implementation. Our WebRTC architecture guide has the broader context.
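For the classic-PiP route, a sketch along these lines should work, assuming you already know which remote MediaStreamTrack belongs to the active speaker (that selection logic is app-specific and not shown). pipRemoteTrack is a hypothetical helper, and the MediaStream constructor is injectable only for testing. Because requestPictureInPicture() needs a recent user gesture, attaching the track ahead of time so the click handler only has to request PiP is the safer pattern.

```javascript
// Sketch: show one remote WebRTC video track in classic PiP.
// Call it from a click handler: requestPictureInPicture() needs a user gesture.
async function pipRemoteTrack(video, track, MediaStreamCtor = globalThis.MediaStream) {
  if (track.kind !== 'video') throw new TypeError('expected a video MediaStreamTrack');
  video.srcObject = new MediaStreamCtor([track]);
  video.muted = true; // assumes call audio is played by a separate element
  if (video.readyState === 0) {
    // HAVE_NOTHING: requestPictureInPicture() would reject with InvalidStateError.
    await new Promise(resolve =>
      video.addEventListener('loadedmetadata', resolve, { once: true }));
  }
  await video.play();
  return video.requestPictureInPicture(); // resolves to a PictureInPictureWindow
}
```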
Bringing PiP to a video-call app?
We have shipped PiP for WebRTC calls on iOS, Android and web. Skip the trial-and-error and get a working implementation in a sprint.
PiP in React Native, Flutter, and hybrid stacks
React Native. react-native-pip-android, react-native-pip, and the react-native-video built-in PiP prop. For WebRTC, react-native-webrtc + a custom bridge to AVPictureInPictureVideoCallViewController on iOS and the standard Activity PiP on Android.
Flutter. floating and simple_pip_mode wrap Android PiP cleanly; iOS requires a platform channel to AVPictureInPictureController. Flutter-WebRTC PiP is feasible but expect 2–3 extra days for iOS glue.
Capacitor / Cordova / web wrapper hybrids. Web PiP just works; for native PiP you need a small native plugin per platform.
UX rules that make PiP actually useful
1. Opt-in for auto-PiP. Unsolicited PiP on tab switch feels like spyware. Show a setting, default off.
2. A single PiP button, where users already look. On video players, bottom-right next to fullscreen. Do not reinvent.
3. Keep controls inside PiP. Play/pause, seek, skip-next, mute. On web use mediaSession; on iOS the system chrome provides them; on Android expose them as RemoteActions in PictureInPictureParams.
4. Restore state when PiP ends. Seek position, playing/paused, volume, mute, fullscreen — all should round-trip.
5. Respect DRM / ads. Disable PiP on unskippable ads and DRM-forbidden windows; set disablePictureInPicture.
6. Accessibility. The PiP toggle button needs a proper aria-label, keyboard shortcut (P is common), and a visible focus ring.
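Rules 2 and 6 can be wired with two small helpers. This is an illustrative sketch: the helper names and the choice of a constant label plus aria-pressed (rather than a label that changes with state) are our assumptions, and the commented wiring at the end assumes the pipBtn and video elements from the web example earlier.

```javascript
// Sketch: "P" toggles PiP, but never while the user is typing or pressing
// a browser shortcut such as Ctrl+P / Cmd+P (print).
function isPipShortcut(event) {
  if (event.key !== 'p' && event.key !== 'P') return false;
  if (event.ctrlKey || event.metaKey || event.altKey) return false;
  const target = event.target || {};
  const tag = (target.tagName || '').toUpperCase();
  if (tag === 'INPUT' || tag === 'TEXTAREA' || tag === 'SELECT' || target.isContentEditable) {
    return false;
  }
  return true;
}

// Constant accessible name; the on/off state is conveyed by aria-pressed.
function syncPipButton(button, active) {
  button.setAttribute('aria-label', 'Picture-in-picture');
  button.setAttribute('aria-pressed', String(active));
}

// Browser wiring:
// document.addEventListener('keydown', e => {
//   if (isPipShortcut(e)) { e.preventDefault(); pipBtn.click(); }
// });
// video.addEventListener('enterpictureinpicture', () => syncPipButton(pipBtn, true));
// video.addEventListener('leavepictureinpicture', () => syncPipButton(pipBtn, false));
```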
Tooling and libraries we reach for
Web players with built-in PiP. Video.js, Shaka Player, HLS.js, Bitmovin Player, THEOplayer, JW Player. All support classic PiP out of the box; Bitmovin and THEOplayer lead on Document PiP integration.
Android. AndroidX Media3 (ExoPlayer) plus a 20-line PictureInPictureParams builder. For WebRTC, the official org.webrtc:google-webrtc SurfaceViewRenderer.
iOS. AVKit, AVFoundation, AVPictureInPictureVideoCallViewController (iOS 15+), AVSampleBufferDisplayLayer with AVPictureInPictureController.ContentSource. Third-party: MobileVLCKit, SwiftAudio.
React Native. react-native-video, react-native-pip-android, react-native-webrtc + small native bridges.
Testing matrix. At minimum: Chrome 120+, Safari 16+/17+, Firefox 120+, Android 8/11/14 on Samsung + Xiaomi + Pixel, iOS 15/17 on iPhone + iPad. BrowserStack / Sauce Labs / LambdaTest for cloud.
Analytics hooks you need from day one
You cannot optimise what you cannot measure. Instrument six events and tag them with platform, device class and session ID:
1. pip_enter: when PiP activates, whether user-initiated or auto.
2. pip_exit: when PiP ends, with duration and reason (user-closed, navigation, end-of-video, error).
3. pip_denied: API rejected entering PiP; log the error code to catch OEM quirks.
4. pip_control_click: pause/play/next from inside the PiP window.
5. pip_state_mismatch: when exit brings the player back into an inconsistent state; it should never fire if your restore logic is correct.
6. pip_feature_seen: when the user saw the PiP button (for conversion analysis).
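A sketch of a tracker that stamps every event with the shared tags and computes the pip_exit duration. The send callback stands in for whatever analytics SDK you use, and the property names (trigger, reason, durationMs, errorName, fields) are illustrative choices rather than a fixed schema.

```javascript
// Sketch: emit the six PiP events with shared tags.
// `send` stands in for your analytics SDK; `now` is injectable for tests.
function createPipTracker({ platform, deviceClass, sessionId, send, now = () => Date.now() }) {
  let enteredAt = null;
  const emit = (name, props = {}) =>
    send({ name, platform, deviceClass, sessionId, ts: now(), ...props });

  return {
    seen: () => emit('pip_feature_seen'),
    enter: trigger => {                // 'user' | 'auto'
      enteredAt = now();
      emit('pip_enter', { trigger });
    },
    exit: reason => {                  // 'user-closed' | 'navigation' | 'end-of-video' | 'error'
      const durationMs = enteredAt === null ? 0 : now() - enteredAt;
      enteredAt = null;
      emit('pip_exit', { reason, durationMs });
    },
    denied: err => emit('pip_denied', { errorName: (err && err.name) || 'UnknownError' }),
    controlClick: action => emit('pip_control_click', { action }),   // e.g. 'play', 'nexttrack'
    stateMismatch: fields => emit('pip_state_mismatch', { fields }), // e.g. ['currentTime']
  };
}
```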
Mini case — PiP across an education platform in 4 weeks
Situation. An online-education client shipped Android + iOS + web, with lectures delivered over HLS and live Q&A over WebRTC. Students wanted to keep the lecture visible while taking notes in their favourite tool. Support ticket theme: “why does PiP work in YouTube and not in your app?”
The 4-week plan. Week 1: web classic PiP for VOD playback + Document PiP for WebRTC. Week 2: Android Activity PiP with ExoPlayer, auto-enter on home button. Week 3: iOS PiP via AVPlayerLayer for VOD, AVPictureInPictureVideoCallViewController for Q&A. Week 4: UX pass, analytics hooks, App Review resubmission with audio capability properly declared and used.
Outcome. Average session length rose 11% in month 1 after launch; D7 retention among mobile-first users lifted 3 points; the “PiP”-related support tickets went to zero. Total engineering spend: ~145 hours across 2 mobile engineers plus ~25 hours web. Full spec in our streaming services catalogue.
What shipping PiP actually costs
Estimates below assume our Agent Engineering-accelerated workflow. Non-accelerated teams should budget 40–60% more hours.
| Scope | Hours | Includes |
|---|---|---|
| Web (video element PiP) | 10–20 | Toggle, mediaSession, analytics. |
| Web (Document PiP) | 40–80 | DOM move/restore, style copy, lifecycle. |
| Android (Activity PiP) | 30–60 | Manifest, params, actions, OEM testing. |
| iOS (AVPlayerLayer PiP) | 30–50 | AVPIPController, audio session, review. |
| iOS (WebRTC call PiP) | 60–100 | VideoCallVC, SampleBufferDisplayLayer, pipeline. |
| Full cross-platform PiP | ~170–310 | All of above with QA on 10+ devices. |
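The full cross-platform row is the sum of the scoped rows, and the 40–60% uplift quoted for non-accelerated teams can be applied to that total. A small worked calculation; the scope keys are just labels for the table rows.

```javascript
// Worked arithmetic for the cost table (hours, [low, high] per row).
const scopeHours = {
  webVideoElement: [10, 20],
  webDocumentPip: [40, 80],
  androidActivity: [30, 60],
  iosPlayerLayer: [30, 50],
  iosWebRtcCall: [60, 100],
};

function totalRange(scopes) {
  return Object.values(scopes).reduce(([lo, hi], [a, b]) => [lo + a, hi + b], [0, 0]);
}

// Apply the low uplift to the low end and the high uplift to the high end.
// Integer percent math keeps the results exact.
function withUplift([lo, hi], minPct, maxPct) {
  return [(lo * (100 + minPct)) / 100, (hi * (100 + maxPct)) / 100];
}

const accelerated = totalRange(scopeHours);             // [170, 310]
const nonAccelerated = withUplift(accelerated, 40, 60); // [238, 496]
console.log(accelerated, nonAccelerated);
```

So a non-accelerated team should plan for roughly 238–496 hours for the full cross-platform scope.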
Five pitfalls that sink PiP projects
1. Skipping iOS App Review due diligence. The “audio” background mode is inspected. Enable it, and justify it — your app must actually continue audio for the user in the background. Rejection cycle is 5–10 days.
2. Not testing OEM skins on Android. Xiaomi MIUI, Huawei EMUI and some Samsung builds require users to grant PiP permission manually. Surface a helper; log when permission is missing.
3. Losing player state across enter/exit. The most common bug we fix. Build a simple PlayerState observable and snapshot/restore on transitions.
4. PiP during ads. Users escape unskippable ads via PiP and resume watching content. Set disablePictureInPicture during ad slots.
5. Accessibility as an afterthought. aria-label on the toggle, keyboard shortcut, visible focus ring, announcements for screen readers when state changes. These are ADA / accessibility-audit findings waiting to happen.
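For pitfall 3, a minimal snapshot/restore sketch, assuming both players expose HTMLMediaElement-style properties (currentTime, volume, muted, playbackRate, paused, play(), pause()). The field list and the 0.5 s currentTime tolerance are assumptions to tune. Snapshot the player that is about to hand over (for example the PiP player on exit), not a copy taken when PiP started, or you will rewind whatever was watched in PiP.

```javascript
// Sketch: snapshot the outgoing player right before a PiP transition and
// apply it to the player that takes over afterwards.
const PLAYER_FIELDS = ['currentTime', 'volume', 'muted', 'playbackRate'];

function snapshotPlayer(video) {
  const state = { paused: video.paused };
  for (const field of PLAYER_FIELDS) state[field] = video[field];
  return state;
}

// Returns the fields that did not round-trip (useful for pip_state_mismatch).
async function restorePlayer(video, state, toleranceSec = 0.5) {
  for (const field of PLAYER_FIELDS) video[field] = state[field];
  if (state.paused) video.pause();
  else await video.play(); // may reject under autoplay policies; let callers handle it
  const mismatched = PLAYER_FIELDS.filter(field =>
    field === 'currentTime'
      ? Math.abs(video.currentTime - state.currentTime) > toleranceSec
      : video[field] !== state[field]);
  if (video.paused !== state.paused) mismatched.push('paused');
  return mismatched;
}
```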
KPIs: what to measure after shipping PiP
Quality. PiP enter success rate > 98%; exit-with-state-loss rate < 1%; average PiP window duration > 45 s (indicates users actually use it, not accidental).
Business. Average session length lift vs control cohort; D7 and D30 retention lift; feature-awareness (how many unique users used PiP at least once in 30 days).
Reliability. Platform crash-rate no worse than pre-PiP baseline; App Store / Play Store review sentiment no worse.
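The quality KPIs can be computed straight from the six analytics events. A sketch, assuming events are objects with a name field and that pip_exit events carry a durationMs value; treating enter success as pip_enter / (pip_enter + pip_denied) is our assumption about the denominator.

```javascript
// Sketch: quality KPIs from a batch of PiP analytics events.
function pipQualityKpis(events) {
  const count = name => events.filter(e => e.name === name).length;
  const enters = count('pip_enter');
  const attempts = enters + count('pip_denied'); // assumption: attempts = successes + denials
  const exits = events.filter(e => e.name === 'pip_exit');
  const totalMs = exits.reduce((sum, e) => sum + (e.durationMs || 0), 0);
  return {
    enterSuccessRate: attempts ? enters / attempts : null,                           // target > 0.98
    stateLossRate: exits.length ? count('pip_state_mismatch') / exits.length : null, // target < 0.01
    avgPipSeconds: exits.length ? totalMs / exits.length / 1000 : null,              // target > 45
  };
}
```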
A decision framework — do you need PiP, and which tier
Q1. Is video or audio the core activity in the product? Yes → PiP is table stakes. No (video is peripheral) → skip.
Q2. Do your users multi-task on the same device during playback? Mobile surveys say yes for 60–75% of OTT and 40–55% of education users. The higher that share, the higher the PiP ROI.
Q3. Is your content DRM-gated or ad-heavy? Both reduce where PiP is allowed; scope narrows.
Q4. Is your product WebRTC-based? If yes, plan the call-PiP path (iOS VideoCallVC, Document PiP on web) from day one — retrofitting is 2–3x the cost.
Q5. How many platforms ship? Web-only classic PiP (Tier 1) is about 2 weeks. Adding iOS and Android brings it to 5–6 weeks. Adding WebRTC call PiP brings the total to 8–10 weeks.
Reach for Tier 1 PiP (passive video element) when: you are shipping an OTT MVP or a simple live viewer. It is the highest ROI per engineering hour in the entire video-product feature set.
Rejected in App Review over PiP?
We have untangled dozens of App Store rejections around PiP and background audio. Book 30 minutes and we will tell you exactly what to change.
When you should not ship PiP
Video is incidental. If the product is “forms with a short explainer video”, PiP adds complexity with negligible retention lift.
Strict DRM that forbids off-primary-window playback. Some content licences contractually prohibit PiP playback; check with your legal team.
Audio-only products. Most OS platforms already provide PiP-equivalent lock-screen and media-control surfaces; implementing your own PiP UI is wasted effort.
Kids’ products. Parental-control regulators increasingly frown on features that keep video playing while children’s apps are backgrounded. Default off, hard switch in settings.
Reach for Document PiP when: the product is a live-collaboration, editor, meeting, or coding tool on web and you can scope for Chromium-first users while keeping a classic-PiP fallback.
FAQ
Does PiP work in Safari on iPhone, not just iPad?
Since iOS 14, yes — Safari on iPhone supports PiP for video elements. Earlier iOS versions did not. For in-app PiP on iPhone you still use AVPictureInPictureController.
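On the web side, older Safari releases expose PiP only through WebKit's prefixed presentation-mode API (webkitSupportsPresentationMode / webkitSetPresentationMode). A sketch of a toggle that prefers the standard API and falls back to the prefixed one; togglePip and its return labels are our own names, and doc is a parameter only so the logic can be tested without a browser.

```javascript
// Sketch: toggle PiP with the standard API, falling back to Safari's
// prefixed presentation-mode API on older WebKit builds.
async function togglePip(video, doc = globalThis.document) {
  if (doc && doc.pictureInPictureEnabled && typeof video.requestPictureInPicture === 'function') {
    if (doc.pictureInPictureElement) {
      await doc.exitPictureInPicture();
      return 'inline';
    }
    await video.requestPictureInPicture();
    return 'picture-in-picture';
  }
  if (typeof video.webkitSupportsPresentationMode === 'function' &&
      video.webkitSupportsPresentationMode('picture-in-picture')) {
    const next = video.webkitPresentationMode === 'picture-in-picture' ? 'inline' : 'picture-in-picture';
    video.webkitSetPresentationMode(next);
    return next;
  }
  return 'unsupported';
}
```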
Can I use PiP for my WebRTC video meetings?
Yes. On web, Document PiP is the preferred path because it keeps your controls. On iOS, use AVPictureInPictureVideoCallViewController (iOS 15+). On Android, plain Activity PiP with a SurfaceViewRenderer works.
Do I need background audio capability on iOS?
For PiP that continues playback when the user leaves your app, yes — add UIBackgroundModes with the “audio” value in Info.plist, and configure the AVAudioSession as .playback. App Review will verify that your app actually plays audio in the background, not just declares the capability.
Why is PiP not triggering on some Android phones?
Two common causes: (1) PiP permission is disabled in system settings (Xiaomi, Huawei) — surface a deep link to the settings screen. (2) The Activity is not singleTask or does not declare supportsPictureInPicture="true". Double-check the manifest.
What is Document PiP and should I care?
Document PiP is a newer Chromium API (stable since 2023) that lets you put arbitrary DOM — not just a video element — in the PiP window. If your product is a video meeting, live-commerce stream, collaborative editor or coding tool, yes, care — it is the biggest UX upgrade to PiP since launch. On Firefox and older Safari, fall back to classic PiP.
Does PiP affect video quality or bitrate?
Only indirectly: a player may switch to a lower rendition because the window is small. For ABR streams (HLS/DASH) that depends on whether the player caps renditions to the rendered size; bandwidth estimation alone does not react to window size. For WebRTC with simulcast, the SFU usually switches the subscriber to a lower simulcast layer.
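If your SFU lets subscribers request a layer, the client can make the choice explicit by reading the PiP window size. A sketch, assuming three simulcast layers with rids 'q', 'h' and 'f' (a common convention, not a requirement) and a hypothetical requestLayer() call into your SFU's signaling; the width, height and resize event of the PictureInPictureWindow returned by requestPictureInPicture() are standard.

```javascript
// Sketch: pick the smallest simulcast layer that still covers the PiP window.
// Layer rids and sizes are assumptions; use what your SFU actually publishes.
const LAYERS = [
  { rid: 'q', width: 320, height: 180 },
  { rid: 'h', width: 640, height: 360 },
  { rid: 'f', width: 1280, height: 720 },
];

function pickLayer(winWidth, winHeight, layers = LAYERS, pixelRatio = 1) {
  const sorted = [...layers].sort((a, b) => a.width - b.width);
  const fit = sorted.find(l => l.width >= winWidth * pixelRatio && l.height >= winHeight * pixelRatio);
  return (fit ?? sorted[sorted.length - 1]).rid; // window bigger than every layer: use the largest
}

// Browser wiring (requestLayer is a hypothetical call into your SFU signaling):
// const pipWindow = await video.requestPictureInPicture();
// const update = () => requestLayer(pickLayer(pipWindow.width, pipWindow.height, LAYERS, devicePixelRatio));
// update();
// pipWindow.addEventListener('resize', update);
```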
Can I prevent PiP during ads?
Yes. On web, set disablePictureInPicture on the video element while an ad is playing. On iOS, set canStartPictureInPictureAutomaticallyFromInline = false and call stopPictureInPicture() if PiP is already active; on Android, call setPictureInPictureParams with setAutoEnterEnabled(false). Flip the flags back when the ad ends.
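For the web half of that answer, a sketch of the two hooks. onAdStart and onAdEnd are hypothetical names; connect them to your ad SDK's start and end events for the element that plays the ad. Exiting explicitly covers the case where the user is already in PiP when the ad begins.

```javascript
// Sketch: disable web PiP for the duration of an ad slot.
async function onAdStart(video, doc = globalThis.document) {
  video.disablePictureInPicture = true;
  // Pull an already-open PiP window back explicitly rather than relying on
  // the browser to close it when the attribute flips.
  if (doc && doc.pictureInPictureElement === video) {
    await doc.exitPictureInPicture();
  }
}

function onAdEnd(video) {
  video.disablePictureInPicture = false;
}
```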
Is PiP a core feature of modern OTT apps?
It is now table stakes for mobile-first video products. Our OTT platform development playbook treats PiP as an MVP feature alongside captions and casting.
What to Read Next
iOS deep dive
PiP on iOS: Implementation and Peculiarities
The companion deep dive on the iOS-specific APIs, review gotchas and WebRTC PiP.
WebRTC
WebRTC Architecture Guide 2026
The broader context when PiP is a feature of a video-call product.
OTT playbook
OTT Platform Development in 2026
Where PiP fits into the modern OTT feature matrix and engineering budget.
Android
10 Ways to Optimize Android Apps for Streaming
Battery, memory and playback tuning that compounds with PiP.
Ready to ship a PiP that users actually love?
PiP is the cheapest meaningful feature you can add to a video-centric product in 2026. The classic API is universal; Document PiP opens up meeting, editor and collaboration use cases on Chromium; Android is straightforward; iOS is tricky only if you skip the review checklist. The highest-leverage move for most OTT, education, telehealth and meetings products is to ship Tier 1 PiP now and Tier 3 in the next sprint.
Fora Soft has been shipping PiP on web, iOS, Android and WebRTC for years. If you want a team that will land it in one focused sprint instead of three discovery ones, that is what we do.
Let’s scope your PiP work in one call
Bring your platforms, your players, your WebRTC stack (if any). We will come back with an honest estimate and a sprint plan.

