
Key takeaways
• Pure SwiftUI is rarely the right answer for video conferencing. The fast path to a shipped app in 2026 is a hybrid — SwiftUI for chrome and controls, UIKit for the video canvas and renderers.
• SwiftUI costs roughly 13% more memory and 5–10% more CPU than UIKit on comparable video grids, which matters once you cross ~12 concurrent tiles or target older devices.
• The video surface must be UIKit-backed. AVSampleBufferDisplayLayer and Metal layers are Core Animation layers that need a hosting UIView, and MTKView and WebRTC’s RTCMTLVideoView are UIView subclasses, so you wrap them in UIViewRepresentable and guard against re-creation on every redraw.
• State-loop bugs kill more video apps than framerate does. Never store frame buffers in @State; use @Observable (iOS 17+) with scoped reads and batch updates with Combine.
• Fora Soft has shipped 625+ real-time video products. If you need a scoping call on SwiftUI vs UIKit for your app, we’ll share a decision matrix within 48 hours of your first 30-minute call.
Why Fora Soft wrote this playbook
We have been building real-time video products on iOS since the first WebRTC builds landed on the platform. Over 21 years we have shipped more than 625 video and streaming products, from 1:1 calling apps like Speakk to classroom platforms with thousands of concurrent tiles like BrainCert, and from social video networks like ChillChat to cloud meeting rooms like ProVideoMeeting. When SwiftUI dropped in 2019 we tried to move aggressively — and learned the hard way where it shines, where it breaks under real-time load, and where UIKit still earns its keep.
This playbook is what we would tell a founder, CTO, or engineering lead on a 30-minute call. It is not a generic “SwiftUI is the future” pitch; it is a pragmatic decision guide based on apps that actually survived production: real devices, real networks, real budgets, and real users who do not tolerate a frozen tile during a sales call or a healthcare consultation. We ground every recommendation in numbers, SDKs we use day to day, and mistakes we have made so you do not have to repeat them.
If you want a second opinion on your framework choice or a scoped estimate for your video app, our Agent Engineering workflow turns a 30-minute call into a numbered plan in days, not weeks. Book a slot below and bring your hardest edge case.
Stuck picking SwiftUI or UIKit for your video app?
Share your target device, participant count, and ship date. We’ll map them to a hybrid architecture and a realistic timeline.
The verdict — what actually ships video calls in 2026
The short answer is that every serious iOS video conferencing app we have seen in production in 2026 is either UIKit-only or a SwiftUI/UIKit hybrid. Pure SwiftUI calling apps exist, but they are prototypes, internal tools, or 1:1 demos — not platforms serving thousands of concurrent tiles at stable frame rates. Zoom, Microsoft Teams, and Google Meet still run their video surfaces on UIKit under the hood. Apple itself ships FaceTime on internal frameworks closer to UIKit than to SwiftUI.
The reason is architectural, not fashion: video rendering on iOS happens on AVSampleBufferDisplayLayer, Metal layers, or WebRTC’s RTCMTLVideoView. The first two are Core Animation layers that must be hosted in a UIView, and the third is a UIView subclass. SwiftUI exposes them through UIViewRepresentable, but every such bridge is a seam that can re-create the renderer, drop frames, or fight with SwiftUI’s diffing engine. In a chat app that seam is invisible. In a 20-tile classroom it is the difference between issuing no refunds and losing an enterprise customer.
That does not make SwiftUI a bad choice. It makes it the right choice for the 70% of the UI that is not the live video — chat, participant lists, settings, onboarding, CallKit-adjacent flows, pre-join screens. Our rule of thumb: SwiftUI outside the meeting, UIKit (or UIKit-wrapped) inside it. That rule keeps you fast and keeps your meetings smooth.
Reach for a SwiftUI/UIKit hybrid when: you need to ship fast, your meeting surfaces have 2–24 concurrent video tiles, and your team is comfortable writing at least one UIViewRepresentable wrapper around the video renderer.
Where SwiftUI fits and where it breaks
SwiftUI earns its reputation on everything that is not high-frequency pixel manipulation. Onboarding flows, login, dashboard screens, chat lists, participant panes, settings, device-pickers, captions, reactions, ratings at the end of a call — these are all places where SwiftUI’s declarative syntax cuts development time by 30–40% and makes previews an everyday tool. @Observable (iOS 17), SF Symbols, built-in dark mode, and Dynamic Type are productivity wins your team will feel in the first week.
Where SwiftUI breaks under video load is the high-frequency update path. If you store per-frame or per-participant bitmap data in @State you will trigger cascading diff passes and watch the main thread saturate. If you naively put a UIViewRepresentable inside a parent that re-renders frequently and its identity is not stable, your renderer can be torn down and rebuilt on redraws, and users see flicker, black frames, or audio-only fallbacks. We have traced these failures on shipped apps and they never show up in the first week of a project; they surface three months in, after the participant count crosses 8 or 12.
The second failure mode is platform features that have not caught up. AVPictureInPictureController, custom orientation locking, CallKit custom UI, advanced gesture disambiguation between a pinch-to-zoom on a tile and a swipe-to-dismiss on the parent — these all still require a UIKit escape hatch. SwiftUI is getting closer every year, but for a production video app you will still touch UIKit for at least one of them.
Where SwiftUI earns its keep
1. The app chrome. Navigation stacks, tab bars, modal sheets, and toolbar buttons become one-liners. NavigationSplitView and NavigationStack handle iPad and Mac-Catalyst layouts without separate storyboards.
2. Chat and participant lists. Lists with ForEach over a participant model are clean, fast, and diff well. Drag-to-reorder, swipe actions, search, and pull-to-refresh are free.
3. Reactions, overlays, and annotations. Animated overlays on top of a video tile — emoji reactions, hand-raise badges, name pills — are where SwiftUI’s .animation and matchedGeometryEffect shine.
4. Previews and design reviews. With #Preview macros your designer can click through a call state machine without building and running the app. This alone saves a week per quarter on a team of five.
5. Accessibility. VoiceOver, Dynamic Type, and reduced motion work out of the box. UIKit needs extra code for each; SwiftUI gets them almost for free.
Where SwiftUI breaks under video load
1. State loops on the video path. Storing CVPixelBuffer or CMSampleBuffer in @State triggers a full-body re-render on every frame. Even a 15 fps tile can saturate the main thread. The fix is to keep frames out of SwiftUI entirely — the video layer renders itself, SwiftUI just owns the layout.
2. UIViewRepresentable re-creation. Each parent re-render can re-invoke makeUIView and tear down the underlying layer. You fight this with stable identity (.id(participant.id)), equatable models, and by putting the representable outside the parent’s state-reading path.
3. LazyVGrid with many tiles. For more than ~20 participants, a LazyVGrid lazily creates and destroys tiles as they scroll. With live video attached that means video stops and starts at scroll edges. A UICollectionView with prefetching handles this more gracefully.
4. Gesture conflicts. Pinch-to-zoom on a tile, drag-to-dismiss, tap-to-focus — SwiftUI gesture modifiers collide with UIKit gesture recognizers inside your renderer. Coordinators and simultaneousGesture solve most cases, but each is a debugging session.
5. Orientation control on a single screen. Locking one screen to landscape while the rest of the app is portrait still needs a UIHostingController subclass and overrides. Pure SwiftUI cannot do it cleanly in iOS 17/18.
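Items 2 and 3 in the list above mostly come down to identity. The sketch below is one way to keep tile identity stable in a small grid, under stated assumptions: PlaceholderTile is a hypothetical stand-in for your real renderer wrapper, and the layout constants are arbitrary. ForEach keyed by participant ID gives each tile a stable identity, and .equatable() lets SwiftUI compare tiles before updating them.

```swift
import SwiftUI
import UIKit

// Hypothetical stand-in for a real renderer wrapper (for example an RTCMTLVideoView tile).
struct PlaceholderTile: UIViewRepresentable, Equatable {
    let participantID: String

    func makeUIView(context: Context) -> UIView {
        let view = UIView()
        view.backgroundColor = .black
        return view
    }

    // No per-frame work here: the renderer draws itself.
    func updateUIView(_ view: UIView, context: Context) {}
}

struct ParticipantGrid: View {
    // Only IDs cross into SwiftUI; frames and tracks stay in the media layer.
    let participantIDs: [String]

    private let columns = [GridItem(.adaptive(minimum: 140), spacing: 8)]

    var body: some View {
        ScrollView {
            LazyVGrid(columns: columns, spacing: 8) {
                // The participant ID is the identity, so reordering or roster
                // changes do not hand a tile a new identity.
                ForEach(participantIDs, id: \.self) { id in
                    PlaceholderTile(participantID: id)
                        .equatable()
                        .aspectRatio(16.0 / 9.0, contentMode: .fit)
                }
            }
        }
    }
}
```

The grid takes plain IDs rather than the whole call model, which keeps the tiles out of the state-reading path of faster-changing properties such as audio levels.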
Where UIKit still wins on video surfaces
UIKit is a decade more mature than SwiftUI, and nowhere is that more visible than inside a call. Every commercial video SDK — LiveKit, Agora, Twilio, Zoom, Daily, 100ms, Vonage — ships its renderer as a UIView. Every Apple framework in the video path — AVFoundation, VideoToolbox, Metal, AVKit, CallKit’s provider UI — is UIKit-first. When something goes wrong in a meeting, the Stack Overflow answers, WWDC samples, and engineering blogs you will search are all UIKit.
Performance is the second reason. Measured benchmarks published in 2025 show UIKit using roughly 13% less memory and 5–10% less CPU than SwiftUI on comparable participant grids. On a 1:1 call the difference is imperceptible. On a 12-person classroom on an iPhone 12, it is the difference between steady 30 fps and thermal throttling into 15 fps after 20 minutes. On older devices (iPhone X, iPad mini 5) the gap widens to 20% or more.
UIKit also gives you direct, predictable lifecycles. viewDidLoad, viewWillAppear, and viewDidDisappear give you exact hooks to start/stop the camera, attach the remote renderer, or release the MTKView. SwiftUI’s .onAppear and .task are close, but there are corner cases (tab switches, modal dismissals, backgrounding) where they fire unpredictably and leave you fighting duplicate tracks or abandoned sessions.
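As a rough illustration of those hooks, here is a minimal sketch of a call screen that starts and stops capture from the view controller lifecycle. CallViewController and the queue label are hypothetical names, and the session is assumed to be configured elsewhere.

```swift
import AVFoundation
import UIKit

/// Hypothetical call screen; only the lifecycle hooks matter here.
final class CallViewController: UIViewController {
    private let session: AVCaptureSession
    // startRunning() blocks the calling thread, so it runs on a serial background queue.
    private let sessionQueue = DispatchQueue(label: "call.capture.session")

    init(session: AVCaptureSession) {
        self.session = session
        super.init(nibName: nil, bundle: nil)
    }

    @available(*, unavailable)
    required init?(coder: NSCoder) {
        fatalError("init(coder:) is not supported")
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sessionQueue.async { [session] in
            if !session.isRunning { session.startRunning() }
        }
    }

    override func viewDidDisappear(_ animated: Bool) {
        super.viewDidDisappear(animated)
        sessionQueue.async { [session] in
            if session.isRunning { session.stopRunning() }
        }
    }
}
```

The isRunning checks make both hooks safe to hit more than once, which is the property that is harder to guarantee when the same work hangs off .onAppear.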
Reach for UIKit when: you are rendering live video at 24+ fps, your meeting needs to support 12+ concurrent tiles on an iPhone 13 or older, or you are integrating directly with AVCaptureSession, Metal, or WebRTC’s low-level video API.
The hybrid pattern most shipped apps actually use
Roughly 70% of professional iOS teams shipping video products in 2026 use some version of the same pattern: the app frame is SwiftUI, the call screen is a SwiftUI container, and the actual video tiles are UIKit renderers wrapped in UIViewRepresentable. State flows through an @Observable call model shared across both layers. The result combines SwiftUI’s development velocity on 70% of the surface with UIKit’s stability on the 30% that matters most.
The architectural rule we give every team is a simple three-layer split. Layer one is a pure Swift domain model of a call — room, local participant, remote participants, tracks, media state — with no UIKit or SwiftUI imports. Layer two is an @Observable facade that the UI reads. Layer three is the UI, where SwiftUI owns layout and UIKit owns the pixels. Keeping the domain model framework-free lets you unit-test it, reuse it on macOS via Catalyst, and swap the UI stack later if Apple changes the story.
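A minimal sketch of layers one and two might look like the following. All type and property names (Participant, CallState, CallFacade, tileIDs) are illustrative assumptions, not an API from any SDK, and @Observable requires iOS 17 or macOS 14.

```swift
import Foundation
import Observation

// Layer 1: framework-free domain model (no UIKit or SwiftUI imports).
struct Participant: Identifiable, Equatable {
    let id: String
    var displayName: String
    var isMuted: Bool
    var hasVideo: Bool
}

struct CallState: Equatable {
    var roomName: String
    var local: Participant
    var remotes: [Participant] = []

    // Insert a new remote participant or replace an existing one with the same ID.
    mutating func upsert(_ participant: Participant) {
        if let index = remotes.firstIndex(where: { $0.id == participant.id }) {
            remotes[index] = participant
        } else {
            remotes.append(participant)
        }
    }

    mutating func remove(id: String) {
        remotes.removeAll { $0.id == id }
    }
}

// Layer 2: the facade the UI reads. Mutations go through methods only.
@Observable
final class CallFacade {
    private(set) var state: CallState

    init(state: CallState) {
        self.state = state
    }

    // IDs of remote participants that currently publish video.
    var tileIDs: [String] {
        state.remotes.filter(\.hasVideo).map(\.id)
    }

    func participantJoined(_ participant: Participant) { state.upsert(participant) }
    func participantLeft(id: String) { state.remove(id: id) }
}
```

Layer three would then read tileIDs from a SwiftUI view and hand each ID to a UIKit-backed tile; because CallState has no UI imports, it can be unit-tested on its own.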
The UIViewRepresentable that wraps your video renderer should be minimal. Create the view once, set the participant ID via updateUIView, never re-instantiate the Metal layer, and use EquatableView or explicit .id() to stop SwiftUI from tearing it down. Put the representable into a LazyVGrid only if your participant count is below ~20, and use UICollectionView for anything larger.
    import SwiftUI
    import WebRTC

    // Minimal, safe UIViewRepresentable for a WebRTC tile.
    // CallModel is the app's own call service; renderers and tracks are cached
    // there, so the tile itself carries only the participant ID.
    struct VideoTile: UIViewRepresentable, Equatable {
        let participantID: String

        func makeUIView(context: Context) -> RTCMTLVideoView {
            // Create the Metal-backed view once per tile identity.
            let view = RTCMTLVideoView()
            view.videoContentMode = .scaleAspectFill
            return view
        }

        func updateUIView(_ view: RTCMTLVideoView, context: Context) {
            // Runs on every SwiftUI update, so attach must be idempotent:
            // CallModel owns the cache and attaches the track only once per participant.
            CallModel.shared.attach(participantID: participantID, to: view)
        }

        // Equatable lets SwiftUI skip updates while the participant is unchanged
        // (apply .equatable() or wrap the tile in EquatableView at the call site).
        static func == (lhs: VideoTile, rhs: VideoTile) -> Bool {
            lhs.participantID == rhs.participantID
        }
    }
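The tile above relies on CallModel.shared.attach doing nothing when the same participant is already attached to the same view. The article does not show that cache, so the following is only a sketch of one way to build it; AttachmentCache and its method names are hypothetical.

```swift
import Foundation

/// Remembers which participant each renderer view currently shows, so that
/// repeated calls from updateUIView turn into no-ops.
final class AttachmentCache<Renderer: AnyObject> {
    private var attached: [ObjectIdentifier: String] = [:]

    /// Returns true only when the caller should really (re)attach the track.
    func needsAttach(participantID: String, to view: Renderer) -> Bool {
        let key = ObjectIdentifier(view)
        if attached[key] == participantID { return false }
        attached[key] = participantID
        return true
    }

    /// Call when the view is dismantled, so a reused memory address is not
    /// mistaken for a view that is still attached.
    func detach(_ view: Renderer) {
        attached[ObjectIdentifier(view)] = nil
    }
}
```

A call service could consult needsAttach before touching the track and call detach from the representable’s dismantleUIView, which keeps SwiftUI redraws from detaching and re-attaching video.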
Performance benchmarks side by side
The numbers below aggregate published benchmarks and our internal measurements on 4-, 12-, and 24-tile participant grids connected to a WebRTC SFU on iPhone 12, 14 Pro, and 15 Pro. Absolute values depend on your tile resolution and codec, so treat these as relative deltas. The takeaway is consistent: SwiftUI is cheap enough for chrome and small meetings, noticeable at mid-scale, and a liability at large scale without UIKit escape hatches.
| Scenario | Metric | Pure UIKit | Hybrid | Pure SwiftUI |
|---|---|---|---|---|
| 4 tiles, iPhone 14 Pro | Sustained FPS | 30 | 30 | 29–30 |
| 4 tiles, iPhone 14 Pro | Peak memory | ~92 MB | ~105 MB | ~140 MB |
| 12 tiles, iPhone 12 | Sustained FPS | 28–30 | 26–30 | 18–24 (throttles) |
| 12 tiles, iPhone 12 | CPU at 20 min | 38% | 42% | 58% |
| 24 tiles, iPhone 15 Pro | Sustained FPS | 30 | 30 | 22–27 (jitter) |
| 24 tiles, iPhone 15 Pro | Peak memory | ~240 MB | ~260 MB | ~340 MB |
| Chat & chrome only | Dev velocity | Baseline | +30% | +35–40% |
Two patterns are worth calling out. First, the hybrid sits within measurement noise of pure UIKit on sustained FPS — the SwiftUI overhead lives mostly in startup and diff passes, not in the video hot path. Second, pure SwiftUI at scale is not slow because SwiftUI is slow; it is slow because the default patterns (state-held frames, re-created representables) stack against you. A disciplined pure-SwiftUI app with careful @Observable scoping can close most of the gap — but at that point you have already re-invented the UIKit patterns, so the hybrid is the honest answer.
For detailed iOS performance tuning beyond framework choice, read our companion guide on how to optimize iOS apps for speed and stability, which covers Instruments workflows, memory leaks, and launch-time tuning — all of which compound with framework choice.
Feature-by-feature comparison matrix
When the benchmark numbers settle, most of your decisions still come down to capability gaps — which framework gives you the feature you need today without a workaround. The matrix below is the cheat sheet we hand to new hires before their first video-app project.
| Capability | SwiftUI native | Needs UIKit bridge | UIKit effort (days) |
|---|---|---|---|
| Camera preview | No | AVCaptureVideoPreviewLayer | 0.5–1 |
| WebRTC remote tile | No | RTCMTLVideoView | 1–2 |
| Picture-in-Picture | No | AVPictureInPictureController + delegate | 3–5 |
| CallKit incoming UI | No | CXProvider + PushKit | 3–7 |
| Per-screen landscape lock | No | UIHostingController override | 1 |
| 20+ tile scrollable grid | LazyVGrid (flaky at scale) | UICollectionView recommended | 2–4 |
| Animated overlays on tile | Yes | — | 0 |
| Dynamic Type & VoiceOver | Yes (default) | — | 0 |
| Metal-based filters/LUTs | No | MTKView + CIContext | 3–8 |
The pattern is obvious when you scan the “Needs UIKit bridge” column: almost every video-specific capability requires a UIKit bridge, and almost every non-video capability is free in SwiftUI. That is the structural case for the hybrid approach: not taste, not futureproofing, but the raw shape of the platform’s APIs in 2026.
Need a second opinion on your SwiftUI call screen?
We benchmark participant grids, trace state loops, and ship hybrid refactors. Most audits identify the main drop-frame cause inside one call.
State management for live participant grids
The single biggest performance and correctness decision in a SwiftUI video app is how you shape state. The wrong model feeds a feedback loop — one participant property changes, SwiftUI re-diffs the whole grid, every UIViewRepresentable re-invokes updateUIView, the video pauses for a beat, the user blames your app. Get the model right and the same grid holds 30 fps across a 60-minute session.
In iOS 17 and later we strongly recommend the @Observable macro over ObservableObject/@Published. @Observable tracks reads at the property level inside each view body, so a toolbar that only reads isMuted does not re-render when activeSpeakerID changes. For pre-iOS 17 targets, simulate this with multiple small ObservableObject classes scoped per concern.
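The property-level tracking can be observed directly with withObservationTracking from the Observation framework, which is the mechanism SwiftUI builds on. In this small sketch, CallControls and Flag are hypothetical helper types; the tracked closure plays the role of a toolbar body that reads only isMuted.

```swift
import Observation

@Observable
final class CallControls {
    var isMuted = false
    var activeSpeakerID: String?
}

// Tiny box so the @Sendable onChange closure can record that it fired.
final class Flag: @unchecked Sendable {
    var value = false
}

let controls = CallControls()
let toolbarInvalidated = Flag()

// Register interest in isMuted only, the way a toolbar body would.
withObservationTracking {
    _ = controls.isMuted
} onChange: {
    toolbarInvalidated.value = true
}

controls.activeSpeakerID = "alice"   // not read above, so nothing is invalidated
let afterSpeakerChange = toolbarInvalidated.value

controls.isMuted = true              // read above, so onChange fires
let afterMuteChange = toolbarInvalidated.value
```

With ObservableObject and @Published, both mutations would have notified every subscriber of the object, which is why the pre-iOS 17 workaround is to split the model into smaller objects.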
Three rules that keep grids smooth
1. Never store frame data in SwiftUI state. The video renderer owns its pixel buffer. SwiftUI only owns the tile’s size, position, and participant identity.
2. Scope state to the smallest unit that changes. A ParticipantViewModel per tile that is itself @Observable beats a single CallModel publishing the whole roster on every event.
3. Batch high-frequency events. Active-speaker updates, audio-level meters, and network-quality badges fire many times per second. Throttle them with Combine (.throttle(for: .milliseconds(250), scheduler: DispatchQueue.main, latest: true)) or an AsyncSequence with a sampling operator before they hit the view model.
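As a sketch of rule 3, the class below throttles raw audio levels before they reach anything the UI reads. AudioLevelMeter and its properties are hypothetical; the 250 ms interval is the one mentioned above. Note that Combine’s throttle operator takes a scheduler and a latest flag in addition to the interval.

```swift
import Combine
import Foundation

/// Hypothetical per-tile meter: raw levels go in, the UI-facing value
/// changes at most a few times per second.
final class AudioLevelMeter {
    let rawLevels = PassthroughSubject<Float, Never>()
    private(set) var displayedLevel: Float = 0
    private(set) var updateCount = 0
    private var cancellable: AnyCancellable?

    init(interval: DispatchQueue.SchedulerTimeType.Stride = .milliseconds(250)) {
        cancellable = rawLevels
            // latest: true keeps the most recent value of each interval.
            .throttle(for: interval, scheduler: DispatchQueue.main, latest: true)
            .sink { [weak self] level in
                self?.displayedLevel = level
                self?.updateCount += 1
            }
    }
}
```

The media layer can call rawLevels.send(_:) as often as it likes; only the throttled values touch displayedLevel, so a view bound to it re-renders on the throttled cadence rather than on every audio callback.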
Wiring AVFoundation, Metal, and WebRTC into SwiftUI
Four rendering pipelines cover 95% of iOS video apps. Each is UIKit-native and each has a safe pattern for exposing it to SwiftUI.
AVCaptureVideoPreviewLayer (local camera preview). Wrap a UIView whose layer class is AVCaptureVideoPreviewLayer, inject the session via initializer, and set videoGravity = .resizeAspectFill. Never store the session in @State; keep it in a singleton or view model with explicit start/stop.
AVSampleBufferDisplayLayer (custom decoded frames). When you decode frames yourself, for example for Metal-based filtering or HDR tone mapping, enqueue them with AVSampleBufferDisplayLayer.enqueue(_:). It is also the preferred API for custom WebRTC renderers. Expose it via UIViewRepresentable.
RTCMTLVideoView / LiveKit VideoView (SFU pipeline). All serious WebRTC SDKs in 2026 ship an MTKView-backed renderer. Wrap it, but cache attachments per participant ID so SwiftUI redraws do not detach and re-attach the track. This single caching optimization accounts for most of the “flickering tile” bug reports we have debugged.
AVPlayerLayer (VOD playback inside a call). Replays, watch-party mode, and webinars often need a prerecorded stream alongside live video. VideoPlayer (SwiftUI native) works, but for adaptive bitrate controls or custom overlays, fall back to AVPlayerLayer via representable.
A safe camera preview snippet
    import AVFoundation
    import SwiftUI
    import UIKit

    // UIView whose backing layer is the capture preview layer.
    final class PreviewView: UIView {
        override class var layerClass: AnyClass { AVCaptureVideoPreviewLayer.self }

        var previewLayer: AVCaptureVideoPreviewLayer {
            // Safe cast: layerClass above guarantees the layer type.
            layer as! AVCaptureVideoPreviewLayer
        }
    }

    struct CameraPreview: UIViewRepresentable {
        let session: AVCaptureSession // owned by a view model, never recreated

        func makeUIView(context: Context) -> PreviewView {
            let view = PreviewView()
            view.previewLayer.session = session
            view.previewLayer.videoGravity = .resizeAspectFill
            return view
        }

        // Nothing to update: the session drives the layer directly.
        func updateUIView(_ view: PreviewView, context: Context) {}
    }
PiP, CallKit, and orientation — the tricky platform bits
Four platform features force UIKit into even a mostly-SwiftUI app. They are worth naming explicitly because every founder we meet assumes SwiftUI covers them in 2026. It does not.
Picture-in-Picture. AVPictureInPictureController requires a delegate and a source layer, both UIKit. Wrap your video surface in a coordinator and expose PiP controls to SwiftUI via a view model. Without PiP, users who switch to the messaging app or calendar during a call lose video, and retention craters for mobile meeting products.
CallKit. The incoming-call UI and lock-screen handling go through CXProvider and CXCallController — system UI, driven by a UIKit delegate. SwiftUI handles the post-answer screen, but the answer/decline event is still UIKit-land. Combined with PushKit for VoIP pushes, this is a UIKit subsystem that will live inside your app forever.
Per-screen orientation. Locking one screen (the call) to landscape while the rest of the app is portrait still requires a UIHostingController subclass that overrides supportedInterfaceOrientations. There is no first-class SwiftUI solution in iOS 18.
External camera and Continuity Camera. If your app supports Continuity Camera on iPadOS and macOS, or external USB cameras, AVCaptureDevice.DiscoverySession and the device-change notifications are UIKit/AppKit-shaped. You handle them in a service layer, not in SwiftUI.
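For the per-screen orientation case, a minimal sketch of the hosting-controller override could look like this. LandscapeHostingController and presentCall are hypothetical names; the sketch assumes the call screen is presented full screen and that the app’s Info.plist already allows landscape.

```swift
import SwiftUI
import UIKit

/// Hosts a SwiftUI call screen and restricts it to landscape.
final class LandscapeHostingController<Content: View>: UIHostingController<Content> {
    override var supportedInterfaceOrientations: UIInterfaceOrientationMask {
        .landscape
    }

    override var preferredInterfaceOrientationForPresentation: UIInterfaceOrientation {
        .landscapeRight
    }
}

/// Presents the call screen full screen so UIKit consults the overrides above.
func presentCall<CallScreen: View>(_ screen: CallScreen, from presenter: UIViewController) {
    let host = LandscapeHostingController(rootView: screen)
    host.modalPresentationStyle = .fullScreen
    presenter.present(host, animated: true)
}
```

The rest of the app stays portrait because only this controller widens the orientation mask; once it is dismissed, the presenter’s own mask applies again.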
A 20-tile participant grid — reference architecture
Here is the architecture we deploy when a client asks for a conferencing screen that needs to hold 20+ tiles without thermal throttling on iPhone 13-class hardware. It is the distilled result of roughly a dozen production shipments across education, telehealth, and enterprise meetings.
Container. A SwiftUI view owns the layout and exposes affordances — layout toggle (grid/speaker), reactions, mute, leave. It reads a single @Observable CallViewModel but only the properties it actually displays.
Grid. For fewer than 20 active tiles, a LazyVGrid with stable .id(participant.id) on each tile. For 20 or more, a UICollectionView inside a UIViewControllerRepresentable, using prefetching and diffable data sources. UIKit’s cell recycling is more predictable than SwiftUI’s lazy stack at scale.
Tile. Each tile is VideoTile(participantID:), an Equatable UIViewRepresentable. The underlying MTKView is pooled by the service layer and reused across participants — never allocated per frame.
Overlays. Name pill, mute badge, active-speaker ring, reactions — pure SwiftUI, sitting on top of the tile in a ZStack. They read separate small models scoped per participant.
Subscriptions. The call model subscribes to only the top N active tracks (usually N=9 in speaker view, N=20 in grid), telling the SFU which simulcast layers to deliver. This is the trick that lets a 200-participant classroom run at 30 fps: the user sees 20 tiles max, the SFU sends only those streams, and the device never decodes more than it displays.
Thermal and energy control. Hook ProcessInfo.thermalState into the call model. When thermal state hits .serious or .critical, downgrade non-speaker tiles to audio-only and reduce simulcast layer to 180p. Users would rather see fewer faces than a phone shutting down.
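The subscription and thermal rules above can be expressed as two small pure functions. This is a sketch under assumptions: RemoteParticipant, TileQuality, and the pinned-first ordering are illustrative choices, not part of any SFU SDK. The thermal input is Foundation’s ProcessInfo.ThermalState, which you can observe through ProcessInfo.thermalStateDidChangeNotification.

```swift
import Foundation

/// Hypothetical roster entry; field names are illustrative.
struct RemoteParticipant: Equatable {
    let id: String
    var lastSpokeAt: Date
    var isPinned: Bool
}

enum TileQuality: Equatable {
    case high
    case low180p
    case audioOnly
}

/// Picks the participants whose video should be subscribed:
/// pinned tiles first, then the most recent speakers, capped at `limit`.
func visibleParticipants(from roster: [RemoteParticipant], limit: Int) -> [RemoteParticipant] {
    let sorted = roster.sorted { a, b in
        if a.isPinned != b.isPinned { return a.isPinned }
        return a.lastSpokeAt > b.lastSpokeAt
    }
    return Array(sorted.prefix(max(0, limit)))
}

/// Maps the device thermal state to a per-tile quality, following the rule above:
/// under pressure the speaker drops to 180p and everyone else to audio-only.
func quality(for thermal: ProcessInfo.ThermalState, isActiveSpeaker: Bool) -> TileQuality {
    switch thermal {
    case .nominal, .fair:
        return .high
    case .serious, .critical:
        return isActiveSpeaker ? .low180p : .audioOnly
    @unknown default:
        return .low180p
    }
}
```

The call model would feed the result of visibleParticipants to the SFU subscription call of whichever SDK you use, and re-evaluate quality whenever the thermal notification fires.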
Mini case — what we learned shipping ChillChat and Speakk
ChillChat is a social video network where strangers meet in themed rooms of up to 8 participants. We built the iOS app with SwiftUI for all non-meeting surfaces — onboarding, rooms directory, profiles, moderation tools — and put the live tiles behind a UIKit renderer. With careful @Observable scoping and per-tile .id() we hit 30 fps on iPhone 11 and above, with roughly 4 weeks shaved off the schedule thanks to SwiftUI previews and declarative lists.
Speakk is a WhatsApp-style messenger with 1:1 and small-group video calling. Here the calling surface is simpler — one or two tiles — so we pushed SwiftUI further, keeping only the camera preview and remote renderer in UIKit. The trade-off: faster feature velocity on chat, but we still needed CallKit plus PushKit plus PiP, and those are all UIKit. Lesson: “mostly SwiftUI” still means you own three UIKit subsystems.
Across both products the pattern that mattered most was the single CallController service that owned AVFoundation, WebRTC tracks, and renderer lifecycle. The UI layer — SwiftUI or UIKit — only asked “give me a view for participant X” and never touched media directly. That one architectural boundary made both apps testable, let us swap the video SDK (Twilio to LiveKit on one of them) without rewriting the UI, and kept bug reports out of SwiftUI’s state graph.
If you want a similar architectural review for your app, a 30-minute scoping call usually surfaces the top two risks in your current stack, along with the smallest refactor that would stabilize your call screen.
Decision framework — pick your stack in five questions
Walk through these five questions in order. Each one either confirms your current instinct or nudges you toward a different split.
Q1. How many concurrent video tiles do you need on-screen? 1–2 tiles: SwiftUI-heavy is fine. 3–11 tiles: hybrid. 12+ tiles or scrollable grids: UIKit-heavy or a UICollectionView inside a representable.
Q2. What is your minimum supported device? iPhone 13 and later: any choice works. iPhone 11/12 or iPad mini 5: hybrid or UIKit. iPhone 8/X with iOS 16: UIKit is the safe default.
Q3. Do you need PiP, CallKit, or per-screen orientation? Any of them means UIKit enters your codebase. Plan for a UIKit hosting layer on day one rather than retrofitting.
Q4. What is your time to first launch? 8 weeks or less: hybrid with SwiftUI-heavy chrome wins on velocity. 16+ weeks: pick the split that minimizes long-term maintenance, often UIKit-heavy on the meeting screen.
Q5. How experienced is your iOS team with Metal, AVFoundation, and WebRTC? If the answer is “low,” stay with an SDK (LiveKit, Daily, 100ms) that gives you a drop-in renderer, and use SwiftUI only for the chrome. If you are rolling your own video pipeline, the UIKit side gets bigger fast.
Pitfalls to avoid
1. Storing frames in @State. A single @State var currentFrame: CVPixelBuffer? is enough to collapse performance on a 4-tile grid. Keep frames outside SwiftUI — always.
2. Re-creating UIViewRepresentable on each redraw. If your representable is defined inline inside a view that re-renders, you get a new makeUIView call each time. Lift the struct out, make it Equatable, and attach .id(participant.id).
3. LazyVGrid inside ScrollView without stable IDs. Scroll the grid, tiles get destroyed and rebuilt, video restarts. Always attach stable identities and prefetch the next row of tracks before the user scrolls.
4. Shipping without Instruments profiles. If you do not have a “before/after” time-profiler trace of a 10-minute call, you will ship a performance regression next release. WWDC25’s “Optimize SwiftUI Performance with Instruments” session is worth 60 minutes of your time.
5. Ignoring thermal and energy state. A call that melts the phone within 10 minutes fails enterprise pilots and kills App Store ratings. Listen to ProcessInfo.thermalState and isLowPowerModeEnabled; downgrade gracefully.
KPIs to measure after launch
1. Quality KPIs. Median received video FPS per tile (target: ≥ 24 on iPhone 11 and later), audio MOS, the mean opinion score (target: ≥ 4.0), freeze count per 30-minute call (target: < 2), and time-to-first-frame after join (target: < 1.5 seconds).
2. Business KPIs. Call completion rate (target: > 95%), pre-join drop-off (target: < 10%), 1-week retention for users who held at least one call (target: > 40%), and average call duration. These are the numbers your product team will care about; engineering choices eventually show up in them.
3. Reliability KPIs. Crash-free users (target: > 99.5%), main-thread hang rate (target: < 0.5% of sessions), and p95 memory usage on a 30-minute 8-tile call (target: < 300 MB on iPhone 12). Ship these dashboards the same week you ship the first meeting screen.
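Several of these KPIs are simple statistics over samples you already collect. The helpers below are a sketch of how the median FPS, p95 memory, and freeze count could be computed; the nearest-rank percentile method and the 0.5-second freeze threshold are assumptions, not definitions from the article.

```swift
import Foundation

/// Nearest-rank percentile (p in 0...100); returns nil for empty input.
func percentile(_ values: [Double], _ p: Double) -> Double? {
    guard !values.isEmpty, (0...100).contains(p) else { return nil }
    let sorted = values.sorted()
    // rank = ceil(p / 100 * N), computed as p * N / 100 to limit rounding error.
    let rank = Int((p * Double(sorted.count) / 100).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

/// Counts freezes: gaps between consecutive frame timestamps (in seconds)
/// that are longer than `threshold`.
func freezeCount(frameTimestamps: [TimeInterval], threshold: TimeInterval = 0.5) -> Int {
    zip(frameTimestamps, frameTimestamps.dropFirst())
        .filter { previous, next in next - previous > threshold }
        .count
}
```

For example, percentile(fpsSamples, 50) gives the median FPS for a tile and percentile(memorySamplesMB, 95) gives the p95 memory figure for a call, assuming you sample both at a fixed cadence.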
When to NOT pick SwiftUI
There are three cases where we push clients toward UIKit-only even in 2026. First, if you must support iOS 12 or earlier (still common in enterprise deployments with MDM-locked devices), SwiftUI is not available. Second, if your app is a niche broadcasting tool with heavy Metal pipelines (AR filters, virtual backgrounds, overlay compositing, LUTs), the entire rendering stack is so UIKit-native that SwiftUI buys you almost nothing and costs you a seam to debug.
Third, if your team has ten years of UIKit experience and no SwiftUI experience, and you are on a 12-week timeline to a critical launch, the learning curve is not worth the risk. SwiftUI is forgiving for list-driven apps and brutal for video apps on a deadline. We have watched experienced UIKit teams lose three weeks to a single state-loop bug. Start hybrid after launch, not during.
Cost and timeline implications
Framework choice changes your timeline more than most founders expect. On the chrome (70% of screens that are not the meeting) SwiftUI saves roughly 30–40% of development time versus UIKit. On the meeting screen it usually costs you back some of that as you fight state and representable issues. The net on a greenfield iOS calling app is a 15–20% velocity gain if the hybrid is disciplined, and a wash or regression if the team naively tries pure SwiftUI for the meeting.
Maintenance costs diverge later. UIKit-only apps are easier to debug when something breaks in a call, because every forum answer is UIKit-shaped. SwiftUI apps are cheaper to evolve for product features like new chat layouts, reactions, and onboarding tweaks. The hybrid approach is not the cheapest in any single dimension, but it is the lowest-risk choice on a 24-month product horizon.
Because we combine Agent Engineering with our iOS team, Fora Soft estimates tend to land faster and cheaper than traditional vendors on the chrome, and roughly in line on the meeting engine — that is the part where physics still dominates. If you want a realistic range for your scope, share your target participants, minimum device, and ship date with us on a call and we will send a numbered estimate within 48 hours.
Want a numbered estimate for your video app?
Tell us your device target, participant count, and ship date. We’ll send back a scoped plan within 48 hours of the call.
FAQ
Can SwiftUI alone handle enterprise video conferencing with 20+ participants?
Technically yes, pragmatically no. At 20+ tiles SwiftUI’s diffing and LazyVGrid recycling fight the video pipeline, and shipped enterprise apps put the grid behind a UICollectionView for predictability. A UIKit canvas inside a SwiftUI shell is the standard pattern.
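A minimal sketch of that pattern, assuming participant IDs are unique strings: a UICollectionView with a diffable data source and prefetching, wrapped in UIViewControllerRepresentable. TileGridController, TileGrid, and the onPrefetch hook are hypothetical names, and the cell body is a placeholder where a real app would attach its renderer.

```swift
import SwiftUI
import UIKit

final class TileGridController: UIViewController, UICollectionViewDataSourcePrefetching {
    /// Hypothetical hook: lets the call service subscribe to tracks before they scroll in.
    var onPrefetch: (([String]) -> Void)?

    private var collectionView: UICollectionView!
    private var dataSource: UICollectionViewDiffableDataSource<Int, String>!

    override func viewDidLoad() {
        super.viewDidLoad()
        let layout = UICollectionViewFlowLayout()
        layout.itemSize = CGSize(width: 160, height: 90)
        collectionView = UICollectionView(frame: view.bounds, collectionViewLayout: layout)
        collectionView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        collectionView.prefetchDataSource = self
        view.addSubview(collectionView)

        // Registrations must be created once, outside the cell provider closure.
        let registration = UICollectionView.CellRegistration<UICollectionViewCell, String> { cell, _, participantID in
            // Placeholder: a real app would attach the renderer for participantID here.
            cell.contentView.backgroundColor = .black
        }
        dataSource = UICollectionViewDiffableDataSource<Int, String>(collectionView: collectionView) { collectionView, indexPath, id in
            collectionView.dequeueConfiguredReusableCell(using: registration, for: indexPath, item: id)
        }
    }

    /// Applies the current roster; the diffable data source computes the changes.
    func apply(participantIDs: [String]) {
        loadViewIfNeeded()
        var snapshot = NSDiffableDataSourceSnapshot<Int, String>()
        snapshot.appendSections([0])
        snapshot.appendItems(participantIDs)
        dataSource.apply(snapshot, animatingDifferences: false)
    }

    func collectionView(_ collectionView: UICollectionView, prefetchItemsAt indexPaths: [IndexPath]) {
        onPrefetch?(indexPaths.compactMap { dataSource.itemIdentifier(for: $0) })
    }
}

struct TileGrid: UIViewControllerRepresentable {
    let participantIDs: [String]

    func makeUIViewController(context: Context) -> TileGridController {
        TileGridController()
    }

    func updateUIViewController(_ controller: TileGridController, context: Context) {
        controller.apply(participantIDs: participantIDs)
    }
}
```

The SwiftUI shell only passes an array of IDs; cell reuse, prefetching, and renderer attachment all stay on the UIKit side.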
Is SwiftUI compatible with older iOS versions for video conferencing?
SwiftUI ships on iOS 13+, but the features that matter for video apps (@Observable, modern navigation, improved layout performance) require iOS 17 or later. For an app that must run on iOS 12 or earlier — still common in enterprise fleets — UIKit is the only option.
How does SwiftUI compare to UIKit in battery consumption during calls?
Our measurements show SwiftUI consumes 8–15% more battery than UIKit during 30-minute calls with 4+ tiles, driven by the extra diff passes and view re-renders. On a 1:1 call the difference is negligible. On a 12-person call it can translate into 10–15 fewer minutes of call time per charge, which matters for enterprise road warriors.
Do I need to write my own renderer or can I use an SDK?
Unless your product is a specialised broadcasting tool, use an SDK. LiveKit, Daily, 100ms, Twilio, Agora, and Vonage all ship optimised UIKit renderers. You wrap them in UIViewRepresentable and focus your engineering on the product, not the video plumbing. We cover SDK trade-offs in depth in our Agora.io alternative guide.
What is the learning curve for UIKit developers switching to SwiftUI?
Two to four weeks for basic fluency on lists and forms. Eight to twelve weeks before a developer intuits when to use @State versus @Observable versus Binding on performance-sensitive surfaces. The bridging layer (UIViewRepresentable, coordinators) adds another four to six weeks. Budget a full quarter before a team ships production-grade video in SwiftUI.
Are there pre-built UI components for video conferencing in SwiftUI?
Yes — Stream, LiveKit, and 100ms all ship SwiftUI-friendly component libraries for tiles, controls, and participant lists. They are a good starting point for an MVP. For a branded product you will still customise them heavily, but the default UI saves roughly two to three weeks of boilerplate.
How do I handle Picture-in-Picture with SwiftUI?
Wrap AVPictureInPictureController in a coordinator attached to the same UIViewRepresentable as your video renderer. Control PiP state from SwiftUI via an @Observable model. There is no native SwiftUI PiP API as of iOS 18, and attempting to shoehorn it without UIKit is the source of most “PiP does not start” bug reports.
Should I migrate an existing UIKit video app to SwiftUI?
Rarely as a full rewrite — almost always as a gradual migration of chrome and settings while leaving the meeting screen untouched. A full rewrite of a mature UIKit calling app takes 6–12 months and usually ships with a performance regression. The hybrid migration preserves the stable parts and gets velocity gains on the parts that need them.
What to Read Next
iOS Architecture
The 2026 iOS MVVM-C Playbook
SwiftUI @Observable, coordinators, DI and Swift 6 concurrency — the architecture layer behind every hybrid video app.
Swift 6
Swift 6 iOS Development — Next-Gen Video Chat
Strict concurrency, data-race safety, and how the new Swift model fits video chat apps built on SwiftUI or UIKit.
WebRTC Stack
Agora.io Alternative in 2026
Custom WebRTC with LiveKit, mediasoup, Jitsi and Janus — what to pair with your SwiftUI/UIKit front-end.
Performance
How to Optimize iOS Apps for Speed and Stability
Instruments workflows, launch time, memory leaks — the KPIs that compound with framework choice on video apps.
Dependency Management
Swift Package Manager for Video Apps
How to keep LiveKit, WebRTC, Starscream and analytics SDKs reproducible and binary-cached — a 2025 developer’s guide.
Ready to ship a video app that actually scales?
The SwiftUI vs UIKit question is not ideological, it is architectural. Ship the chrome in SwiftUI, ship the meeting canvas in UIKit behind a thin representable, and let a disciplined @Observable model glue them together. That split gives you a 30–40% velocity gain on most of the app while keeping the part your users judge, the live video, smooth, predictable, and portable across SDKs.
At Fora Soft we have used this pattern across 625+ shipped products, from social video like ChillChat to enterprise learning like BrainCert. If you want a scoped plan for your app — participant count, minimum device, ship date, honest trade-offs — bring your hardest edge case to a 30-minute call. Agent Engineering turns that call into a numbered estimate within 48 hours.
Bring your hardest video-app question to a 30-min call
Share your device target, participant count, and ship date. We’ll send back a hybrid architecture sketch and a scoped estimate within 48 hours.

