Custom intercom software architecture with video streaming, authentication, and visitor management

Key takeaways

Modern intercom software is a stack, not a feature list. Video, audio, SIP, WebRTC, edge AI, smart-lock integration, MDU support, and the audit trail must all line up. Skipping any single layer creates a building-lockout class of bugs at 2 a.m.

The market is growing about 16% a year. The global smart video doorbell and IP intercom market was around $5.8B in 2024, with a projected 16.2% CAGR through 2030. Multi-family residential and hospitality are pulling demand; commercial drives margin.

Edge AI changed the design center. Hailo-8 and Jetson-class accelerators now run face / package / animal / LPR detection on the doorstep. That shrinks the GDPR Article 9 exposure, because biometric data can stay on the device, and removes the “cloud is offline” failure mode in one move.

SIP + ONVIF + Matter is the openness test. If your platform forces a single-brand hardware lock, you bought 3-year vendor lock-in. RFC 3261 SIP, ONVIF Profile T, and Matter / Thread on the lock side keep the door — literally — open.

Compliance is the architecture. GDPR Articles 6 / 9, CCPA, EN 50486 for door-entry equipment, IEC 60849 for emergency sound systems, NDAA Section 889 for US federal sites, HIPAA for healthcare. Decide your compliance footprint before you decide on hardware.

Why Fora Soft wrote this playbook

Fora Soft has been shipping multimedia and IP-camera software since 2005. Of our 625+ delivered products, intercom-adjacent work spans V.A.L.T. (video surveillance running across 770+ organizations), Netcam Studio (IP camera management with PTZ control, ONVIF, motion detection), DSI Drones (AI-driven aerial threat detection), and ProVideoMeeting (WebRTC + FreeSWITCH SIP video conferencing).

Across those projects we have learned what separates an intercom feature list from an intercom product: the SIP / WebRTC plumbing, the multi-tenant data model, the smart-lock and PMS integrations, the audit trail, and the way the system behaves when the internet drops at 3 a.m. The gap between a doorbell demo and a 100-unit MDU rollout is wide and unforgiving.

This guide is the playbook we hand to founders and product owners specifying or buying intercom software in 2026. It is opinionated, prioritized by what holds up after the contract is signed, and grounded in the trade-offs we use when scoping a build.

Specifying an intercom platform and not sure which features are table-stakes?

Tell us your unit count, region, and integration list — we’ll come back with a 2-page feature scorecard and a realistic estimate.


The 2026 intercom market: numbers worth scoping against

The smart video doorbell and IP intercom market reached roughly $5.8B in 2024 with a 16.2% CAGR projected through 2030. North America and Europe account for around 55% of revenue; Asia-Pacific is the fastest-growing region as multi-dwelling-unit (MDU) density compounds. Residential video doorbells make up about 48% of revenue; commercial / multi-tenant IP intercom makes up the rest with higher margins.

Three drivers reshape feature priorities in 2026: 5G-class connectivity making cellular fallback affordable, edge AI silicon (Hailo-8, Jetson Orin) making local face / package / LPR inference practical, and the EU AI Act + GDPR Article 9 making cloud-only biometrics legally fragile.

Reach for serious software when: your roadmap includes 100+ units, multi-tenant routing, smart locks, or any biometric AI. Below that, off-the-shelf consumer doorbells (Ring, Nest) are usually the right call.

A 60-second decision matrix

Match your deployment to the right kind of intercom platform before you start scoring features. The matrix below saves the typical 2–3 weeks of vendor demos.

Deployment | Reach for | Why
Single-family residential | Consumer doorbell (Ring, Nest, DoorBird) | Hardware $100–$400, sub-$10/month, sub-day install.
MDU residential, 50–500 units | Custom platform on 2N / Akuvox + PMS | SIP / ONVIF, per-unit routing, bulk provisioning, PMS integration.
Hospitality | Akuvox / 2N + PMS API | Mobile key, smart-lock integration, audit trail by stay.
Healthcare | 2N / custom + HIPAA | BAA-bound cloud, encrypted recording, role-based access.
US federal / sensitive sites | NDAA-compliant hardware (Aiphone, 2N) | No Hikvision / Dahua / ZTE / Huawei components allowed.
Commercial high-security | Custom platform + biometrics | Face + fingerprint, vandal-proof, tamper detect, audit trail.

Feature 1: Video — resolution, HDR, low-light, field of view

Video is where most demos win and most installations fail. The boundary conditions (backlit afternoon porch, midnight package thief, fog) are what separate a usable system from one that produces unidentifiable footage.

1. Resolution. 1080p at 15–30 FPS is the cost-conscious baseline; 4K (3840×2160) is now standard for premium and commercial. Resolution above 1080p only earns its keep with HDR and a lens that resolves to it.

2. HDR. Non-negotiable for any door facing direct sun or against a backlit interior. Without HDR, faces wash out and license plates blow to white.

3. Low-light / starlight. A minimum-illumination rating of 0.1 lux or lower is the spec point. IR cut-filter switching and dual-sensor color-at-night designs are now common in the $400+ range.

4. Wide-angle lens. 160–180° horizontal FoV mimics human peripheral vision and avoids the “crouched person below the camera” blind spot. Beware fish-eye distortion eating recognition accuracy at edges.
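To sanity-check the resolution-versus-field-of-view trade-off, it helps to estimate how many pixels land on a face. The sketch below assumes an idealized equidistant ("f-theta") fish-eye whose image spans the full sensor width and a subject standing on a flat plane parallel to the door; real lenses deviate, so treat the numbers as rough. The cos² term models the edge-of-frame penalty mentioned above. The function name is ours, not a vendor API.

```python
import math

def pixels_per_meter(h_pixels: int, hfov_deg: float, distance_m: float,
                     off_axis_deg: float = 0.0) -> float:
    """Approximate horizontal pixel density on a flat subject plane.

    Assumes an ideal equidistant (f-theta) fish-eye whose image spans the full
    sensor width, so pixels are spread evenly per radian of view angle. On a plane
    at perpendicular distance d, one meter of lateral distance at off-axis angle
    theta covers cos^2(theta) / d radians.
    """
    px_per_radian = h_pixels / math.radians(hfov_deg)
    theta = math.radians(off_axis_deg)
    return px_per_radian * math.cos(theta) ** 2 / distance_m

# Compare 180-degree 1080p and 4K sensors for a visitor standing 2 m from the door.
for label, width in (("1080p", 1920), ("4K", 3840)):
    center = pixels_per_meter(width, 180, 2.0)
    edge = pixels_per_meter(width, 180, 2.0, off_axis_deg=60)
    print(f"{label}: {center:.0f} px/m on-axis, {edge:.0f} px/m at 60 degrees off-axis")
```

Under these assumptions the 1080p unit gives about 306 px/m on-axis but only about 76 px/m at 60° off-axis, and 4K doubles both (about 611 and 153). For scale, the identification threshold commonly quoted from IEC 62676-4 is around 250 px/m, so a face near the edge of a very wide lens can fall short even at 4K.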

Feature 2: Audio — codecs, AEC, noise suppression

Two-way audio is what makes an intercom an intercom. Get this wrong and your beautiful 4K video product still feels broken.

1. Codecs. Opus is the modern default for low-bandwidth two-way (8–128 kbps adaptive). G.711 for legacy SIP interop, G.722 for wideband over IP. Avoid platforms that only ship G.711.

2. Acoustic Echo Cancellation. WebRTC native, Krisp, or Dolby Voice. Without AEC, the door speaker's output leaks back into the mic and every call turns into echo or a feedback loop, one of the most common reasons consumer doorbells get returned.

3. Noise suppression. Wind, HVAC, traffic, lobby chatter. Modern ML-based denoisers buy 6–15 dB SNR improvement; non-trivial for outdoor units.
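At its core, acoustic echo cancellation is an adaptive filter that learns the loudspeaker-to-microphone echo path and subtracts its estimate from the mic signal. The sketch below is a textbook normalized LMS (NLMS) canceller in plain Python, for illustration only; production AECs, WebRTC's included, add double-talk detection, nonlinear processing, and frequency-domain filtering. The signal, echo path, and parameter values are made up for the demo.

```python
import math
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Return the mic signal with the estimated echo of `far_end` removed (NLMS)."""
    weights = [0.0] * taps        # running estimate of the echo path
    history = [0.0] * taps        # latest far-end samples, newest first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        history = [x_n] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = d_n - echo_estimate                      # echo-cancelled output sample
        step = mu * error / (eps + sum(x * x for x in history))
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual

def erle_db(mic, residual):
    """Echo return loss enhancement: how much echo energy was removed, in dB."""
    return 10 * math.log10(sum(d * d for d in mic) / (sum(e * e for e in residual) + 1e-12))

# Demo: white-noise far-end signal through a short, made-up echo path, no near-end talker.
rng = random.Random(0)
far_end = [rng.gauss(0.0, 1.0) for _ in range(2000)]
echo_path = [0.5, 0.3, -0.2, 0.1]
mic = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far_end))]
residual = nlms_echo_cancel(far_end, mic)
print(f"ERLE over the last 500 samples: {erle_db(mic[-500:], residual[-500:]):.1f} dB")
```

With a noiseless synthetic echo the filter converges quickly; real rooms, nonlinear loudspeakers, wind, and double-talk are where the hard engineering lives, which is why proven AEC implementations are the safer choice.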

Reach for ML-based denoising when: the unit is outdoor, near a busy street, or in a multi-residential lobby — standard DSP-only AEC fails on real wind and crowd noise.

Feature 3: SIP, ONVIF, and WebRTC — the openness layer

Three protocols separate a real platform from a vendor-locked toy.

1. SIP (RFC 3261). Lets the intercom register with any compliant PBX (Asterisk, FreeSWITCH, Kamailio, hosted carriers like Twilio / Vonage). Without SIP, you are tied to the vendor’s app and cloud.

2. ONVIF Profile T. The standard for IP video device management. Profile T covers H.264 / H.265 streaming, motion alarms, and PTZ. If a vendor cannot publish its ONVIF conformance certificate, treat that as a signal.

3. WebRTC. The right way to answer a doorbell call from a mobile phone or web client: low-latency media (often under 200 ms on a good network), native browser support, no native app required. Read our 12 must-have video intercom features for the protocol-by-protocol breakdown.
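To make the SIP point concrete, here is roughly what an intercom's registration request to a PBX looks like. This is a minimal sketch of an RFC 3261 REGISTER built by hand, with placeholder domain, extension, and address values. It omits digest authentication (the 401 challenge and response most PBXs require), and a real device would use a full SIP stack rather than string formatting.

```python
import uuid

def build_register(domain: str, user: str, contact_host: str, contact_port: int = 5060,
                   cseq: int = 1, expires: int = 3600) -> str:
    """Build a minimal SIP REGISTER request (RFC 3261, section 10) for UDP transport."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]   # RFC 3261 magic-cookie prefix for Via branch
    tag = uuid.uuid4().hex[:8]                    # From tag
    call_id = f"{uuid.uuid4().hex}@{contact_host}"
    aor = f"sip:{user}@{domain}"                  # address-of-record being registered
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",         # Request-URI is the registrar's domain
        f"Via: SIP/2.0/UDP {contact_host}:{contact_port};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{aor}>",
        f"From: <{aor}>;tag={tag}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{contact_host}:{contact_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"        # headers end with an empty line

# Hypothetical lobby door registering extension "lobby-door-1" with a self-hosted PBX.
msg = build_register("pbx.example.com", "lobby-door-1", "192.0.2.10")
print(msg)
```

Because any RFC 3261 registrar (Asterisk, FreeSWITCH, Kamailio, a hosted carrier) accepts this same request shape, the door is not tied to one vendor's cloud.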

Feature 4: Edge AI — face, package, animal, license plate

In 2026, AI detection runs on the doorstep, not in the cloud. Edge silicon (Hailo-8, NVIDIA Jetson Nano / Orin, Google Coral) is now cheap enough that any commercial-grade unit can ship with it.

1. Face detection vs face recognition. Detection (“there is a face”) can rest on an ordinary Article 6 lawful basis. Recognition (“this is John”) is biometric processing under Article 9 and needs explicit consent or a narrow exemption. Architect the difference into the data model.

2. Package detection. Mail / parcel / box shape classification. Now standard at the $300+ price point; reduces porch-piracy claims and triggers smart notifications.

3. Animal classification. Dog / cat / wildlife — suppresses 60–80% of false alarms in suburban deployments.

4. License plate recognition. 85–92% accuracy in good light; visitor parking analytics, watchlist matching. Worth the build for any commercial site with a parking surface.
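Architecting the detection/recognition split into the data model can be as simple as keeping identity in a separate field that is only ever populated under a documented lawful basis. The sketch below is hypothetical (class names, fields, and the enrollment set are ours) and combines two ideas from this section: identity kept only for enrolled people at sites with recognition enabled, and animal detections kept out of resident alerts.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Detection:
    """One on-device inference result, as stored and forwarded off the device."""
    label: str                        # e.g. "face", "package", "animal", "plate"
    confidence: float
    person_id: Optional[str] = None   # set only by recognition, never by detection

class EventPolicy:
    """Hypothetical policy layer between the edge model and storage / notifications."""

    def __init__(self, recognition_enabled: bool,
                 enrolled_ids: FrozenSet[str] = frozenset(),
                 min_confidence: float = 0.6):
        self.recognition_enabled = recognition_enabled  # site has a documented Article 9 basis
        self.enrolled_ids = enrolled_ids                # people who explicitly opted in
        self.min_confidence = min_confidence

    def sanitize(self, det: Detection) -> Detection:
        # Keep an identity only for enrolled people at sites where recognition is enabled;
        # everyone else is reduced to a plain detection before it leaves the device.
        keep = self.recognition_enabled and det.person_id in self.enrolled_ids
        return det if keep else replace(det, person_id=None)

    def should_notify(self, det: Detection) -> bool:
        # Animal hits and low-confidence hits are typical false alarms; keep them out of push alerts.
        return det.label != "animal" and det.confidence >= self.min_confidence

policy = EventPolicy(recognition_enabled=True, enrolled_ids=frozenset({"resident-17"}))
print(policy.sanitize(Detection("face", 0.93, person_id="visitor-unknown")))
print(policy.should_notify(Detection("animal", 0.97)))
```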

Reach for edge AI when: your privacy posture or your latency budget cannot tolerate cloud-only inference, or when the building’s WAN is unreliable — both are now the norm rather than the exception.

Feature 5: Visitor management — QR, PIN, mobile keys

Visitor management is what turns an intercom from a doorbell into a building product.

1. One-time PINs and QR codes. Time-bound entry codes texted or emailed to a guest, contractor, or delivery driver. Should expire by time and use count. Audit log mandatory.

2. Mobile keys (BLE / NFC). The convenience pillar for hospitality and luxury MDU. Integrate with smart locks via Matter / Thread, Z-Wave (S2), or Zigbee 3.0.

3. PMS / Property Management System integration. AppFolio, Yardi, Marriott, Hilton APIs. The intercom becomes part of check-in for hospitality and lease workflow for residential.
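The "expire by time and use count, audit everything" rule from point 1 fits in a few lines. This is a hypothetical in-memory sketch: the class names, the 6-digit format, and the list-based audit log are assumptions. A production version would store only a hash of each PIN, persist to a database, and rate-limit attempts.

```python
import secrets
import time
from dataclasses import dataclass, field

@dataclass
class GuestPass:
    pin: str
    unit: str
    expires_at: float      # Unix timestamp
    uses_left: int

@dataclass
class PassBook:
    passes: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def issue(self, unit: str, valid_seconds: int, max_uses: int, now: float) -> str:
        pin = f"{secrets.randbelow(10**6):06d}"
        while pin in self.passes:                  # never overwrite an existing pass
            pin = f"{secrets.randbelow(10**6):06d}"
        self.passes[pin] = GuestPass(pin, unit, now + valid_seconds, max_uses)
        self.audit_log.append((now, "issued", unit))
        return pin

    def redeem(self, pin: str, now: float) -> bool:
        gp = self.passes.get(pin)
        ok = gp is not None and now < gp.expires_at and gp.uses_left > 0
        if ok:
            gp.uses_left -= 1
        # Log every attempt, granted or denied, without writing the PIN itself to the log.
        self.audit_log.append((now, "granted" if ok else "denied", gp.unit if gp else None))
        return ok

book = PassBook()
t0 = time.time()
pin = book.issue("101", valid_seconds=3600, max_uses=1, now=t0)
print(book.redeem(pin, now=t0 + 60))    # True: inside the window, one use left
print(book.redeem(pin, now=t0 + 120))   # False: use count exhausted
```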

Feature 6: Multi-tenant (MDU) support — the make-or-break

Most consumer platforms cannot do MDU. This is the single feature that disqualifies the largest number of products on a residential RFP.

1. Per-unit call routing. Apartment 101 rings 101’s residents, not the building reception. It sounds obvious, yet many systems still route every call hub-and-spoke through a single account.

2. Shared-area policies. Lobby and garage cameras follow rule-based dispatch — office hours, after hours, holidays, emergencies.

3. Sub-account permissions. Unit owner, tenant, building staff, guest — each with a distinct access scope. Role-based, not email-list-based.

4. Bulk provisioning. Hundreds of units onboarded via CSV or API in hours, not weeks. The biggest cost saver in MDU rollouts.
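Per-unit routing, shared-area schedules, and role scopes are easiest to bulk-provision when they are data rather than code paths. A hypothetical sketch: unit numbers map to resident endpoints, calls that don't match a unit follow a weekday 9-to-5 reception schedule with an on-call fallback, and roles map to explicit permission sets. All names, hours, and endpoints are illustrative; holidays and emergencies would be additional rules.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set

RESIDENTS: Dict[str, List[str]] = {    # unit -> devices to ring (apps, in-unit screens)
    "101": ["app:alice", "screen:101"],
    "102": ["app:bob"],
}
RECEPTION = ["console:reception"]
ON_CALL = ["phone:on-call-staff"]

ROLE_SCOPES: Dict[str, Set[str]] = {
    "owner": {"answer", "unlock", "invite_guest", "manage_tenants"},
    "tenant": {"answer", "unlock", "invite_guest"},
    "staff": {"answer", "unlock_shared_areas"},
    "guest": set(),
}

def route_call(dialed_unit: Optional[str], at: datetime) -> List[str]:
    """Endpoints to ring for a call placed from the lobby panel."""
    if dialed_unit in RESIDENTS:
        return RESIDENTS[dialed_unit]          # per-unit routing, no shared hub account
    office_hours = at.weekday() < 5 and 9 <= at.hour < 17
    return RECEPTION if office_hours else ON_CALL

def allowed(role: str, action: str) -> bool:
    """Role-based permission check instead of an email list."""
    return action in ROLE_SCOPES.get(role, set())
```

Bulk provisioning then reduces to loading rows from a CSV or PMS export into `RESIDENTS`, and a move-out is a single row change.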

Reach for MDU-grade software when: you cross 20+ units, have building staff and tenants with different scopes, or expect tenant turnover — consumer-grade software bleeds time on every move-in.

Feature 7: Smart-lock integration — Matter, Z-Wave, Zigbee

The intercom calls the door — literally. Lock integration is the bridge that turns a screen experience into a real workflow.

1. Matter / Thread. The 2026 default for new builds. IP-based, vendor-neutral, supported by Apple Home, Google Home, Amazon Alexa, Samsung SmartThings. Matter-native locks reached the market in growing numbers through 2024–2025; expect feature parity with Z-Wave by mid-2026.

2. Z-Wave (S2 encryption). Mature, dense ecosystem (Yale, Schlage, August). Still the safest choice for current-year EU residential and US installs.

3. Zigbee 3.0. Wide vendor support, mesh networking. Common in SmartThings deployments. Many vendors are moving new devices to Matter over Thread; the Connectivity Standards Alliance, which maintains Zigbee, also maintains Matter.

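4. Direct strike control. A relay output on the intercom unit drives the electric strike, triggered over SIP by a vendor-specific mechanism: typically a DTMF code during the call, sometimes a custom header (e.g. `Action: buzz`). Required for commercial.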
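For the relay path, a common pattern is that the resident presses a key during an authorized call and the intercom pulses the strike relay for a few seconds, logging the event. The sketch below is hypothetical: `relay` stands in for whatever GPIO driver or vendor API energizes the relay, and the unlock digit and pulse length are placeholders.

```python
import time
from typing import Callable, List, Tuple

class StrikeController:
    """Pulses a door-strike relay when an authorized caller sends the unlock digit."""

    def __init__(self, relay, unlock_digit: str = "#", pulse_s: float = 3.0,
                 sleep: Callable[[float], None] = time.sleep):
        self.relay = relay              # hypothetical object exposing set(energized: bool)
        self.unlock_digit = unlock_digit
        self.pulse_s = pulse_s
        self.sleep = sleep              # injectable so the logic can be tested without waiting
        self.audit: List[Tuple[str, str]] = []

    def on_dtmf(self, digit: str, caller: str, caller_can_unlock: bool) -> bool:
        if digit != self.unlock_digit:
            return False                # other keys are ignored
        if not caller_can_unlock:
            self.audit.append((caller, "unlock_denied"))
            return False
        self.relay.set(True)            # energize: releases a fail-secure strike
        try:
            self.sleep(self.pulse_s)
        finally:
            self.relay.set(False)       # always de-energize, even if the wait is interrupted
        self.audit.append((caller, "unlock"))
        return True
```

Putting the `finally` around the pulse matters more than it looks: a strike left energized is a door left open.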

Feature 8: Cloud + edge dual mode and WAN fallback

A door must work when the internet does not. Two patterns matter:

1. Edge-first inference. Run face detection, package detection, AEC, and basic call setup locally. The cloud is for storage, analytics, and remote answering — not for the door opening.

2. PoE + cellular fallback. 802.3at PoE+ as the primary power and connectivity, with a 4G / 5G modem that activates on WAN loss. It adds roughly $80–$150 per entry station and can prevent a building lockout incident.
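"Activates on WAN loss" needs hysteresis, so a flapping uplink doesn't bounce calls between paths every few seconds. A minimal sketch, assuming a periodic probe (for example, a reachability check against the SIP server) reports WAN health; the thresholds are placeholders, and actually bringing the modem up is left to the platform. Door opening itself should not depend on this at all, per the edge-first pattern.

```python
class UplinkMonitor:
    """Switch to cellular after `fail_after` consecutive failed WAN probes, and back
    to wired only after `recover_after` consecutive successful ones (hysteresis)."""

    def __init__(self, fail_after: int = 3, recover_after: int = 5):
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.active = "wan"
        self._fails = 0
        self._oks = 0

    def record_probe(self, wan_ok: bool) -> str:
        if wan_ok:
            self._oks += 1
            self._fails = 0
        else:
            self._fails += 1
            self._oks = 0               # any failure restarts the recovery count
        if self.active == "wan" and self._fails >= self.fail_after:
            self.active = "cellular"
        elif self.active == "cellular" and self._oks >= self.recover_after:
            self.active = "wan"
        return self.active
```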

For the broader cloud architecture pattern, see our secure cloud video management guide.

Feature 9: Voice AI — LLM-powered receptionist

The AI receptionist is where the 2025–2026 product wave actually lands. GPT-4-class and Gemini-class LLMs are now production-ready for call screening and routing.

1. Smart greeting. “Building open 9 a.m.–5 p.m. Press 1 for reception, 2 for maintenance, or describe who you are looking for.” Cuts misroutes by 40–60%.

2. Context-aware routing. A delivery driver asking for “101” can ring 101 directly without going through reception. Pulls from the PMS and the building schedule.

3. Voicemail transcription and summary. Deepgram, AssemblyAI, OpenAI Whisper. A searchable archive replaces a voicemail box nobody listens to.
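In production the LLM extracts intent, but the routing decision itself is best kept deterministic and auditable. A hypothetical sketch in which a regex and a keyword table stand in for the model's extraction step: a spoken unit number goes straight to that unit if it exists in the PMS-derived directory, department keywords go to that department, and anything else falls back to reception.

```python
import re

DIRECTORY = {"101": "unit:101", "102": "unit:102"}            # hypothetical, synced from the PMS
KEYWORDS = {"maintenance": "dept:maintenance", "repair": "dept:maintenance"}
RECEPTION = "dept:reception"

def route_utterance(transcript: str) -> str:
    """Choose a destination from a visitor's transcribed request."""
    for number in re.findall(r"\b\d{2,4}\b", transcript):
        if number in DIRECTORY:
            return DIRECTORY[number]        # known unit: ring it directly, skip reception
    text = transcript.lower()
    for keyword, destination in KEYWORDS.items():
        if keyword in text:
            return destination
    return RECEPTION                        # unknown or ambiguous: hand to a human

print(route_utterance("Hi, delivery for 101"))
```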

Already have intercom hardware and need the software layer to make it real?

We build SIP / WebRTC / edge-AI software on top of 2N, Akuvox, DoorBird, Aiphone, and ONVIF-compliant generics. Bring your hardware and your unit count and we’ll diagnose where the gaps are.


Reference architecture for a 2026 intercom platform

A serious intercom platform has eight layers. The pattern is the same whether you build on top of 2N, Akuvox, or roll fully custom; only the specific components shift.

1. Hardware. ONVIF Profile T outdoor unit with PoE+, 4K + HDR camera, omnidirectional mic + noise-suppressing mic array, IP65 / IK10 rating. Edge AI accelerator (Hailo-8 / Jetson) on the unit.

2. SIP server. Asterisk, FreeSWITCH, or Kamailio, hosted in your VPC. Handles call setup, registration, voicemail.

3. Media server. Janus or mediasoup as the SFU for WebRTC client legs.

4. AI inference. On-device for face / package / LPR; cloud (TensorFlow Serving, Triton) for heavier video analytics over recordings.

5. Storage. Encrypted recording (AES-256 at rest), 48–90 day retention by default, immutable audit log of unlocks and call history.

6. Smart-lock layer. Matter / Thread or Z-Wave / Zigbee bridge; or direct strike relay for commercial.

7. Mobile and web clients. APNs / FCM push, native iOS / Android, web answer console for staff. WebRTC for low-latency answering.

8. Integrations. PMS (AppFolio, Yardi, Marriott, Hilton APIs), CRM (HubSpot / Salesforce), SSO (Okta, Azure AD). For Android-side intercom apps see our Android smart intercom guide.
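The "immutable audit log" in layer 5 is usually enforced by storage (write-once buckets, append-only tables), but hash chaining also makes tampering detectable at the application level. A minimal sketch, assuming JSON-serializable events; the class and field names are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry

class AuditChain:
    def __init__(self):
        self.entries = []   # (event, prev_hash, entry_hash)

    @staticmethod
    def _digest(event: dict, prev_hash: str) -> str:
        # Canonical JSON so the same event always hashes the same way.
        payload = json.dumps(event, sort_keys=True, separators=(",", ":")) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.entries[-1][2] if self.entries else GENESIS
        entry_hash = self._digest(event, prev)
        self.entries.append((dict(event), prev, entry_hash))   # store a copy
        return entry_hash

    def verify(self) -> bool:
        """Recompute every link; any edited, dropped, or reordered entry breaks the chain."""
        prev = GENESIS
        for event, stored_prev, stored_hash in self.entries:
            if stored_prev != prev or self._digest(event, prev) != stored_hash:
                return False
            prev = stored_hash
        return True

log = AuditChain()
log.append({"ts": 1, "door": "lobby", "action": "unlock", "by": "app:alice"})
log.append({"ts": 2, "door": "lobby", "action": "call", "by": "panel"})
print(log.verify())
```

Periodically anchoring the latest hash somewhere outside the platform (a separate account or system) is what stops an attacker from simply rebuilding the whole chain.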

Vendor comparison matrix (2026)

Indicative side-by-side of the major intercom hardware platforms most software products integrate with. Verify pricing on each vendor’s site before you commit; ranges below are typical street prices.

Vendor | SIP / ONVIF | AI on edge | Hardware price | Cloud $/door/month | Best for
2N | Yes / Yes | Optional | $350–$650 | $8–$15 | Enterprise / MDU
DoorBird | Yes / Yes | Limited | $180–$400 | $5–$10 | Premium consumer
Akuvox | Yes / Yes | Yes (face / package) | $120–$350 | $6–$12 | Hospitality / MDU
Aiphone | Yes (newer SKUs) | Limited | $250–$700 | Varies | US federal (NDAA-safe)
Hikvision / Dahua | Yes / Yes | Yes | $80–$250 | Bundled | Asia residential (NOT US federal)
Ring / Nest / Eufy | No / No | Yes | $100–$300 | $4–$10 | Single-family residential

Cost model: a 100-unit MDU rollout in 2026

A 100-unit residential building is the line where custom intercom software pays back. Below is a back-of-envelope using 2026 list prices.

1. Hardware. One outdoor entry station + relay box + cellular fallback + 100 in-unit answer screens (or app-only). A mid-market spec lands at ~$35K–$50K of hardware capex for the building.

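2. Cloud subscription. $8–$15 per door per month at MDU pricing; the working figure is about $200–$250/month for a 100-unit building including staff seats and storage. Check what the vendor counts as a “door”: if it bills per apartment rather than per entry point, 100 units at $8–$15 comes to $800–$1,500/month before seats and storage.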

3. Custom software layer. A platform that ties into the PMS, mobile keys, smart locks, and the building’s existing security stack typically lands in 12–20 weeks of build time. We do not put a number on it without seeing the integration list; any price range we quoted here could be off by ~40% in either direction. Agent Engineering compresses our loop, so we usually come in below the legacy benchmarks.

4. Payback. Building entry-time savings, package-theft reduction, and on-site staff hours typically deliver an 18–24 month ROI on the hardware + first-year cloud spend.
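The payback line is simple arithmetic worth making explicit: months of savings needed to recover hardware capex plus first-year cloud spend. The capex and cloud inputs below use this section's ranges; the $2,500/month savings value is a placeholder you would replace with your own measured entry-time, package-loss, and staff-hour numbers.

```python
def payback_months(hardware_capex: float, cloud_per_month: float,
                   savings_per_month: float) -> float:
    """Months of savings needed to cover hardware plus the first year of cloud fees."""
    first_year_cost = hardware_capex + 12 * cloud_per_month
    return first_year_cost / savings_per_month

ASSUMED_SAVINGS = 2500.0   # hypothetical monthly savings; measure your own
for capex, cloud in ((35_000, 200), (50_000, 250)):
    months = payback_months(capex, cloud, ASSUMED_SAVINGS)
    print(f"${capex:,} hardware, ${cloud}/mo cloud -> {months:.1f} months")
```

At that placeholder figure the low end pays back in about 15 months and the high end in about 21; the result is far more sensitive to the savings estimate than to the cloud fee.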

Mini case: rebuilding the intercom layer on top of Netcam Studio

Our work on Netcam Studio — the modernized successor to one of the pioneering IP camera apps — ran into the classic intercom problem at scale: a desktop-grade UI, dozens of camera vendors, ONVIF compliance issues, and a long tail of customer expectations around motion detection and PTZ.

We rebuilt the web interface, normalized the ONVIF discovery flow across vendors, and added a multi-stream dashboard that survives a 32-camera deployment. The same code path now powers customer installations across ~190 countries. The architectural lessons applied directly to MDU intercom rollouts: ONVIF conformance, per-tenant routing, and a UI that staff can use after a 30-minute training instead of a week.

Want a similar audit of your current intercom stack?

A decision framework — pick your platform in five questions

Q1. How many doors / units? Under 5 — consumer doorbell. 5–50 — commercial-grade hardware + light SaaS. 50–500 — MDU platform with PMS integration. 500+ — custom software on commercial hardware.

Q2. Are you in scope for NDAA / federal compliance? Yes — Aiphone, 2N, certain Akuvox SKUs. No — full vendor universe.

Q3. Do you need biometric face recognition? Yes — insist on edge inference and a documented GDPR Article 9 lawful basis. No — pick face detection only.

Q4. Is the building’s WAN reliable? Yes — cloud-first is fine. No — demand cellular fallback and edge-first inference.

Q5. Do you need PMS / CRM / smart-lock integration? Yes — SIP + ONVIF + Matter is the openness gate. No — consumer-grade closed platforms can work.
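The five questions translate almost mechanically into a checklist function. This is a sketch of the framework exactly as written above, not a substitute for scoping. Because the Q1 bands overlap at 50 and 500, treating those boundaries as inclusive of the lower band is our assumption, as is the output format.

```python
from typing import List

def recommend(units: int, ndaa_scope: bool, face_recognition: bool,
              reliable_wan: bool, needs_integrations: bool) -> List[str]:
    # Q1: deployment tier by door / unit count.
    if units < 5:
        tier = "consumer doorbell"
    elif units <= 50:
        tier = "commercial-grade hardware + light SaaS"
    elif units <= 500:
        tier = "MDU platform with PMS integration"
    else:
        tier = "custom software on commercial hardware"
    notes = [tier]
    # Q2: NDAA / federal scope narrows the hardware list.
    notes.append("hardware: Aiphone, 2N, certain Akuvox SKUs" if ndaa_scope
                 else "hardware: full vendor universe")
    # Q3: biometrics.
    notes.append("edge inference + documented GDPR Article 9 lawful basis" if face_recognition
                 else "face detection only")
    # Q4: WAN reliability.
    if not reliable_wan:
        notes.append("cellular fallback + edge-first inference")
    # Q5: integrations.
    if needs_integrations:
        notes.append("require SIP + ONVIF + Matter")
    return notes

print(recommend(120, ndaa_scope=False, face_recognition=False,
                reliable_wan=False, needs_integrations=True))
```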

Five pitfalls that kill intercom projects

1. Single-vendor lock. Proprietary protocol, proprietary cloud, proprietary app. Fine for a 5-door deployment; lethal for a 500-unit one. Insist on SIP + ONVIF interoperability.

2. No WAN fallback. Building loses internet, residents cannot enter. Every multi-tenant deployment must specify cellular backup or local-only call routing as a fallback.

3. Weak encryption. TLS 1.3 minimum on signaling; SRTP on media; AES-256 on storage. Anything older is a 2026 audit failure.

4. No audit trail. Every unlock, every call, every override must be logged immutably. Compliance regulators will ask; insurance carriers will ask after the first incident.

5. NDAA blind spot. Hikvision, Dahua, ZTE, Huawei components disqualify a system from US federal sites and many state-level deployments. Verify supply chain before commit.
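On the signaling side, enforcing the TLS 1.3 floor from pitfall 3 is usually a one-line setting in whatever component terminates SIP over TLS (SIPS, conventionally port 5061). A sketch using Python's standard `ssl` module for a client connecting to the SIP server; SRTP for media and AES-256 at rest are configured in the media stack and storage layer, not here.

```python
import ssl
from typing import Optional

def sip_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context for SIP signaling that refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context(cafile=ca_file)   # certificate and hostname verification on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3       # handshakes at TLS 1.2 or older fail
    return ctx
```

Wrapping the socket to the PBX with this context (`ctx.wrap_socket(sock, server_hostname=...)`) then fails closed against a downgrade instead of silently negotiating an older protocol.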

KPIs to track from the first sprint

Quality KPIs. Call setup p95 (target < 2 seconds), false-alarm rate (target < 1 per door per day), face / package detection precision (target ≥ 0.9), audio clarity MOS (target ≥ 4.0).

Business KPIs. Average answer time (target < 15 seconds), package theft incidents (target >30% reduction), staff hours per resident move-in / move-out, monthly active users on the resident app.

Reliability KPIs. Door uptime (target ≥ 99.95%), WAN-fallback activations per month, cellular failover success rate (target ≥ 99%), tamper-event response time.
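Two of these targets are easy to compute inconsistently across teams, so it pays to pin the definitions down in the first sprint. A sketch, assuming nearest-rank percentiles (other percentile methods give slightly different p95 values) and a 30-day month for the uptime budget; the sample latencies are made up.

```python
from typing import List

def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)   # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]

def downtime_budget_minutes(uptime_target: float, days: int = 30) -> float:
    """Minutes of door downtime a given uptime target allows over `days`."""
    return (1 - uptime_target) * days * 24 * 60

setup_seconds = [0.9, 1.1, 1.4, 0.8, 2.6, 1.0, 1.2, 0.9, 1.3, 1.1]   # made-up sample
print(f"call setup p95: {p95(setup_seconds):.1f} s (target < 2 s)")
print(f"99.95% uptime allows {downtime_budget_minutes(0.9995):.1f} min of downtime per 30 days")
```

A 99.95% door-uptime target leaves only about 21.6 minutes of downtime per 30-day month, which is shorter than most firmware update windows; that is one more argument for edge-first door opening.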

Compliance: GDPR, EU AI Act, NDAA, HIPAA, EN 50486

1. GDPR Article 6 / 9. Article 6 covers the lawful basis (legitimate interest is the typical one for building access). Article 9 prohibits biometric processing for identification without explicit consent or a narrow exemption. Face recognition almost always falls under Article 9.

2. EU AI Act. Real-time biometric ID in publicly accessible spaces by law enforcement is largely prohibited. Post-hoc biometric analysis is high-risk and demands a documented impact assessment.

3. CCPA. California residents must see a privacy notice; right-to-delete applies. Visitor footage is personal information.

4. NDAA Section 889. US federal contracts forbid covered Chinese-origin components. Vet your hardware bill of materials.

5. HIPAA. Healthcare facilities need Business Associate Agreements for any cloud processor. Encryption at rest is non-negotiable.

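6. EN 50486 / IEC 60849. EN 50486 covers audio and video door-entry equipment; IEC 60849 covers sound systems for emergency purposes (voice alarm). Lift emergency communication is a separate regime (EN 81-28 in Europe). Check which of these apply to any building with a regulated lift or fire panel.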

When NOT to build a custom intercom platform

Three situations where buying off-the-shelf is the right answer.

1. Single-family or small-business doors. Ring, Nest, DoorBird, Eufy ship in days for under $10/month. Custom is overkill.

2. Highly standardized hospitality. Marriott, Hilton, IHG already have intercom-and-mobile-key partner ecosystems. Plug into the existing PMS rather than building a new platform.

3. No team for ongoing ops. Intercom platforms drift. Without a dedicated MLOps / DevOps cycle, even a beautiful first-month launch degrades in six months. If you cannot commit the team, partner with someone who can.

Ready to ship a smart intercom platform that holds up in production?

21 years of multimedia delivery, 625+ shipped products, deep work in V.A.L.T., Netcam Studio, and DSI Drones. Tell us your unit count and integration list and we’ll bring an architecture diagram to the first call. We also lead the surrounding AI integration work — edge inference, LLM receptionists, and PMS connectors.


FAQ

Should I pick 4K or stay on 1080p?

Pick 4K only if your hardware also ships HDR and a lens that resolves to it. A 1080p camera with HDR is usually more useful for face and license-plate identification than a 4K camera without it; without HDR, the extra pixel count is a marketing number.

Is SIP still relevant if I am building a fully cloud / WebRTC-only platform?

Yes, even more so. SIP is what lets the intercom interoperate with the building’s PBX, with hosted carriers like Twilio, and with third-party integrations. WebRTC is great for the client leg; SIP is great for the trunk leg. Most production stacks ship both.

Should I run AI inference on the device or in the cloud?

On-device for face / package / animal / LPR — latency, privacy, and offline-resilience all favor edge. Cloud is the right place for heavier video analytics over recordings, model retraining, and aggregated reporting. The hybrid pattern is now the default.

Can we still use Hikvision or Dahua intercom hardware?

For US federal contracts, no: NDAA Section 889 prohibits covered components. For commercial use outside federal scope it is generally still allowed, but US state-level rules are tightening. We typically recommend NDAA-safe hardware (Aiphone, 2N, certain Akuvox SKUs) to keep the future expansion path open.

How does Matter / Thread change the architecture?

Matter is IP-based, vendor-neutral, and supported by Apple Home, Google Home, Alexa, and SmartThings. For new MDU builds in 2026, Matter is the future-proof choice on the smart-lock side. Z-Wave and Zigbee are still safer for current-year deployments where the device ecosystem is mature.

What is the cheapest viable intercom stack for an MVP?

A DoorBird or Akuvox unit + Twilio SIP + a thin custom mobile app. Hardware under $400/unit, hosted SIP under $50/month, plus the custom app. Move to a self-hosted media server only when concurrent answer volume justifies it.

How do I evaluate an intercom vendor’s past work?

Two requests: an ONVIF conformance certificate from a real customer deployment, and a 30-minute architecture walkthrough. Vendors who can show interoperability proof and trace the call setup live are doing the work; the others are reselling delivery teams.

Do AI receptionists actually save time at the front desk?

In commercial and hospitality deployments, an LLM-backed greeting + routing layer reduces misroutes by 40–60% and frees staff for higher-value tasks. The savings are smaller in pure residential. Pilot with a single building and measure before rolling out fleet-wide.


Ready to ship intercom software customers actually trust?

Pick by deployment shape, not by feature poster. Single-family residential lives on Ring or Nest. MDU and hospitality demand SIP + ONVIF + Matter, edge AI, multi-tenant routing, PMS integration, and a real audit trail. Healthcare and US federal demand HIPAA and NDAA on top. The platform is always a stack — never a single product.

Plan for cellular WAN fallback, edge inference, and a compliance posture that survives GDPR Article 9, CCPA, NDAA, and the EU AI Act. Optimize for door uptime and resident-side answer time, not for cloud feature checklists. If you want a partner that has shipped this combination across 770+ surveillance organizations and has Agent Engineering compressing the build loop, that is exactly what we built Fora Soft to do.

Want a 2-page architecture brief and a real estimate?

Tell us your unit count, deployment region, integration list, and compliance footprint. We’ll come back within two business days with a recommended hardware shortlist, a software architecture, and a realistic price range.

