Online market research marketplace used by Google, McDonald's, Netflix, and Samsung

Key takeaways

Vocal Views proves the model. Fora Soft built a research marketplace adopted by Google, McDonald’s, Netflix, and Samsung — video interviews, automatic transcription in 30+ languages, live human interpreters, and a panel of incentivised respondents in one product.

The market is large and consolidating. Global market research is on track for $140B in 2024; UserTesting alone sold for $1.3B; the GenAI insights tier is rewriting the buyer’s spec sheet.

Three engineering pillars decide the build. WebRTC SFU with observer + interpreter modes; multilingual ASR (Deepgram, Whisper, AWS) with speaker diarization; AI synthesis (clips, sentiment, themes) tied to a panel-payment marketplace.

Live interpretation is a deliberate choice. Three audio streams, 2–3 second translation lag, and a UI that survives multilingual chaos — or skip it and rely on post-session translation.

Build vs. buy is about panels and AI moat. If the differentiator is a niche panel or a specific analysis workflow, build. If you need broad recruitment fast, integrate UserTesting / Respondent.io APIs and build only the differentiating layer.

Why Fora Soft wrote this playbook

We did not write this guide as a marketing brief. We wrote it because we shipped Vocal Views, a marketplace for online market research that today serves Google, McDonald’s, Netflix, and Samsung. Every architectural call below (SFU choice, transcription stack, observer mode, interpreter UX, panel payments) was made under pressure from paying enterprise customers running real interviews on the platform.

Vocal Views is not our only video-marketplace product. Our case studies include BrainCert — a virtual classroom and LMS with 100K+ paying customers and four Brandon Hall awards — and ProvideoMeeting, an enterprise video conferencing platform. Roughly 40% of our active engineering capacity sits in video, real-time, and AI — the disciplines a serious research-platform build needs from day one.

If you are a founder, product owner, or research-ops lead scoping a custom platform — or weighing UserTesting, dscout, Respondent, Lookback, or Discuss.io against a build — this article walks the same trade-offs we walked with Vocal Views. We use Agent Engineering — an AI-assisted internal delivery process — to compress the calendar and the cost on every project, which is why our quotes typically beat a US agency for the same scope.

Scoping a research-marketplace build?

Get a 30-minute architecture review with the team behind Vocal Views — SFU choice, ASR stack, panel payments, AI synthesis, realistic budget.

Book a 30-min call → WhatsApp → Email us →

Vocal Views in numbers — what the platform actually does

Vocal Views is a marketplace, not just a video tool. Researchers post studies, qualified respondents are matched and paid an incentive, the interview happens over WebRTC video, the conversation is transcribed in real time, and the data is exportable for analysis — all on one platform.

Capability | Detail | Why it matters
Enterprise customers | Google, McDonald’s, Netflix, Samsung | Compliance, scale, and SLAs proven against the strictest procurement teams.
Languages supported | 30+ via automatic transcription | Cross-region research without bolt-on tools.
Interview formats | 1:1, focus group, observer mode, interpreter mode | Same product handles UXR, brand research, and qualitative academic studies.
Live human interpreter | Behind-the-scenes simultaneous translation | Removes the language barrier without forcing the respondent into English.
Outputs | Recordings, transcripts, analytics dashboards, exports | Researchers leave the call with everything insight-ready.
Stack | React + Node.js + MongoDB + WebRTC/Kurento + Socket.io | Mature stack chosen for fast iteration and a healthy ecosystem.

The two-minute walkthrough video shows the actual operator interface, not a marketing render. Watch the Vocal Views product video on YouTube.

The market in numbers — why research platforms keep getting funded

Market research is a $140B industry and the qualitative-video slice is where the AI tooling is rewriting expectations.

Indicator | Number | Why it matters
Global market research industry (2024) | ~$140B | Even a 0.1% share is a healthy SaaS company.
Online video platform market CAGR | 17.3% (2021–2027) | Compounding tailwind for any video-first product.
UserTesting acquisition | $1.3B (Thoma Bravo / Sunstone, 2022) | Validates that this is a billion-dollar outcome category.
Respondent.io panel | 4M+ verified participants, 4.9/5 rating | Bigger panels = bigger network effect.
AI insight delivery uplift | ~40% faster vs. manual analysis | AI synthesis is the new differentiator.
Conversation analytics adoption | 72% report better CX | The buyer’s ROI story is now well documented.

The 2026 feature checklist for a serious research platform

Use the four tiers below to scope your MVP, then post-MVP, then differentiation. Skipping a tier-1 feature blocks enterprise procurement; skipping a tier-3 feature limits expansion revenue but does not stop launch.

Tier 1 — the marketplace + interview core

Panel and incentive management. Screener flow, qualification, scheduling, automated incentive payouts (Stripe, PayPal, Tremendous, Hyperwallet).

Video chat over WebRTC. 1:1 plus observer-only viewers, with two-way audio when allowed.

Recording. Cloud storage with retention policies and per-tenant key management.

RBAC + audit log. Researcher, observer, admin, and panel-ops roles, with immutable logs.
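One common way to make an audit log tamper-evident is to chain each entry to the hash of the previous one. The sketch below illustrates that idea only; it is not a description of how Vocal Views stores its logs, and the field names (`actor`, `action`, `prev`, `hash`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry (an assumption)


def _digest(actor: str, action: str, prev: str) -> str:
    """Hash the entry body deterministically."""
    body = json.dumps({"actor": actor, "action": action, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: list[dict], actor: str, action: str) -> None:
    """Append an entry whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"actor": actor, "action": action, "prev": prev,
                "hash": _digest(actor, action, prev)})


def verify(log: list[dict]) -> bool:
    """Return True only if no entry was edited, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if _digest(entry["actor"], entry["action"], entry["prev"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, "researcher:42", "replayed recording 17")
append_entry(log, "admin:1", "exported transcript 17")
print(verify(log))
```

Editing any earlier entry changes its hash and breaks every link after it, which is what lets an auditor detect tampering without trusting the database.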

Tier 2 — transcript, search, and clip workflow

Multi-language ASR with speaker diarization. Whisper or Deepgram Nova for cost; AWS or Azure for enterprise compliance.

Searchable transcripts. Word-level timestamps indexed in OpenSearch / Elastic.

Clip and quote generation. One-click highlight extraction with auto-generated descriptions.

Live captions during interview. Sub-2-second latency for accessibility and observer note-taking.
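Word-level timestamps are what make one-click clips possible: once a quote is located in the transcript, its first and last words give the cut points. The following is a minimal sketch under assumed data shapes (the `Word` record and the half-second padding are illustrative, not a vendor format).

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float
    speaker: str


def find_clip(words: list[Word], quote: str, padding: float = 0.5):
    """Return (start, end, speaker) for the first match of `quote`, or None."""
    target = quote.lower().split()
    if not target:
        return None
    # Normalise tokens so trailing punctuation does not block a match.
    tokens = [w.text.lower().strip(".,!?") for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            first, last = words[i], words[i + len(target) - 1]
            return (max(0.0, first.start - padding), last.end + padding, first.speaker)
    return None


transcript = [
    Word("I", 12.0, 12.1, "participant"),
    Word("never", 12.1, 12.4, "participant"),
    Word("use", 12.4, 12.6, "participant"),
    Word("the", 12.6, 12.7, "participant"),
    Word("search", 12.7, 13.1, "participant"),
    Word("bar.", 13.1, 13.5, "participant"),
]
clip = find_clip(transcript, "search bar")
print(clip)
```

The returned start and end times can be handed to whatever tool cuts the recording; a production version would search the index rather than scan the transcript linearly.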

Tier 3 — AI synthesis and integrations

AI summary per session. 5–7 bullet narrative with key quotes.

Sentiment per speaker. 0–1 score with timeline visualisation.

Theme clustering across studies. Unsupervised topic mining; ask-your-data Q&A across the corpus.

Integrations. Miro, FigJam, Notion, Slack, Figma, plus a Zapier tier for the long tail.

Tier 4 — live human interpreter mode

Three-stream audio routing (participant → interpreter → researcher), separate channels per language, recording of every channel for audit, and a UI that survives multilingual chaos. This is the Vocal Views differentiator and the hardest feature to copy.

Reach for live interpretation when the buyer’s research must include populations who do not interview comfortably in English. For pure UXR with global enterprise users, post-session translation plus AI summary is good enough and saves 30–40% of build cost.

Reference architecture for a research-interview platform

The pipeline below is the canonical layout we use on Vocal Views and similar projects. Each stage scales horizontally; tenant isolation lives in stages 6–8.

Stage | Component | Output
1. Capture | WebRTC client (browser + mobile) | VP9 / H.264 video, Opus audio
2. SFU + observer fan-out | MediaSoup / Janus / LiveKit / Kurento | Forwarded streams + recording egress
3. Recording | FFmpeg or AWS MediaConvert | MP4/HLS in S3 with lifecycle tiering
4. ASR + diarization | Whisper / Deepgram / AWS Transcribe | JSON transcript with speaker labels
5. AI synthesis | GPT-4-class model | Summary, clips, sentiment, themes
6. Search | OpenSearch / Elastic | Per-tenant search index
7. API + auth | Node.js + Socket.io | RBAC, audit log, realtime events
8. Marketplace + payments | Stripe + Tremendous / Hyperwallet | Incentive payouts, take-rate splits
9. Client | React (web) + native iOS/Android | Researcher console, observer view, panellist app

A deeper dive into the SFU choice lives in "P2P vs MCU vs SFU for video conferencing"; the broader 2026 architecture map is in our "WebRTC architecture guide for business".

WebRTC SFU choice — MediaSoup, Janus, LiveKit, or Kurento

For a research interview the SFU has to handle 1:1 reliably, plus 5–15 silent observers, plus an interpreter mode that introduces a third audio channel. Four engines cover the field.

Engine | License | Strengths | Best for
MediaSoup | ISC | Highest performance, modular, Node.js + Rust | High-volume marketplaces, custom recording
Janus | GPLv3 | Mature plugin system (recording, SIP, broadcast), C-based | Carrier-grade reliability, SIP bridging
LiveKit | Apache 2 (OSS) + SaaS | Fastest integration path, managed cloud option | Teams without deep WebRTC expertise
Kurento | Apache 2 | Pluggable media pipeline, CV/ML hooks | Custom processing, computer-vision overlays

Vocal Views shipped on Kurento because the team needed pipeline-level hooks for observer fan-out and live ASR routing. Modern projects often default to MediaSoup or LiveKit for simpler ops; Janus remains the carrier-grade choice. We documented the Kurento story in What is Kurento Media Server.

Multi-language ASR — the engine that turns interviews into data

Transcription is no longer a feature; it is the gateway to every downstream insight. For 30+ languages with speaker diarization the field narrows to five engines.

Engine | Languages | Pricing | Best for
Deepgram Nova-3 | 45+ | ~$4.30 / 1K min | Lowest cost, conversational accuracy
AWS Transcribe | 100+ | $0.024 / min batch (~$24 / 1K min) | Enterprise compliance, BAA available
Google Speech-to-Text | 100+ | ~$16 / 1K min | Robust noise handling, Google ecosystem
Azure Speech | 140+ | Custom enterprise | HIPAA-heavy verticals, Microsoft stack
Whisper Large-v3 (self-host) | ~50 | GPU rental only | Cost control at scale, on-prem

Practical rules of thumb: Deepgram for cost-sensitive English-heavy workloads, AWS for HIPAA verticals, Azure if the buyer is a Microsoft shop, Whisper if you need on-prem or process more than ~10K hours per month.
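To turn the per-minute prices above into a monthly bill, multiply by the audio volume. The sketch below uses only the list prices quoted in the table (with AWS converted from $0.024 per minute to $24 per 1K minutes); Azure and self-hosted Whisper are left out because the table gives no per-minute figure for them. The 500-hour volume is an arbitrary example.

```python
# Approximate list prices in USD per 1,000 audio minutes, taken from the table above.
PRICE_PER_1K_MIN = {
    "Deepgram Nova-3": 4.30,
    "AWS Transcribe (batch)": 24.00,  # $0.024 per minute
    "Google Speech-to-Text": 16.00,
}


def monthly_asr_cost(hours_per_month: float) -> dict[str, float]:
    """Estimate monthly transcription spend per vendor for a given audio volume."""
    minutes = hours_per_month * 60
    return {
        vendor: round(minutes / 1000 * price, 2)
        for vendor, price in PRICE_PER_1K_MIN.items()
    }


costs = monthly_asr_cost(500)  # 500 hours = 30,000 minutes
for vendor, usd in costs.items():
    print(f"{vendor}: ${usd:,.2f}/month")
```

Real invoices depend on volume discounts, streaming versus batch tiers, and add-ons such as diarization, so treat the output as an order-of-magnitude comparison.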

Need an ASR vendor decision in writing?

We have shipped Vocal Views, BrainCert, and dozens of streaming products. We will pick the right ASR engine for your language mix and unit economics in one call.


Live interpreter mode — the engineering challenge that wins enterprise deals

Most research platforms quietly assume the participant speaks the researcher’s language. Vocal Views does not. The interpreter feature is what lets Google interview a Korean teenager through an English-speaking moderator without forcing the participant into a foreign tongue. The trade-offs are concrete.

1. Three audio channels. Participant audio → interpreter; interpreter audio → researcher; researcher audio → interpreter. Recording captures every channel separately for audit.

2. Time multiplier. A 30-minute interview becomes a 60-minute session because of translation lag and cognitive load. Pricing must reflect the doubling.

3. UI clarity. Multiple languages on screen are a UX hazard. Indicate who is speaking, in which language, and which stream the listener is hearing — without overwhelming the moderator.

4. Latency budget. Human simultaneous interpretation already adds two to three seconds of lag, so the transport has to add as little as possible on top; much more than that breaks the conversation. Use WebRTC sub-stream prioritisation, not HLS, for the interpreter loop.
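The three audio channels in point 1 can be expressed as a routing table that the signalling layer consults when deciding who subscribes to whom. This is a simplified sketch, not the Vocal Views implementation; the role names and helper functions are hypothetical, and the return leg from interpreter to participant is not spelled out in the list above, so it appears here only as an optional extra route.

```python
# Source role -> roles that hear that source, as listed in point 1.
ROUTES: dict[str, set[str]] = {
    "participant": {"interpreter"},
    "interpreter": {"researcher"},
    "researcher": {"interpreter"},
}


def subscriptions(role: str, routes: dict[str, set[str]] = ROUTES) -> set[str]:
    """Which audio sources should `role` subscribe to?"""
    return {source for source, listeners in routes.items() if role in listeners}


def recording_tracks(routes: dict[str, set[str]] = ROUTES) -> list[str]:
    """One separate recording track per audio source, for audit."""
    return sorted(routes)


# Assumed extension: also send the interpreter's audio back to the participant.
routes_with_return = {**ROUTES, "interpreter": {"researcher", "participant"}}

print(subscriptions("interpreter"))
print(subscriptions("participant", routes_with_return))
print(recording_tracks())
```

Keeping the routes in data rather than in call-setup code makes it easier to add a second language channel or an observer feed without touching the media logic.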

Want to see how AI is starting to compete with human interpreters? Our deep dive in AI simultaneous interpretation covers the trade-offs.

Marketplace economics — take rate, panel cost, incentive payouts

A research-platform business has two cost centres researchers never see: panel acquisition and incentive payment infrastructure. Understanding both is what separates a sustainable take rate from a margin disaster.

Take rate. Typical take rates run from about 10% on session fees to a 35% mark-up on incentive costs. Aggressive marketplaces combine revenue streams: a subscription for sellers plus a take rate on transactions.

Panel acquisition. Building a 4M-participant panel like Respondent.io takes 2–3 years and significant CAC. For most new entrants the right move is to integrate a third-party panel API for the first 18 months and grow your own panel where the market underserves (industry verticals, hard-to-reach professionals, niche geographies).

Incentive payouts. Same-day to next-day payouts are now the floor; Stripe Connect, Tremendous, and Hyperwallet all solve the cross-border payment problem at different price points. Plan for FX, tax-form generation, and KYC.
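As a worked example of the take-rate arithmetic, the sketch below applies a 10% cut on the session fee and a 35% mark-up on the incentive to a single session. The revenue model (who pays what, and that both charges apply at once) and the $200 / $100 figures are assumptions for illustration; payout processing, FX, and tax costs are ignored.

```python
def session_economics(session_fee: float, incentive: float,
                      fee_take_rate: float = 0.10,
                      incentive_markup: float = 0.35) -> dict[str, float]:
    """Split one session's money flows under an assumed blended take-rate model."""
    fee_cut = session_fee * fee_take_rate    # platform share of the session fee
    markup = incentive * incentive_markup    # mark-up charged on top of the incentive
    return {
        "researcher_pays": round(session_fee + incentive + markup, 2),
        "panellist_receives": round(incentive, 2),
        "platform_revenue": round(fee_cut + markup, 2),
    }


example = session_economics(session_fee=200.0, incentive=100.0)
print(example)
```

With these inputs the platform earns $55 on a session the researcher pays $335 for; whatever the payout provider charges per transfer comes out of that $55.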

Security & compliance — what enterprise procurement asks first

1. SOC 2 Type II. The enterprise tick-box. $25–$40K readiness assessment plus annual audit. Build the controls in sprint 1.

2. GDPR. Lawful basis, DPIA where high-risk, EU data residency option, right to erasure on individual research participants. The right-to-erasure flow is the part teams underestimate.

3. CCPA / CPRA. US data-subject rights. Light lift if GDPR is already in place.

4. HIPAA. Required for medical research interviews. BAA with the cloud provider, encryption at rest and in transit, audit logging on every replay.

5. IRB-friendly workflows. Academic studies need consent capture, anonymisation tooling, and retention controls exposed in the UI — not buried in admin scripts.

Build vs. buy — the panel-and-moat rule

The honest decision rule: if your differentiator is a niche panel or a specific analysis workflow, build. If you need broad recruitment fast, integrate UserTesting / Respondent.io APIs and build only the differentiating layer on top. The framework below maps the common scenarios.

Scenario | Recommendation | Why
Need broad consumer panel fast | Buy + integrate (UserTesting API) | Building a 4M panel takes years; vendor amortises the cost.
Niche panel or proprietary access | Build custom (Vocal Views zone) | Vendors do not have your supply; the panel itself is the moat.
AI-native synthesis is the moat | Hybrid: UserTesting API + custom AI layer | Pay for recruitment; differentiate on the analysis.
Vertical workflow (medical, legal) | Custom build with HIPAA/IRB hardening | Compliance and workflows shape the product end-to-end.
Reseller / SaaS go-to-market | Custom build | You cannot resell UserTesting. You can resell what we build for you.

Cost model — what an MVP and a full-feature platform actually run

The numbers below reflect realistic Fora Soft engagements with Agent Engineering applied. They are intentionally conservative; in practice we beat them more often than we miss them.

Scope | Included | Indicative range | Calendar
MVP (interview + transcript) | WebRTC, ASR, recording, panel basics, Stripe payouts | $70K–$120K | 12–14 weeks
AI synthesis layer | Clip extraction, sentiment, themes, AI summary | +$25K–$50K | +3–5 weeks
Live interpreter mode | Three-stream routing, interpreter UI, multi-track recording | +$30K–$60K | +4–6 weeks
Integrations pack | Miro, Notion, Slack, Figma, Zapier | +$15K–$30K | +2–3 weeks
Compliance pack (SOC 2 + GDPR) | Audit logs, encryption review, vendor due-diligence pack | +$25K–$50K | +1–2 months

Run-rate operations: $500–$3,000/month for cloud infrastructure at MVP scale, plus $200–$2,000/month in API costs (ASR, AI synthesis), plus a dedicated SRE on call. These costs scale linearly with active studies.
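The build ranges in the table add up straightforwardly, and summing them for a chosen scope gives a quick budget envelope. The sketch below uses only the figures from the table, in thousands of USD; the dictionary keys are shorthand labels introduced for this example.

```python
# (low, high) build cost in thousands of USD, from the cost table above.
COST_RANGES_K = {
    "mvp": (70, 120),
    "ai_synthesis": (25, 50),
    "interpreter": (30, 60),
    "integrations": (15, 30),
    "compliance": (25, 50),
}


def budget_envelope(scopes: list[str]) -> tuple[int, int]:
    """Sum the low and high ends of every selected scope."""
    low = sum(COST_RANGES_K[s][0] for s in scopes)
    high = sum(COST_RANGES_K[s][1] for s in scopes)
    return low, high


full_platform = budget_envelope(list(COST_RANGES_K))
mvp_plus_interpreter = budget_envelope(["mvp", "interpreter"])
print(f"Full platform: ${full_platform[0]}K to ${full_platform[1]}K")
print(f"MVP + interpreter: ${mvp_plus_interpreter[0]}K to ${mvp_plus_interpreter[1]}K")
```

On these ranges a full-feature build lands between $165K and $310K, and interpreter mode accounts for roughly 30–33% of an MVP-plus-interpreter budget ($30K of $100K at the low end, $60K of $180K at the high end).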

Mini case — what shipping Vocal Views taught us

Situation. The client came to Fora Soft with a hypothesis — that enterprise market research could move online without losing the human-interpreter quality clients like Google and McDonald’s expect. The problem was that no off-the-shelf platform combined a panel marketplace, multi-language ASR, and a live interpreter loop in one product.

Plan. Build the WebRTC core on Kurento for pipeline-level control; layer 30+ language ASR via cloud APIs; design the three-stream interpreter mode from sprint 1 instead of bolting it on later; ship a panel marketplace with screener, scheduling, and Stripe payouts; expose researcher analytics for export. Roughly 9–12 months end-to-end with iterative releases to early enterprise customers.

Outcome. Today Vocal Views serves Google, McDonald’s, Netflix, and Samsung on the same multitenant codebase. The interpreter mode is the feature procurement teams cite when they choose Vocal Views over UserTesting or dscout. Want a similar 12-week MVP roadmap for your research platform?

A decision framework — pick a research-platform path in five questions

1. Where does your panel come from? Niche or proprietary → build. Broad consumer → integrate a third-party panel API.

2. What languages does your buyer interview in? English-only → one ASR vendor. 30+ → multi-vendor abstraction layer with ordered fallback between engines.

3. Live interpreter or post-session translation? Live = differentiator and 30–40% extra build cost; post-session is good enough for most UXR.

4. Where is the AI moat? Synthesis (clip extraction, sentiment, themes) → build the layer; recruitment → buy a panel API.

5. Compliance floor? SOC 2 + GDPR is the modern baseline. HIPAA only if you sell into clinical / healthcare research.
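The multi-vendor abstraction layer from question 2 can be as small as an ordered list of engines tried in turn. The sketch below is illustrative only: the vendor functions are stubs rather than real SDK calls, and `AsrError`, `Vendor`, and `transcribe_with_fallback` are hypothetical names.

```python
from typing import Callable, NamedTuple


class AsrError(Exception):
    """Raised when an engine fails or no engine supports the language."""


class Vendor(NamedTuple):
    name: str
    languages: set[str]
    transcribe: Callable[[bytes, str], str]  # (audio, language) -> text


def transcribe_with_fallback(audio: bytes, language: str,
                             vendors: list[Vendor]) -> tuple[str, str]:
    """Try each vendor that supports `language`, in order; return (vendor, text)."""
    errors: list[str] = []
    for vendor in vendors:
        if language not in vendor.languages:
            continue
        try:
            return vendor.name, vendor.transcribe(audio, language)
        except AsrError as exc:
            errors.append(f"{vendor.name}: {exc}")
    raise AsrError(f"no engine succeeded for {language!r}: {errors}")


# Stub engines standing in for real vendor clients.
def flaky_engine(audio: bytes, language: str) -> str:
    raise AsrError("timeout")


def backup_engine(audio: bytes, language: str) -> str:
    return "hola"


vendors = [
    Vendor("primary", {"en", "es"}, flaky_engine),
    Vendor("backup", {"es", "ko"}, backup_engine),
]
used, text = transcribe_with_fallback(b"\x00", "es", vendors)
print(used, text)
```

Because callers only see `transcribe_with_fallback`, swapping or reordering engines per language becomes a configuration change rather than a code change.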

Pitfalls we have watched research-platform teams fall into

1. Treating ASR as “just an API call.” Speaker diarization, language detection, and noise robustness are real engineering work. Audit transcripts on the team’s own recordings before shipping.

2. Skipping observer mode in v1. Stakeholder observer rooms are how research becomes a team sport. Without them adoption is capped.

3. Underestimating incentive payouts. Cross-border payment, FX, tax-form generation, and chargeback handling are weeks of work, not a sprint.

4. Bolting on compliance. Retrofitting SOC 2 costs about 3× as much as designing it in. The same goes for HIPAA.

5. Ignoring the post-interview workflow. Researchers need clips, summaries, and exports. Without them the platform is a recording tool, not an insights tool.

KPIs — what to actually measure

Quality KPIs. Stream uptime per session (99.5%+), p95 join time (< 4 s), ASR word-error rate per language (target < 8% on clean audio), AI summary acceptance rate (share of summaries researchers leave unedited for 24 h).

Business KPIs. Take rate, gross margin per session, panellist NPS, repeat-buyer rate, integration-driven expansion revenue (Miro / Notion / Slack-pulled exports).

Reliability KPIs. Recording success rate (over 99.5%), incentive payout SLA (under 24 h), audit-log completeness (100%), data-deletion SLA on GDPR/CCPA requests (under 30 days).
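The ASR word-error rate in the quality KPIs is conventionally computed as word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The sketch below is a minimal implementation of that standard formula; the lower-casing and whitespace tokenisation are simplifying assumptions, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate(
    "the search bar is hard to find",
    "the search bar hard to find",
)
print(f"WER: {wer:.1%}")
```

Running this per language on a hand-corrected sample of your own recordings is one way to check the "< 8% on clean audio" target before committing to an engine.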

When NOT to build a custom research platform

Skip the build when (a) the team needs to ship inside eight weeks, (b) the panel requirement is broad consumer and you don’t have a recruitment moat, (c) total budget is below $80K, or (d) the buyer’s differentiator is brand recognition rather than a unique workflow. In each case the right move is to integrate UserTesting, dscout, or Respondent.io APIs and build only the differentiating layer on top — we still help build that layer.
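The four skip conditions above are simple enough to encode as a checklist function, which can be handy in a scoping spreadsheet or intake form. This is only a restatement of the rule in code; the parameter names are hypothetical.

```python
def reasons_to_skip_build(weeks_to_ship: int,
                          broad_consumer_panel: bool,
                          has_recruitment_moat: bool,
                          budget_usd: int,
                          differentiator_is_brand: bool) -> list[str]:
    """Return every skip condition (a) to (d) that applies; empty means build is viable."""
    reasons = []
    if weeks_to_ship <= 8:
        reasons.append("(a) must ship inside eight weeks")
    if broad_consumer_panel and not has_recruitment_moat:
        reasons.append("(b) broad consumer panel without a recruitment moat")
    if budget_usd < 80_000:
        reasons.append("(c) total budget below $80K")
    if differentiator_is_brand:
        reasons.append("(d) differentiator is brand recognition, not workflow")
    return reasons


verdict = reasons_to_skip_build(
    weeks_to_ship=6,
    broad_consumer_panel=True,
    has_recruitment_moat=False,
    budget_usd=60_000,
    differentiator_is_brand=False,
)
print(verdict or "build is on the table")
```

An empty list does not mean a custom build is the right call, only that none of the four disqualifiers applies.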

Want a build-vs-buy verdict in writing?

A 30-minute call gets you a one-page recommendation matched to your panel, language mix, AI moat, and budget, and we will tell you honestly when buying would be cheaper.


FAQ

Who uses Vocal Views?

Vocal Views serves enterprise market-research teams at Google, McDonald’s, Netflix, and Samsung, plus mid-market consumer-insights teams. The platform handles 1:1 interviews, focus groups, observer rooms, and live human interpretation in 30+ languages.

How long does it take to ship an MVP research platform?

12–14 weeks for an MVP with WebRTC video, multi-language ASR, recording, basic panel management, and Stripe-based incentive payouts. AI synthesis and live interpreter mode add 3–6 weeks each. Agent Engineering at Fora Soft compresses these calendars below the industry baseline.

Should we build on MediaSoup, Janus, LiveKit, or Kurento?

MediaSoup for raw performance, Janus for carrier-grade reliability, LiveKit for fastest path to MVP (managed cloud option), Kurento for pipeline-level control over recording and ASR routing. Vocal Views shipped on Kurento; modern projects often default to MediaSoup or LiveKit.

Which ASR engine handles 30+ languages best?

Deepgram Nova-3 for cost-sensitive workloads (45+ languages, ~$4.30 / 1K minutes). AWS Transcribe for enterprise compliance (BAA available, 100+ languages). Azure Speech for Microsoft-stack buyers (140+ languages, HIPAA-friendly). Self-hosted Whisper for on-prem or large-scale cost control.

Is live human interpretation worth the engineering cost?

If your buyer interviews populations who would not consent to interview in English, yes — it is the feature procurement cites when choosing Vocal Views over generic competitors. If not, post-session translation plus AI summary is good enough and saves 30–40% of build cost.

When does building a custom platform beat using UserTesting or dscout APIs?

Build when your differentiator is a niche panel or specialised analysis workflow that vendor APIs cannot deliver. Buy + integrate when you need broad recruitment fast and your moat sits in synthesis, integrations, or vertical workflows.

Has Fora Soft shipped other video-marketplace products?

Yes. Beyond Vocal Views, we built BrainCert — a virtual classroom and LMS with 100K+ paying customers — and ProvideoMeeting, an enterprise video conferencing platform. Around 40% of our active engineering work is in video and AI.

What does compliance hardening cost?

$25–$50K for the SOC 2 + GDPR build pack on top of a base MVP, plus $15–$25K annual auditor fees. HIPAA adds another $20–$50K depending on cloud provider and recording-encryption scope. Build the controls in sprint 1 instead of retrofitting.

Architecture

P2P vs MCU vs SFU for video conferencing

A practical comparison for product owners deciding the topology for an interview platform.

AI

AI simultaneous interpretation

Where AI translation is starting to compete with human interpreters — and where it isn’t.

Media server

What is Kurento Media Server

The pipeline-level engine that powers Vocal Views’ observer and interpreter modes.

Build vs buy

Build vs buy a video chat platform

The same framework applied to general-purpose video chat — useful sanity check for marketplaces.

Ready to ship your own research marketplace?

A modern research-marketplace platform has four jobs — recruit cleanly, interview reliably, transcribe accurately, and synthesise instantly. Vocal Views is the proof point at enterprise scale; the architecture, the ASR mix, the SFU choice, and the live-interpreter loop are the levers that decide whether your product wins enterprise deals or stalls at pilot.

If your panel, your AI moat, or your vertical workflow puts you in the build column, the fastest next step is a 30-minute call with the team that already shipped a system serving Google, McDonald’s, Netflix, and Samsung. We will walk the architecture, the cost, and the realistic calendar in one session — and tell you honestly when buying instead would be cheaper.

Talk to the team behind Vocal Views

Book a 30-minute call. We will scope your research-platform build — SFU, ASR, AI synthesis, panel, payouts, compliance, calendar, budget — in one session.

