Why this matters
If you are an L&D director, an EdTech founder, or a product lead, "we need learning video" is not a spec — it is five different products wearing the same words. A platform built for recorded lectures and one built for live cohorts share almost no infrastructure, yet teams routinely scope one and build the other. This article gives you a clear map of the five formats, what each one costs to build and run, how each is tracked, and which learning job each one does best — so you can choose deliberately and brief your engineers and instructional designers with the right mental model.
The five shapes of learning video
Every learning video, no matter the subject, falls into one of five formats. They differ along two axes that drive almost every technical decision: whether the video is recorded (watched on demand, any time) or live (happening now, in real time), and how much the learner can interact with it beyond pressing play.
The recorded end is video-on-demand (VOD), the term for media a learner streams whenever they want. It is cheap, infinitely repeatable, and easy to measure. The live end runs on real-time communication and is expensive, ephemeral, and measured differently. Everything else (cost, the standard you track with, the failure modes) flows from where a format sits between those poles.
Figure 1. The five formats placed on two axes: recorded-versus-live and how interactive the experience is. Where a format sits predicts its cost and its tracking.
Before defining each, one piece of vocabulary you will need throughout. To prove that learning happened, a platform records it in a standard format other systems can read. The older standard, SCORM (the Sharable Content Object Reference Model, maintained by Advanced Distributed Learning, or ADL), is a shipping container for a course that any compliant learning management system — the platform that hosts courses and learners, called an LMS — can load and read [1]. The modern standard, xAPI (the Experience API, once nicknamed Tin Can API), records learning as short sentences — "Maria completed Module 3" — sent to a Learning Record Store, or LRS, the notebook those sentences are written into [2]. Keep those two names in mind; each format below leans on them differently.
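To make the "short sentence" idea concrete, here is a minimal sketch of the "Maria completed Module 3" statement as the actor-verb-object structure xAPI uses. The email address, activity URL, and the `make_statement` helper are illustrative assumptions, not part of any real course. The verb IRI follows the ADL vocabulary pattern.

```python
import json


def make_statement(name: str, email: str, verb: str,
                   activity_id: str, activity_name: str) -> dict:
    """Build a minimal actor-verb-object xAPI statement as a plain dict."""
    return {
        "actor": {
            "objectType": "Agent",
            "name": name,
            "mbox": f"mailto:{email}",  # hypothetical learner address
        },
        "verb": {
            "id": f"http://adlnet.gov/expapi/verbs/{verb}",
            "display": {"en-US": verb},
        },
        "object": {
            "objectType": "Activity",
            "id": activity_id,  # hypothetical activity IRI
            "definition": {"name": {"en-US": activity_name}},
        },
    }


statement = make_statement(
    "Maria", "maria@example.com", "completed",
    "https://example.com/courses/onboarding/module-3", "Module 3",
)
print(json.dumps(statement, indent=2))
```

In a real platform this JSON would be sent to the LRS over HTTP. The sketch stops at building the sentence, because the endpoint and credentials depend on the LRS you choose.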
Format 1 — The recorded lecture
The recorded lecture is the workhorse of online learning: an instructor talks to camera, often beside slides or a screen recording, and the file is stored once and streamed on demand to thousands of learners. It is the cheapest format to operate because the expensive work — recording, encoding, captioning — happens once, and delivery afterward is just serving a file.
Architecturally it is pure VOD. You encode the source once into an adaptive set of qualities so the player can switch based on the learner's connection, store it, and serve it through a content delivery network — the geographically distributed cache that puts the video close to the viewer, called a CDN. Tracking is the simple case: the player emits "started," progress milestones, and "completed," and reports them through SCORM or, for richer video data, the xAPI Video Profile — a community add-on to xAPI that standardizes the events for played, paused, seeked, and completed so second-by-second viewing becomes data [3].
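As a rough illustration of how second-by-second viewing becomes data, the sketch below turns a list of played segments into the fraction of the video actually covered. The `(start, end)` input shape and the function name are assumptions made for this example. A real player would derive the segments from its own played, paused, and seeked events.

```python
from typing import Iterable, Tuple


def watched_fraction(segments: Iterable[Tuple[float, float]],
                     duration: float) -> float:
    """Return the share of the video covered by at least one played segment.

    Overlapping or repeated segments are counted once, so rewatching
    the same minute does not inflate the result.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    covered = 0.0
    end_of_last = 0.0
    for start, end in sorted(segments):
        start = max(start, end_of_last, 0.0)  # skip time already counted
        end = min(end, duration)              # ignore anything past the end
        if end > start:
            covered += end - start
            end_of_last = end
    return covered / duration


# A 10-minute video: the learner watched 0:00-1:00, rewound and watched
# 0:30-2:00, then skipped ahead and watched 5:00-6:00.
print(watched_fraction([(0, 60), (30, 120), (300, 360)], duration=600))
```

The learner in the example pressed play three times, but only three distinct minutes out of ten were rendered.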
The catch is the one every learning team eventually meets: "watched 100%" is not "learned," and not always "completed." A raw player knows only how many seconds were rendered. If "complete" should mean reaching the end and passing a check, you must define and enforce that yourself — SCORM 2004 even separates completion_status (did they finish?) from success_status (did they pass?) for exactly this reason [1]. The recorded lecture's weakness is engagement: with nothing to do but watch, attention fades, which is the problem the next two formats attack directly.
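A completion rule of that kind can be sketched in a few lines. The thresholds (95% watched, 80% to pass) and the `evaluate` function are assumptions chosen for illustration. Only the two status names and their values mirror the SCORM 2004 data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LessonResult:
    completion_status: str  # "completed" or "incomplete"
    success_status: str     # "passed", "failed", or "unknown"


def evaluate(watched: float, quiz_score: Optional[float],
             watch_threshold: float = 0.95,
             pass_mark: float = 0.80) -> LessonResult:
    """Keep 'did they finish?' and 'did they pass?' as separate answers."""
    completion = "completed" if watched >= watch_threshold else "incomplete"
    if quiz_score is None:
        success = "unknown"  # no check was taken, so nothing to judge
    else:
        success = "passed" if quiz_score >= pass_mark else "failed"
    return LessonResult(completion, success)


# Watched everything, failed the check: finished but not passed.
print(evaluate(watched=1.0, quiz_score=0.5))
```

Keeping the two fields separate lets a report distinguish a learner who sat through the video from one who demonstrated the skill.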
Format 2 — Microlearning
Microlearning is the recorded lecture cut into small, single-idea pieces — typically two to six minutes each — designed to be finished in one sitting. It is the same VOD architecture as a lecture, so it costs the same to deliver; what changes is the editorial shape, and that shape is backed by hard data.
The evidence for short video is unusually strong. An MIT analysis of 6.9 million MOOC video sessions found that median engagement maxes out at about six minutes regardless of how long the video actually is; past that point, you are paying to encode and store footage most learners never reach [4]. Industry completion data tells the same story from the other side: videos under six minutes routinely approach full completion, while completion falls sharply as length climbs past nine and twelve minutes [5]. Cutting a 40-minute lecture into eight five-minute units does not just feel friendlier; it measurably lifts how much gets watched.
Walk through the arithmetic, because it is the whole case for the format. Suppose a 40-minute lecture is watched, on average, to the 6-minute mark: 6 / 40 = 15% of the content is actually consumed. Now cut the same material into eight 5-minute units. If each short unit is finished at the roughly 90% rate short videos typically see, average consumption becomes 0.90 × 40 = 36 minutes, or 90% of the content. Same footage, same delivery cost, six times as much material actually watched (36 / 6 = 6). That ratio, not novelty, is why microlearning dominates corporate training catalogs.
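The same arithmetic as a small calculator, so you can substitute your own lecture length and finish rate. The function names are made up for this sketch, and the 6-minute drop-off and 90% finish rate are the illustrative figures from the paragraph above, not measurements of your catalog.

```python
def minutes_watched_long_form(total_min: float, avg_dropoff_min: float) -> float:
    """Average minutes consumed when viewers stop at the drop-off point."""
    return min(total_min, avg_dropoff_min)


def minutes_watched_micro(unit_count: int, unit_min: float,
                          finish_rate: float) -> float:
    """Average minutes consumed when each short unit is finished at finish_rate."""
    return unit_count * unit_min * finish_rate


long_form = minutes_watched_long_form(total_min=40, avg_dropoff_min=6)
micro = minutes_watched_micro(unit_count=8, unit_min=5, finish_rate=0.90)

print(f"long form: {long_form:.0f} min ({long_form / 40:.0%} of content)")
print(f"micro:     {micro:.0f} min ({micro / 40:.0%} of content)")
print(f"ratio:     {micro / long_form:.1f}x")
```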
Microlearning's tracking is where xAPI earns its place. Because each unit is a discrete object, an LRS can record a clean trail of which micro-units a learner finished and in what order, which is far more useful for analytics than one coarse "watched the lecture" flag.
Format 3 — Interactive video
Interactive video adds things the learner does inside the video: in-player quizzes, clickable hotspots, and branching — where the learner's choice changes which clip plays next, turning a linear video into a choose-your-path scenario. It is the first format that stops being a file you stream and becomes an application you run.
That application changes the architecture. On top of the VOD base you now need an overlay layer that pauses the video and renders the question, a small state machine that decides the next segment in a branching scenario, and a tracking bridge that emits each interaction — answer chosen, branch taken, hotspot clicked — as its own xAPI statement to the LRS. SCORM can record a basic interactions set, but the granularity interactive video produces is squarely xAPI Video Profile territory [3]. We go deep on the build in what makes video interactive and branching scenarios.
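The state machine and tracking bridge described above can be sketched as follows. The scenario, the clip names, and the `branch-taken` event label are all hypothetical. A production build would map each event to a proper xAPI statement and send it to the LRS rather than collect it in a list.

```python
# Hypothetical two-level customer-service scenario: each clip lists the
# choices it offers and the clip each choice leads to.
SCENARIO = {
    "intro": {"choices": {"escalate": "manager_call", "resolve": "refund_offer"}},
    "manager_call": {"choices": {}},   # ending clip
    "refund_offer": {"choices": {}},   # ending clip
}


class BranchingPlayer:
    """Tiny state machine: tracks the current clip and logs each branch taken."""

    def __init__(self, scenario: dict, start: str):
        self.scenario = scenario
        self.current = start
        self.events = []  # stand-in for statements sent to an LRS

    def choose(self, choice: str) -> str:
        options = self.scenario[self.current]["choices"]
        if choice not in options:
            raise ValueError(f"{choice!r} is not offered in clip {self.current!r}")
        next_clip = options[choice]
        self.events.append({
            "verb": "branch-taken",  # illustrative label, not a registered verb
            "from": self.current,
            "choice": choice,
            "to": next_clip,
        })
        self.current = next_clip
        return next_clip

    @property
    def finished(self) -> bool:
        return not self.scenario[self.current]["choices"]


player = BranchingPlayer(SCENARIO, start="intro")
player.choose("resolve")
print(player.current, player.finished, player.events)
```

Keeping the routing table as data rather than code is one possible design choice: instructional designers can then edit the scenario without touching the player.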
The reason teams take on that extra work is engagement, and the numbers are real. Across large benchmark datasets, interactive video completion runs well above standard video, and viewers who touch even one in-video interaction stay engaged markedly longer [6]. The trade-off is production: a branching scenario with four decision points and three outcomes per point is not one video. It is more than a dozen clips if the branches rejoin after each choice, and over a hundred if every path stays separate, plus the logic to route between them. The pitfall here is scoping a branching lesson as "a video with some buttons" and discovering that the routing logic, the per-branch tracking, and the multiplied filming are most of the work.
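To see how quickly filming multiplies, the sketch below counts clips for the four-decision, three-option example under two assumed designs. In the converging design, every decision point has one setup clip and one response clip per option, and all paths rejoin. In the full-tree design, every choice leads to its own separate continuation. Real scenarios usually land somewhere between the two.

```python
def clips_converging(decision_points: int, options: int) -> int:
    """One setup clip plus one response clip per option at each decision point."""
    return decision_points * (1 + options)


def clips_full_tree(decision_points: int, options: int) -> int:
    """One clip per node when no branches ever rejoin: 1 + b + b^2 + ... + b^d."""
    return sum(options ** level for level in range(decision_points + 1))


print(clips_converging(4, 3))  # 16
print(clips_full_tree(4, 3))   # 121
```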
Format 4 — The live virtual classroom
The live virtual classroom flips every assumption. Instead of a stored file, it is a real-time session: an instructor and learners share live video, voice, a whiteboard, and screen sharing, all happening now. It runs on WebRTC (Web Real-Time Communication), the browser technology that carries live audio and video at roughly 200–500 milliseconds glass-to-glass so people can actually interact [7]. The protocol internals live in the Video Streaming section's WebRTC explainer; here, the point is what live does to your cost and tracking.
Cost inverts. A recorded lecture is cheap per extra viewer; a live class is billed by the participant-minute, because every session consumes real-time server capacity that cannot be cached or reused. Scaling a 20-seat seminar to a 500-seat lecture is a genuine engineering change, not a slider — it pulls in a media server that selectively forwards streams, covered in the virtual classroom. Tracking inverts too: there is no "resume" because the moment is gone when it ends, and "completion" usually means attendance plus participation, recorded as xAPI statements ("attended," "asked a question," "answered a poll") rather than a watch-percentage. The recording you keep afterward is a different asset — a recorded lecture, format 1 — that re-enters the catalog and gets tracked all over again.
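A rule of the "attendance plus participation" kind might look like the sketch below. The 75% attendance threshold, the one-event minimum, and the function name are assumptions for illustration. Each platform defines its own rule.

```python
from typing import Sequence


def live_session_complete(minutes_present: float, session_minutes: float,
                          participation_events: Sequence[str],
                          min_attendance: float = 0.75,
                          min_events: int = 1) -> bool:
    """Complete means: present for enough of the session AND took part."""
    if session_minutes <= 0:
        raise ValueError("session_minutes must be positive")
    attended_enough = minutes_present / session_minutes >= min_attendance
    took_part = len(participation_events) >= min_events
    return attended_enough and took_part


# Present for 50 of 60 minutes and answered a poll: counts as complete.
print(live_session_complete(50, 60, ["answered a poll"]))
# Present the whole time but silent: does not count under this rule.
print(live_session_complete(60, 60, []))
```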
Format 5 — The cohort-based course
A cohort-based course is not a single video format but a structure that wraps the others: a group of learners moves through a program together on a fixed schedule, mixing recorded lessons, live sessions, interactive assignments, and peer discussion. Technically it is the most demanding because it combines all four formats above plus the machinery of groups — enrollment windows, schedules, peer cohorts, and discussion.
The reason anyone builds something this complex is one number: completion. Self-paced online courses and MOOCs are notorious for completion in the single digits to low double digits, while cohort-based courses routinely report completion several times higher because learners are accountable to a schedule and to each other [8][9]. The format buys engagement through social structure rather than through editing or interactivity. The build cost is real — you are not shipping a player, you are shipping a program-management system with video inside it — which is exactly why the cost model and a build-vs-buy decision matter most here.
How architecture, tracking, and cost change across formats
The five formats are easiest to compare side by side. Read the "Best-fit standard" row carefully: it is the row that quietly dictates which tracking work your team signs up for.
| Dimension | Recorded lecture | Microlearning | Interactive video | Live classroom | Cohort course |
|---|---|---|---|---|---|
| Delivery | VOD (file + CDN) | VOD (file + CDN) | VOD + overlay app | Real-time (WebRTC) | All of the above |
| Marginal cost per learner | Very low | Very low | Low | High (per-minute) | High |
| Resumable? | Yes | Yes (per unit) | Yes (with state) | No (it is live) | Yes (per component) |
| What "complete" means | Reached end (+ check) | Finished the unit | Passed interactions | Attended + took part | Finished the program |
| Best-fit standard | SCORM / xAPI | xAPI (per-unit) | xAPI Video Profile | xAPI (events) | xAPI across all |
| Engagement lever | None (passive viewing) | Short length | In-video actions | Live presence | Social accountability |
| Build complexity | Low | Low | Medium | High | Highest |
The pattern is consistent: cost and complexity rise as you move from a stored file toward live, social, real-time experiences, and the standard you lean on shifts from SCORM's fixed fields toward xAPI's open, per-event record. No single format is "best" — each is the right answer to a different learning job.
The arithmetic: why live and recorded are not the same bill
The starkest cost gap is recorded versus live, and it is worth seeing in numbers. Take a recorded lecture delivered to 1,000 learners. A 30-minute video at 2 Mbps is about 0.45 GB per view; at a typical CDN egress price near $0.05 per GB, that is 1,000 × 0.45 × $0.05 = $22.50 to serve the whole audience — once encoded, it scales almost for free.
Now run the same 1,000 learners through a 30-minute live class. Live video is not cached; each participant-minute consumes real-time media-server capacity, commonly billed in the range of $0.004 per participant-minute on a communications platform. That is 1,000 × 30 × $0.004 = $120 for one session — and unlike the recording, you pay it again every time you run the class. The recorded lecture is a one-time encode and a trivial egress bill; the live class is a recurring, per-head, per-minute cost. That single difference, not feature lists, is why the build-vs-buy and format decision has to come first.
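Both bills can be reproduced with a short calculator. The prices are the illustrative figures used above ($0.05 per GB of egress, $0.004 per participant-minute), not quotes from any provider, and the function names are invented for this sketch.

```python
def vod_egress_cost(learners: int, minutes: float,
                    bitrate_mbps: float, price_per_gb: float) -> float:
    """Egress cost of serving one recorded video to every learner once."""
    megabits = bitrate_mbps * minutes * 60
    gb_per_view = megabits / 8 / 1000  # megabits -> megabytes -> decimal GB
    return learners * gb_per_view * price_per_gb


def live_session_cost(learners: int, minutes: float,
                      price_per_participant_minute: float) -> float:
    """Cost of one live session billed per participant-minute."""
    return learners * minutes * price_per_participant_minute


recorded = vod_egress_cost(1000, 30, bitrate_mbps=2, price_per_gb=0.05)
live = live_session_cost(1000, 30, price_per_participant_minute=0.004)

print(f"recorded: ${recorded:.2f}")
print(f"live:     ${live:.2f} per session")
```

Changing the audience size or session length shows how the live bill grows with every participant-minute and repeats with every session.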
The common mistake: picking the format last
The most expensive error in this field is treating "format" as a styling choice made late, after the platform is built. Teams scope "a learning-video platform," build a VOD player and a SCORM wrapper, then discover the business actually wanted live cohorts — and now they need a media server, a scheduler, peer groups, and an entirely different cost model bolted onto the wrong foundation. The fix is to choose the format first, from the learning job: pick recorded for scale and reference, microlearning for completion, interactive for skills practice, live for real-time interaction, and cohort for accountability — then let that choice drive delivery, tracking, and budget, instead of the other way around.
Where Fora Soft fits in
Fora Soft has built video software since 2005 across video conferencing, streaming, OTT, surveillance, e-learning, and telemedicine — more than 239 shipped projects. That range matters here because the five formats draw on different parts of our experience: recorded and interactive video lean on our streaming and interactive-player work, while live virtual classrooms lean on the WebRTC real-time stack we have shipped many times. The build-vs-buy conversation we help teams have is rarely "can we play video" — it is "which format does your learning job actually need, and which pieces do you build versus assemble." Getting that one decision right is what keeps a learning-video budget from doubling.
What to read next
- What is e-learning video, and how it differs from streaming and conferencing
- The pedagogy of video: attention, retention, and chunking
- The anatomy of a learning-video platform, end to end
Call to action
- Talk to an e-learning engineer: book a 30-minute scoping call to talk through your plan for learning-video formats.
- See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
- Download the Learning Video Format Chooser: a one-page guide to picking among the five learning-video formats by learning job, cost, tracking standard, and build complexity.
References
1. Advanced Distributed Learning (ADL), "SCORM 2004 4th Edition: Run-Time Environment." Fixed data model (completion_status, success_status, score, total_time); separation of completed and passed. https://adlnet.gov/projects/scorm/ (accessed 2026-06-19). Tier 1.
2. Advanced Distributed Learning (ADL), "xAPI Specification 1.0.3, Part 2: Experience API (Statements)." Actor-verb-object statements stored in a Learning Record Store; xAPI formerly Tin Can API. https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-About.md (accessed 2026-06-19). Tier 1.
3. ADL / xAPI community, "xAPI Video Profile." Standardized verbs and extensions for played, paused, seeked, and completed video events. https://github.com/adlnet/xAPI-Video-Profile (accessed 2026-06-19). Tier 1.
4. Guo, P. J., Kim, J., & Rubin, R. (2014), "How video production affects student engagement: An empirical study of MOOC videos," ACM Learning @ Scale (L@S '14). 6.9M sessions; median engagement maxes at ~6 minutes regardless of length. https://dl.acm.org/doi/10.1145/2556325.2566239 (accessed 2026-06-19). Tier 5.
5. TechSmith, "2026 Video Statistics: Key Viewer Insights." Completion approaches 100% under ~6 minutes and falls sharply past 9–12 minutes. https://www.techsmith.com/blog/2026-video-statistics/ (accessed 2026-06-19). Tier 6.
6. Wistia, "State of Video Report: Video Marketing Statistics for 2026." Interactive-video completion exceeds standard video; in-video interaction lifts engagement. https://wistia.com/learn/marketing/video-marketing-statistics (accessed 2026-06-19). Tier 6.
7. nanocosmos, "WebRTC Latency: Comparing Low-Latency Streaming Protocols (2026 Update)." WebRTC ~200–500 ms glass-to-glass. https://www.nanocosmos.net/blog/webrtc-latency/ (accessed 2026-06-19). Tier 4.
8. ProductGrowth, "Course Completion Rates: Benchmarks for EdTech Products." MOOC/self-paced completion in the single-to-low-double digits; cohort and community-discussion courses far higher. https://productgrowth.in/insights/edtech/course-completion-rates-benchmarks/ (accessed 2026-06-19). Tier 6.
9. Learnopoly, "60+ Top Statistics On Cohort Based Learning (2025)." Cohort-based completion reported several times higher than self-paced. https://learnopoly.com/cohort-based-learning-statistics/ (accessed 2026-06-19). Tier 6.
Per the source hierarchy, where vendor and industry blogs (tiers 4–6) and the official specs disagreed on what SCORM and xAPI track, this article follows ADL's specifications: SCORM records a fixed data model and separates completion from success, while granular per-event and video tracking belong to xAPI and the xAPI Video Profile. The completion and engagement percentages are drawn from industry benchmark datasets and one peer-reviewed MOOC study, and are labelled as such.


