Published 2026-05-27 · 18 min read · By Nikolay Sapunov, CEO at Fora Soft
Why This Matters
Optical flow is not a feature you ship to end users. It is a primitive — a building block that lives inside features. The multi-object tracker that follows a player around a football pitch uses optical flow to predict where each tracked box should appear in the next frame. The video stabilizer on a phone uses optical flow to estimate camera shake. The frame interpolator that turns 30 FPS footage into 60 FPS for a smoother slow-motion shot uses optical flow to invent the missing frames. The temporal alignment block inside BasicVSR++ — the open-weight default for video super-resolution — is literally a learned optical flow estimator. Get the flow primitive wrong and every downstream feature breaks. This article is for the product manager, video-platform engineer, or founder who needs to make build-versus-buy decisions on a video feature whose engineering description mentions "optical flow" — which, in 2026, is most of them.
The Mental Model — Optical Flow Is The Motion Field Between Two Frames
Imagine two consecutive frames of a video laid side by side. Frame one shows a red car on the left side of the road. Frame two, taken 1/30th of a second later, shows the same red car six pixels to the right. Optical flow is the answer to one question, asked for every pixel: where did this pixel come from in the previous frame, and where is it going in the next? The answer is a two-number vector per pixel — the horizontal velocity and the vertical velocity — and the whole map of answers, one vector per pixel, is called a flow field.
A flow field for a 1080p video (1920×1080) has roughly 2 million vectors. For 720p HD it has about 920,000. For SD at 640×480 it has about 307,000. Computing all of them, accurately, for thirty frames every second, is one of the harder problems in computer vision, and it has been a research topic continuously since 1981.
The brightness number — what engineers call a pixel's intensity — for any given location in frame two is assumed to equal the brightness at some location in frame one. That single assumption, the brightness constancy constraint, is the foundation of every optical flow algorithm. The equation looks like one line of high-school calculus: I_x · u + I_y · v + I_t = 0. The terms I_x and I_y are how much the image brightness changes when you move one pixel right or one pixel down. The term I_t is how much the brightness changes from frame one to frame two at this location. The unknowns are u and v, the horizontal and vertical velocity. One equation, two unknowns — which means you cannot solve it for any single pixel in isolation. Every optical flow algorithm in history is a different answer to one question: what extra assumption do I add to make this solvable?
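For readers who want the missing step, the one-line equation can be sketched as a first-order Taylor expansion of the brightness constancy assumption. This assumes the motion (u, v) between the two frames is small and uses the same symbols as the paragraph above:

```latex
\begin{align}
I(x + u,\; y + v,\; t + 1) &= I(x, y, t)
  && \text{(brightness constancy)} \\
I(x + u,\; y + v,\; t + 1) &\approx I(x, y, t) + I_x u + I_y v + I_t
  && \text{(first-order Taylor expansion)} \\
\Rightarrow \quad I_x u + I_y v + I_t &= 0
  && \text{(one equation, two unknowns)}
\end{align}
```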
Figure 1. The brightness constancy constraint — one equation per pixel, two unknowns. The aperture problem in one line: you cannot recover both motion components from a single pixel's brightness change.
Lucas-Kanade — The 1981 Classic That Still Ships In 2026
Bruce Lucas and Takeo Kanade, at Carnegie Mellon, published their iterative image registration technique in 1981. Their idea was elegant: do not try to solve the brightness constancy equation for a single pixel; instead, take a small window of pixels (typically 3×3, 5×5, or 7×7) and assume that every pixel in the window has the same motion. That gives you nine, twenty-five, or forty-nine equations in two unknowns, which is overdetermined, so you solve it with least-squares. The output is one motion vector per window, not per pixel, and in practice the window is placed only at a few hundred well-textured points, which is why Lucas-Kanade is called a sparse method.
The brightness-versus-position relationship inside that small window is linearised — meaning the algorithm assumes the brightness changes smoothly across the window in a way that can be approximated by a straight line. That assumption fails for large motions (a pixel moving twenty pixels between frames cannot be modelled by a smooth local approximation), and so Lucas-Kanade ships in OpenCV in its pyramidal form: the image is downsampled to half resolution, then quarter, then eighth, and the algorithm tracks the motion at the coarsest scale first, then refines progressively at each finer scale. This trick — invented by Bouguet and others in the late 1990s — extends the algorithm's reach from a few pixels of motion to a hundred or more.
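To make the coarse-to-fine idea concrete, here is a minimal NumPy sketch. It is not OpenCV's implementation: the helper names are hypothetical, and 2×2 block averaging stands in for the blur-and-subsample step a real tracker would use. It shows how a 20-pixel motion shrinks to a few pixels at the coarsest level.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [full, half, quarter, ...] by averaging 2x2 blocks."""
    pyramid = [np.asarray(image, dtype=np.float64)]
    for _ in range(levels):
        prev = pyramid[-1]
        # Crop to an even size so the image splits cleanly into 2x2 blocks.
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        blocks = prev[:h, :w].reshape(h // 2, 2, w // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid

def displacement_per_level(motion_px, levels):
    """A motion of motion_px at full resolution halves at every coarser level."""
    return [motion_px / 2 ** level for level in range(levels + 1)]

frame = np.zeros((720, 1280))
shapes = [level.shape for level in build_pyramid(frame, levels=3)]
motions = displacement_per_level(20.0, levels=3)
print(shapes)   # [(720, 1280), (360, 640), (180, 320), (90, 160)]
print(motions)  # [20.0, 10.0, 5.0, 2.5]
```

At one-eighth resolution the 20-pixel motion is only 2.5 pixels, which is back inside the range where the linearised model holds; each finer level then only has to correct a small residual.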
In OpenCV the function is cv2.calcOpticalFlowPyrLK. The standard pattern is to first detect corners — points where the image has strong gradients in two directions, which is exactly the condition under which the least-squares solution is well-conditioned — using cv2.goodFeaturesToTrack (the Shi-Tomasi 1994 corner detector). You get a list of two hundred or four hundred corner points, you hand both frames and the corner list to Lucas-Kanade, and you get back two hundred or four hundred motion vectors. The whole call costs about 5 milliseconds on a modern CPU at 720p — that is roughly 200 frames per second on a single core, no GPU required.
The output is sparse. If the camera is panning across a scene with two hundred tracked corners, you learn the motion at those two hundred locations. You learn nothing about the smooth, textureless wall behind the corners — there are no gradients there to track. That sparseness is the model's strength (it does not pretend to know what it cannot measure) and its limit (a feature that needs flow at every pixel, like video stabilization or frame interpolation, needs more than Lucas-Kanade can give).
The math, worked out loud, for a 5×5 window: you build a matrix A of size 25×2 where each row is (I_x, I_y) for one pixel in the window. You build a vector b of size 25 where each entry is -I_t for one pixel. You solve the normal equations A^T A · d = A^T b for the displacement d = (u, v). The matrix A^T A is 2×2; the vector A^T b is 2×1; the inverse of a 2×2 matrix is a closed-form three-line formula. Per tracked point, this is about 200 floating-point operations, and that is why a CPU can track hundreds of points at 200 FPS in real time.
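The same arithmetic as a minimal NumPy sketch: one 5×5 window, one least-squares solve, no pyramid and no iteration. The function name and the synthetic sine pattern are assumptions made for illustration; production trackers add the pyramidal and iterative refinements described earlier.

```python
import numpy as np

def lucas_kanade_window(frame1, frame2, cx, cy, half=2):
    """Estimate the displacement (u, v) of the window centred at (cx, cy)."""
    Iy, Ix = np.gradient(frame1)   # np.gradient returns d/drow (y) first, then d/dcol (x)
    It = frame2 - frame1           # temporal brightness change
    rows = slice(cy - half, cy + half + 1)
    cols = slice(cx - half, cx + half + 1)
    A = np.stack([Ix[rows, cols].ravel(), Iy[rows, cols].ravel()], axis=1)  # 25 x 2
    b = -It[rows, cols].ravel()                                             # 25
    ATA = A.T @ A                  # 2 x 2
    ATb = A.T @ b                  # 2
    det = ATA[0, 0] * ATA[1, 1] - ATA[0, 1] * ATA[1, 0]
    if abs(det) < 1e-9:
        raise ValueError("window has no usable two-directional texture")
    # Closed-form inverse of a 2x2 matrix.
    inv = np.array([[ATA[1, 1], -ATA[0, 1]],
                    [-ATA[1, 0], ATA[0, 0]]]) / det
    return inv @ ATb

def pattern(x, y):
    """Smooth synthetic texture with gradients in both directions."""
    return np.sin(0.5 * x) + np.sin(0.5 * y)

ys, xs = np.mgrid[0:32, 0:32].astype(np.float64)
true_u, true_v = 0.3, -0.2
frame1 = pattern(xs, ys)
frame2 = pattern(xs - true_u, ys - true_v)  # content moved by (true_u, true_v)
u, v = lucas_kanade_window(frame1, frame2, cx=16, cy=16)
print(f"estimated ({u:.3f}, {v:.3f}), true ({true_u}, {true_v})")
```

The estimate lands close to the true sub-pixel shift but not exactly on it, because both the finite-difference gradients and the linearisation are approximations. The determinant check is the numerical form of the corner condition: if the window has gradients in only one direction, A^T A is singular and there is no unique answer.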
The failure modes are well-known. Lucas-Kanade fails on large motions beyond what the pyramid can recover, on motion-blurred input where the brightness assumption breaks down, on textureless regions where there are no gradients, on transparent or reflective surfaces where the brightness constancy assumption is false, and on illumination changes between frames. For everything else — and that covers most well-lit, sharply-shot video — it works astonishingly well for a 45-year-old algorithm.
RAFT — The 2020 Deep Learning Reset
Zachary Teed and Jia Deng, at Princeton, released RAFT (Recurrent All-Pairs Field Transforms) as arXiv preprint 2003.12039 in March 2020, and the paper went on to win the best paper award at ECCV 2020. The reference implementation is at github.com/princeton-vl/RAFT. The licensing is BSD 3-Clause, commercial-friendly.
RAFT does not use the brightness constancy equation directly. Instead, it does three things. First, a feature encoder — a small convolutional network with six residual blocks — converts each input frame from raw RGB pixels into a 256-channel feature map at one-eighth of the original resolution. The feature encoder is shared between the two frames; the same network runs on frame one and frame two. A separate context encoder runs on frame one only, producing context features that guide the iterative update later.
Second, RAFT computes a 4D correlation volume. For every feature at one-eighth resolution in frame one (there are H/8 × W/8 of them), it computes the dot product with every feature in frame two (another H/8 × W/8 of them). The result is a four-dimensional tensor of shape (H/8) × (W/8) × (H/8) × (W/8) where each entry is "how similar is this feature in frame one to this feature in frame two". The volume is then average-pooled over its last two (frame-two) dimensions at multiple scales (kernel sizes 1, 2, 4, 8) to give a multi-scale correlation pyramid that captures both small and large motions.
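The shape bookkeeping is easier to follow in code. This is a minimal NumPy sketch with random stand-in features, not the reference PyTorch implementation; the function name is hypothetical and the 1/sqrt(D) scaling is a normalisation choice assumed here.

```python
import numpy as np

def correlation_pyramid(feat1, feat2, num_levels=4):
    """feat1, feat2: (H, W, D) feature maps at one-eighth resolution.
    Returns a list of 4D volumes, pooled over the frame-two dimensions."""
    H, W, D = feat1.shape
    # All-pairs dot products: entry [i, j, k, l] compares feat1[i, j] with feat2[k, l].
    corr = np.einsum("ijd,kld->ijkl", feat1, feat2) / np.sqrt(D)
    pyramid = [corr]
    for _ in range(num_levels - 1):
        c = pyramid[-1]
        h2, w2 = c.shape[2] // 2, c.shape[3] // 2
        # 2x2 average pooling over the last two dimensions only; repeating it
        # gives effective kernel sizes 1, 2, 4, 8.
        c = c[:, :, :h2 * 2, :w2 * 2].reshape(H, W, h2, 2, w2, 2).mean(axis=(3, 5))
        pyramid.append(c)
    return pyramid

rng = np.random.default_rng(0)
feat1 = rng.standard_normal((8, 16, 32))  # stand-in for encoder output on frame one
feat2 = rng.standard_normal((8, 16, 32))  # stand-in for encoder output on frame two
pyramid = correlation_pyramid(feat1, feat2)
print([level.shape for level in pyramid])
```

The cost of "all pairs" is worth working out before budgeting hardware: at 1080p the one-eighth grid is 135 × 240 = 32,400 positions, so the first level alone holds about 1.05 billion entries, roughly 4.2 GB in float32.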
Third, an update operator — a recurrent neural network based on a Gated Recurrent Unit — iteratively refines a flow estimate. It starts with zero flow everywhere. At each iteration, it uses the current flow estimate to look up correlation values from the 4D volume, mixes those with the context features from frame one, and outputs a residual update to the flow. After twelve iterations at training time (twenty at test time, typically), the flow estimate is the final output.
The accuracy numbers, on standard benchmarks, were a step-change. On the Sintel benchmark (a synthetic dataset built from Blender movies) final pass, RAFT achieved an end-point-error of 2.855 pixels, a 30 percent error reduction from the previous best published result. On KITTI (a real-world autonomous-driving dataset), RAFT achieved an F1-all error of 5.10 percent, a 16 percent error reduction. The model has about 5.3 million parameters in its full configuration — small by 2026 standards, large by 1981 ones.
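Both benchmark metrics are simple to compute once you have a predicted and a ground-truth flow field. The sketch below is a minimal NumPy version; the outlier thresholds (more than 3 px and more than 5 percent of the ground-truth magnitude) are the commonly quoted KITTI convention and should be treated as an assumption, and the function names are hypothetical.

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt):
    """Per-pixel Euclidean distance between predicted and true flow vectors."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1)

def average_epe(flow_pred, flow_gt):
    """Average end-point error in pixels (the Sintel-style number)."""
    return float(endpoint_error(flow_pred, flow_gt).mean())

def outlier_percentage(flow_pred, flow_gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of pixels whose error exceeds both thresholds (KITTI-style)."""
    epe = endpoint_error(flow_pred, flow_gt)
    magnitude = np.linalg.norm(flow_gt, axis=-1)
    outliers = (epe > abs_thresh) & (epe > rel_thresh * magnitude)
    return float(100.0 * outliers.mean())

# Tiny 2x2 flow field: last axis is (u, v).
flow_gt = np.array([[[10.0, 0.0], [10.0, 0.0]],
                    [[100.0, 0.0], [0.0, 0.0]]])
error = np.array([[[3.0, 4.0], [0.0, 0.0]],
                  [[4.0, 0.0], [0.0, 1.0]]])
flow_pred = flow_gt + error

epe = average_epe(flow_pred, flow_gt)            # errors 5, 0, 4, 1 -> mean 2.5
outliers = outlier_percentage(flow_pred, flow_gt)  # only the 5 px error counts -> 25.0
print(epe, outliers)
```

Note the third pixel: a 4 px error on a 100 px motion is under the 5 percent relative threshold, so it is not counted as an outlier even though it exceeds 3 px.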
The runtime: on a single NVIDIA 1080 Ti GPU, the original implementation processes a 1088×436 frame at 9 FPS. A smaller variant with one-fifth the parameters runs at 20 FPS. On modern hardware (RTX 4090, A100) those numbers double or triple. The follow-up SEA-RAFT (2024, arXiv 2405.14793) hits 20+ FPS at 1080p on an RTX 3090. That is real-time for many use cases and sub-real-time for high frame-rate workloads. It is also the central engineering reason a video product team picks RAFT for offline accuracy work and Lucas-Kanade or DIS for live tracking.
Figure 2. RAFT's three blocks — feature encoder, 4D correlation volume with multi-scale pooling, and recurrent GRU update operator that iteratively refines a dense flow field.
The Speed-Accuracy Trade-Off, Numerically
The two algorithms sit at opposite ends of a curve that defines every practical optical-flow decision in 2026. The numbers below come from the OpenCV documentation for Lucas-Kanade, the RAFT paper, the DIS paper, and the SEA-RAFT and FlowFormer follow-ups.
| Algorithm | Year | Type | Sintel final EPE | KITTI F1-all | Speed (at stated resolution) | Hardware | Use case |
|---|---|---|---|---|---|---|---|
| Lucas-Kanade (pyramidal) | 1981 / 1999 | Sparse, classical | Not directly comparable | Not directly comparable | 200 FPS at 720p | CPU, single core | Real-time tracking of a few hundred points |
| Horn-Schunck | 1981 | Dense, classical | ~8–10 px (poor) | High | 5–10 FPS at 720p | CPU | Educational / baseline |
| Farnebäck | 2003 | Dense, classical | ~5–7 px | High | 30 FPS at 720p | CPU | Light real-time dense flow |
| DIS (Kroeger 2016) | 2016 | Dense, classical | ~4 px | Moderate | 300–600 FPS at SD | CPU, single core | Real-time dense flow without a GPU |
| RAFT | 2020 | Dense, deep | 2.855 px | 5.10% | 9 FPS at 1088×436 | GPU (1080 Ti) | Offline accuracy work |
| RAFT small | 2020 | Dense, deep | ~3.2 px | ~7% | 20 FPS at 1088×436 | GPU (1080 Ti) | Near-real-time, accuracy-flexible |
| FlowFormer | 2022 | Dense, transformer | 2.183 px | ~4.8% | 4 FPS at 1080p | GPU (A100) | State-of-the-art offline |
| SEA-RAFT | 2024 | Dense, deep (refined) | ~2.4 px | ~4.6% | 20+ FPS at 1080p | GPU (RTX 3090) | Fast and accurate; 2026 default for dense deep flow |
| MegaFlow / FlowIt / DPFlow | 2026 | Dense, deep | ~1.8–2.2 px | <4% | <5 FPS at 1080p | GPU | Research-grade accuracy |
A few things to read out of that table. Lucas-Kanade is not directly comparable on Sintel or KITTI because the standard benchmarks measure dense flow accuracy, and Lucas-Kanade does not produce dense output. The dense classical algorithms — Horn-Schunck, Farnebäck, DIS — are all dramatically less accurate than RAFT, and the gap has only widened with FlowFormer and the 2026 follow-ups. The trade is real-time CPU speed (DIS) versus offline deep accuracy (RAFT family). For most production video features, the right answer in 2026 is either Lucas-Kanade for sparse real-time tracking or SEA-RAFT for dense offline flow. The exotic options have specific niches.
The Three Failure Modes That Wreck Optical Flow Pipelines
We have integrated optical flow into video pipelines across four verticals at Fora Soft — video conferencing, surveillance, OTT, and e-learning analytics. Three failure modes show up across all of them.
Failure 1: Picking The Wrong Sparsity. A team picks Lucas-Kanade for a feature that needs flow at every pixel — say, video stabilization on a phone, or per-pixel frame interpolation for slow-motion playback — and discovers in QA that the stabilizer judders or the slow-motion is full of holes wherever the corner detector found nothing. The fix is to match sparsity to the feature: pick Lucas-Kanade only for features that genuinely consume sparse tracks (multi-object tracker box prediction, KLT-style feature tracking for visual odometry, sparse augmented reality anchors). Anything dense — stabilization, interpolation, super-resolution alignment — needs a dense estimator, which means DIS for CPU-only fast paths or RAFT/SEA-RAFT for accurate offline paths.
Failure 2: Ignoring The Brightness Constancy Assumption. A team deploys an optical-flow-based feature into a video chat product, and finds it fails dramatically on dim indoor calls and in mixed-lighting conference rooms. Every classical optical flow algorithm assumes that the brightness of a moving point stays constant frame-to-frame; that assumption breaks under aggressive auto-exposure, in low-light scenes where the camera's analog gain is high (and noisy), and under flicker from fluorescent or LED lighting. The fix is two-pronged: first, run an explicit photometric normalization step before the optical flow stage (locally normalise contrast on each frame); second, prefer deep learning models like RAFT or SEA-RAFT for brightness-unstable input because they are trained to be robust to the kinds of intensity variation that classical methods cannot handle.
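One possible form of the photometric normalization step is local mean and variance normalisation, sketched below in NumPy. The window radius, the epsilon, and the function names are assumptions; the point is only that a gain or offset change between frames, which is what auto-exposure produces, disappears after normalisation.

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 neighbourhood, computed with an integral image."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    H, W = img.shape
    total = (integral[k:k + H, k:k + W] - integral[:H, k:k + W]
             - integral[k:k + H, :W] + integral[:H, :W])
    return total / (k * k)

def local_normalize(img, radius=8, eps=1e-3):
    """Subtract the local mean and divide by the local standard deviation."""
    img = img.astype(np.float64)
    mean = box_mean(img, radius)
    var = box_mean(img * img, radius) - mean ** 2
    return (img - mean) / np.sqrt(np.maximum(var, 0.0) + eps)

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, (48, 64))
brighter = 1.8 * frame + 20.0  # simulated exposure change: gain plus offset

same = np.allclose(local_normalize(frame), local_normalize(brighter), atol=1e-3)
print(same)  # True: the exposure change is removed before flow estimation
```

This handles smooth exposure drift; it does not remove sensor noise in low light, which is where the second prong (a learned model) earns its place.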
Failure 3: Treating Latency As Negotiable. A product manager scopes a real-time AR effect (say, hand-tracked virtual jewelry in a video call) and budgets RAFT into the pipeline because "it is the most accurate". Production reality: RAFT at 9 FPS needs about 110 ms per frame pair, more than three times the 33 ms frame interval of a 30-FPS pipeline. The AR effect cannot absorb that latency, and the virtual ring visibly lags the user's hand. The fix is to budget latency before you budget accuracy: a sub-100ms real-time pipeline cannot afford a 9-FPS optical flow stage, full stop. For real-time AR, use Lucas-Kanade for sparse anchor tracking and a lightweight MediaPipe model for the dense body-part field, or budget for SEA-RAFT and accept its 20+ FPS as the upper bound on the pipeline frame rate.
Figure 3. The three failure modes. Each has a specific fix; none of them is "pick a more accurate model".
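The latency check from Failure 3 is a two-line calculation worth doing before any model is chosen. A minimal sketch, using the throughput figures quoted in this article (which were measured at different resolutions and on different hardware, so treat the comparison as indicative only); the function names are hypothetical:

```python
def frame_time_ms(fps):
    """Time available (or consumed) per frame at a given frame rate."""
    return 1000.0 / fps

def fits_budget(stage_fps, pipeline_fps):
    """True if the flow stage finishes within one pipeline frame interval."""
    return frame_time_ms(stage_fps) <= frame_time_ms(pipeline_fps)

stages = [("RAFT on a 1080 Ti", 9), ("SEA-RAFT on an RTX 3090", 20), ("Lucas-Kanade on CPU", 200)]
for name, fps in stages:
    verdict = "fits" if fits_budget(fps, pipeline_fps=30) else "does not fit"
    print(f"{name}: {frame_time_ms(fps):.1f} ms per frame, {verdict} a 30 FPS pipeline")
```

At 30 FPS the whole pipeline has 33.3 ms per frame, and the flow stage shares that budget with decode, inference, and rendering, so a stage that merely "fits" on its own may still be too slow in context.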
The Production Pattern — When To Pick Which
In our own integration work the decision tree is straightforward. Do you need flow at every pixel? If no — you only need it at a small number of feature points for tracking, visual odometry, or sparse anchors — use Lucas-Kanade. It is in OpenCV, it is CPU-only, it runs at 200 FPS, and forty-five years of production use mean its failure modes are documented.
If yes, is the pipeline real-time (live video chat, surveillance, AR)? If yes, pick DIS for the CPU-only fast path or SEA-RAFT for the accelerated path with a GPU. If the pipeline is offline (archive upscaling, post-production stabilization, content-aware encoding pre-pass), pick RAFT or SEA-RAFT for accuracy, and pick FlowFormer or the 2026 successors (MegaFlow, FlowIt, DPFlow) when accuracy genuinely matters more than runtime.
Do you need direction-of-motion only, or full sub-pixel accuracy? For action recognition and many tracking-by-detection pipelines, direction-of-motion is enough; DIS or Farnebäck suffice. For video super-resolution alignment, frame interpolation, and stabilization, sub-pixel accuracy is the whole point; pick a deep model.
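The first two questions of the decision tree can be written down as a small function. This is a sketch of the reasoning in this section, not a library API: the function name and flags are hypothetical, and it leaves out the sub-pixel question and the NVIDIA hardware path.

```python
def pick_flow_algorithm(needs_dense, real_time, has_gpu, accuracy_over_runtime=False):
    """Map a feature's requirements to a starting-point algorithm."""
    if not needs_dense:
        # Sparse tracks only: tracking, visual odometry, AR anchors.
        return "Lucas-Kanade"
    if real_time:
        # Dense and live: GPU path if available, otherwise the CPU fast path.
        return "SEA-RAFT" if has_gpu else "DIS"
    if accuracy_over_runtime:
        # Dense, offline, and accuracy matters more than runtime.
        return "FlowFormer"
    return "RAFT or SEA-RAFT"

print(pick_flow_algorithm(needs_dense=False, real_time=True, has_gpu=False))   # Lucas-Kanade
print(pick_flow_algorithm(needs_dense=True, real_time=True, has_gpu=False))    # DIS
print(pick_flow_algorithm(needs_dense=True, real_time=False, has_gpu=True))    # RAFT or SEA-RAFT
```

Treat the return value as the first candidate to benchmark on your own footage, not as a final answer.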
The build-versus-buy question barely arises for optical flow. All the workhorses ship as open-source under permissive licenses: Lucas-Kanade in OpenCV (Apache 2.0), DIS in OpenCV (Apache 2.0; part of the main video module since OpenCV 4.0, previously in opencv_contrib), RAFT under BSD 3-Clause, FlowFormer under MIT, SEA-RAFT under BSD 3-Clause. There is no commercial optical flow vendor whose product is enough better than the open-source state-of-the-art to justify the integration cost. NVIDIA's Optical Flow SDK is the one exception worth knowing: it uses dedicated optical flow hardware on Turing and newer NVIDIA GPUs and gives you a deterministic, low-latency dense flow at very low compute cost. For a real-time path on NVIDIA hardware, the Optical Flow SDK is often the right answer; for a portable open-source path it is RAFT / SEA-RAFT.
Where Fora Soft Fits In
At Fora Soft we have integrated optical flow into video pipelines across video conferencing, surveillance, OTT, and e-learning. In video conferencing we use Lucas-Kanade for sparse face-anchor tracking under real-time AR effects, where the latency budget is brutal and the dense flow is unnecessary. In surveillance we use DIS for motion-region proposals — flagging which parts of the frame have moved enough to warrant running the more expensive multi-object tracker — and RAFT for offline forensic analysis of incidents. In OTT we use RAFT-grade flow inside content-aware encoding pre-passes for archive upscaling pipelines, paired with BasicVSR++ for the actual super-resolution stage. We do not run our own optical flow research; we integrate the open-source state-of-the-art into video products that ship and stay shipped.
What To Read Next
- Multi-object tracking — DeepSORT, ByteTrack, OC-SORT — the upstream consumer of sparse optical flow for box-to-box motion prediction.
- Real-ESRGAN / BasicVSR++ for OTT archive upscaling — the downstream consumer of dense optical flow inside the BasicVSR++ propagation block.
- Vision Transformer primer for video AI engineers — the architecture family that underpins the transformer-based flow models (FlowFormer, MegaFlow).
Talk To Us / See Our Work / Download
- Talk to a video engineer — book a 30-minute scoping call about an optical-flow-dependent feature.
- See our case studies — review the WebRTC, surveillance, and OTT projects we have shipped with optical flow inside the pipeline.
- Download the optical flow algorithm picker — a one-page printable decision worksheet that maps your feature's sparsity, latency, and accuracy requirements to one of the six production-grade optical flow algorithms, with the per-frame compute budget for each.
References
- Lucas, B. D., Kanade, T. "An Iterative Image Registration Technique with an Application to Stereo Vision." Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), 1981, pp. 674–679. Accessed 2026-05-27. The original Lucas-Kanade paper; the local-window least-squares formulation that underpins every sparse optical flow algorithm since.
- Bouguet, J.-Y. "Pyramidal Implementation of the Lucas-Kanade Feature Tracker: Description of the Algorithm." Intel Corporation, Microprocessor Research Labs, 2000. Accessed 2026-05-27. The pyramidal extension that ships in OpenCV as cv2.calcOpticalFlowPyrLK; coarse-to-fine refinement that extends Lucas-Kanade's reach to large motions.
- Horn, B. K. P., Schunck, B. G. "Determining Optical Flow." Artificial Intelligence, 17, 1981, pp. 185–203. DOI: 10.1016/0004-3702(81)90024-2. Accessed 2026-05-27. The original dense optical flow paper; the brightness constancy plus global smoothness formulation; the foundation for every dense classical method.
- Teed, Z., Deng, J. "RAFT: Recurrent All-Pairs Field Transforms for Optical Flow." ECCV 2020 Best Paper Award. arXiv:2003.12039. Accessed 2026-05-27. The RAFT paper; the all-pairs 4D correlation volume with multi-scale pooling at kernel sizes {1, 2, 4, 8}; the recurrent GRU update operator; Sintel final EPE 2.855 and KITTI F1-all 5.10%.
- princeton-vl/RAFT. GitHub repository, reference implementation. github.com/princeton-vl/RAFT. Accessed 2026-05-27. BSD 3-Clause license; PyTorch implementation; pre-trained checkpoints; reference for 9 FPS at 1088×436 on a 1080 Ti and 20 FPS for the smaller variant.
- Wang, Y., Wang, X., Li, J., et al. "SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow." 2024. arXiv:2405.14793. Accessed 2026-05-27. The simplified, faster, more accurate RAFT successor; 20+ FPS at 1080p on RTX 3090; the 2026 production default for dense deep optical flow.
- Huang, Z., Shi, X., Zhang, C., et al. "FlowFormer: A Transformer Architecture for Optical Flow." ECCV 2022. arXiv:2203.16194. Accessed 2026-05-27. The transformer-based optical flow estimator; alternate-group transformer encoding of the 4D cost volume; Sintel clean 1.144 EPE and Sintel final 2.183 EPE.
- Kroeger, T., Timofte, R., Dai, D., Van Gool, L. "Fast Optical Flow using Dense Inverse Search." ECCV 2016. arXiv:1603.03590. Accessed 2026-05-27. The DIS paper; three-stage algorithm with inverse search, patch aggregation, and variational refinement; 300–600 Hz on a single CPU core; the OpenCV-contrib DISOpticalFlow implementation.
- Farnebäck, G. "Two-Frame Motion Estimation Based on Polynomial Expansion." Scandinavian Conference on Image Analysis (SCIA), 2003. Accessed 2026-05-27. The Farnebäck dense optical flow algorithm; ships in OpenCV as cv2.calcOpticalFlowFarneback; the middle-ground classical option between Horn-Schunck and DIS.
- Shi, J., Tomasi, C. "Good Features to Track." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1994. Accessed 2026-05-27. The Shi-Tomasi corner detector; the standard input to Lucas-Kanade tracking; ships in OpenCV as cv2.goodFeaturesToTrack.
- OpenCV Documentation: Optical Flow. docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html. Accessed 2026-05-27. The canonical reference for Lucas-Kanade, Farnebäck, and DIS in production OpenCV; sample code and parameter recipes.
- Butler, D. J., Wulff, J., Stanley, G. B., Black, M. J. "A Naturalistic Open Source Movie for Optical Flow Evaluation." ECCV 2012. sintel.is.tue.mpg.de. Accessed 2026-05-27. The MPI Sintel benchmark; synthetic dataset derived from the Blender open movie Sintel; the standard accuracy benchmark for optical flow models, used to report EPE numbers in this article.
- Geiger, A., Lenz, P., Urtasun, R. "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite." CVPR 2012. cvlibs.net/datasets/kitti. Accessed 2026-05-27. The KITTI 2015 optical flow benchmark; real-world driving scenes; the standard F1-all metric used to report 5.10% for RAFT.
- PyTorch / Torchvision: raft_large. pytorch.org/vision/main/models/generated/torchvision.models.optical_flow.raft_large.html. Accessed 2026-05-27. The PyTorch torchvision reference implementation of RAFT with pre-trained weights; the canonical production-ready integration path.
- NVIDIA Optical Flow SDK. developer.nvidia.com/opticalflow-sdk. Accessed 2026-05-27. NVIDIA's hardware-accelerated optical flow SDK using dedicated optical flow engine cores on Turing and newer GPUs; production-ready low-latency dense flow with deterministic compute cost.
- Bitmovin Video Developer Report 2024–2025. bitmovin.com/video-developer-report. Accessed 2026-05-27. The annual survey of streaming engineering practice; reference for video stabilization and content-aware encoding adoption patterns that consume optical flow primitives.


