Latency is the delay between something happening in front of a camera and the system reacting to it — the frame appearing on the wall, or an alert firing. In surveillance it is the sum of several legs: the camera capturing and encoding, the network carrying the stream, the analytics decoding and running inference, and the result reaching an operator or triggering an action. Each leg adds milliseconds, and where the analytics runs dominates the total.
Latency is the performance counterpart to cost in the edge-vs-cloud decision. Analysing on the camera or a local server keeps the total low — roughly 25–100 ms — because there is no long network round-trip; sending the frame to the cloud and back commonly adds up to several hundred milliseconds, often 300–800 ms. The decisive insight is that it is usually the network round-trip, not the AI computation, that makes cloud analytics slow.
The pitfall is ignoring latency until it bites a real use case. A door that must unlock the instant an authorised plate is read, or an intervention that has to happen before an intruder moves on, needs sub-second response and therefore edge or hybrid processing — a cloud round-trip would arrive too late. Match the processing tier to the task's latency tolerance: sub-200 ms work belongs at the edge, second-scale work can be hybrid, and only latency-insensitive analysis belongs purely in the cloud.

