An edge server is an on-premises machine that sits close to the cameras and runs analytics for many of them at once — the middle tier between AI on the camera itself and AI in the cloud. It is typically a GPU-equipped server in the same building or campus, pulling streams from dozens of cameras, decoding them, and running detection, tracking, or recognition models that are too heavy for a camera's NPU but that you want to keep local.
This tier earns its place on latency, bandwidth, and data residency. Because it is on site, results come back fast and raw video never leaves the premises, which matters for privacy, for data-residency rules, and for avoiding cloud egress costs. One server can serve many cameras, so it concentrates the AI investment where a per-camera NPU would be underpowered and the cloud would be slow or expensive.
The pitfall is the decode-first ceiling. Before a GPU can run inference on a stream it must decode the video, and decoding, not the AI maths, is often what caps how many cameras a server can handle. Decode typically runs on dedicated hardware decoder blocks (or on the CPU), separate from the compute units that a "GPU TOPS" figure describes, and its throughput depends on codec, resolution, and frame rate. Sizing by TOPS while ignoring decode capacity therefore leads to a server that stalls well below its claimed camera count. Plan stream density against decode throughput, and treat the edge server as one tier in a deliberate edge/server/cloud split rather than a place to dump all analytics.
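To make the sizing advice concrete, here is a minimal sketch of planning camera count against both decode and inference budgets and taking the smaller of the two. Everything in it is an assumption for illustration: the names `StreamProfile`, `decode_limit`, `inference_limit`, and `max_cameras` are hypothetical, the megapixels-per-second decode budget is a simplification (real decoder throughput also varies with codec and bitrate), and the numbers are made up rather than measured on any real hardware.

```python
from dataclasses import dataclass


@dataclass
class StreamProfile:
    """One class of camera stream. All values are illustrative assumptions."""
    width: int
    height: int
    fps: float           # frames per second delivered by the camera
    analyzed_fps: float  # frames per second actually passed to the model


def decode_limit(decode_mpix_per_s: float, s: StreamProfile,
                 headroom: float = 0.75) -> int:
    """Cameras the decoder can sustain.

    The full stream frame rate is charged here, not analyzed_fps: with
    inter-frame codecs a frame generally cannot be decoded without the
    frames it references, so skipping analysis does not skip decode.
    """
    mpix_per_stream = s.width * s.height * s.fps / 1e6
    return int(decode_mpix_per_s * headroom // mpix_per_stream)


def inference_limit(model_fps: float, s: StreamProfile,
                    headroom: float = 0.75) -> int:
    """Cameras the model can sustain at the analyzed frame rate."""
    return int(model_fps * headroom // s.analyzed_fps)


def max_cameras(decode_mpix_per_s: float, model_fps: float,
                s: StreamProfile) -> tuple[int, str]:
    """Return the supportable camera count and which budget binds."""
    d = decode_limit(decode_mpix_per_s, s)
    i = inference_limit(model_fps, s)
    return (d, "decode") if d <= i else (i, "inference")


if __name__ == "__main__":
    # Hypothetical 1080p25 cameras, with 5 frames per second analyzed.
    profile = StreamProfile(width=1920, height=1080, fps=25, analyzed_fps=5)
    # Hypothetical server: 2000 Mpix/s of decode, 1000 inferences/s.
    print(max_cameras(decode_mpix_per_s=2000, model_fps=1000, s=profile))
```

With these made-up figures the inference budget alone would suggest well over a hundred cameras, while the decode budget binds at a few dozen, which is the gap the paragraph above warns about. In practice the two input figures should come from benchmarking the actual server with the actual codec and resolution, not from a spec sheet.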

