A GPU server is a machine fitted with one or more graphics processing units, used in surveillance to run video analytics at scale. GPUs excel at the parallel maths that neural networks need, so a single GPU server can decode many camera streams and run detection, tracking, or recognition across them — far more than a CPU could, and far more than any single camera's NPU. It is the workhorse of the on-prem and cloud analytics tiers alike.
Its job is concentrated inference. Rather than putting modest AI on every camera, a GPU server centralises the heavy lifting: dozens of streams in, models running on the GPU, metadata and events out. Renting an equivalent GPU in the cloud does the same job for bursty or temporary workloads. For continuous analysis, a rented GPU is also dramatically cheaper than per-minute analytics APIs, often tens of dollars a month versus thousands.
The pitfall, as with any GPU-based video tier, is that decoding — not inference — is frequently the real bottleneck. Each incoming stream must be decoded before the GPU can analyse it, and decode capacity (sometimes a separate hardware function) can cap camera density well below what the GPU's raw inference throughput suggests. Size a GPU server by decode-plus-inference for the specific models and resolutions in play, build in headroom and redundancy, and decide deliberately whether the GPUs live on-prem or in the cloud.
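The sizing rule above can be sketched as a small calculation. This is a minimal illustration, not a vendor sizing tool: the names `GpuProfile`, `streams_per_gpu`, and `gpus_needed` are hypothetical, and the throughput figures in the usage lines are made-up placeholders rather than benchmarks. It assumes every frame of a stream is decoded even when only a fraction is analysed, which is usual for inter-frame compressed video, and that decode and inference capacity have been measured for the actual codec, resolution, and model in use.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuProfile:
    """Measured capacity of one GPU for a specific codec, resolution and model.

    Both fields are hypothetical inputs; obtain them by benchmarking.
    """
    decode_fps: float     # total frames/s the decoder sustains
    inference_fps: float  # total frames/s the model sustains


def streams_per_gpu(profile, stream_fps, analysed_fps, headroom=0.25):
    """Return (camera streams per GPU, name of the limiting stage).

    stream_fps   -- frames/s each camera sends (all of them are decoded)
    analysed_fps -- frames/s per camera actually passed to the model
    headroom     -- fraction of capacity held back, 0 <= headroom < 1
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if stream_fps <= 0 or analysed_fps <= 0:
        raise ValueError("frame rates must be positive")
    usable = 1.0 - headroom
    decode_limit = math.floor(profile.decode_fps * usable / stream_fps)
    inference_limit = math.floor(profile.inference_fps * usable / analysed_fps)
    # The smaller of the two limits caps camera density.
    if decode_limit <= inference_limit:
        return decode_limit, "decode"
    return inference_limit, "inference"


def gpus_needed(cameras, per_gpu, spares=1):
    """GPUs required for `cameras` streams, plus spare GPUs for redundancy."""
    if per_gpu <= 0:
        raise ValueError("per_gpu must be positive")
    return math.ceil(cameras / per_gpu) + spares


# Illustrative numbers only (assumptions, not measurements).
profile = GpuProfile(decode_fps=1000.0, inference_fps=900.0)
limit, stage = streams_per_gpu(profile, stream_fps=25.0, analysed_fps=5.0)
print(limit, stage)                       # 30 decode
print(gpus_needed(100, limit, spares=1))  # 5
```

With these placeholder figures the model alone could serve 135 streams, but decode caps the GPU at 30, which is the pattern the paragraph above warns about. Re-run the calculation whenever the model, codec, or camera resolution changes, since both measured rates shift with them.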

