NPU (Neural Processing Unit)

An NPU (Neural Processing Unit) is a specialised chip, or a block inside a camera's main system-on-chip, built to run neural-network inference efficiently. Unlike a general-purpose CPU, it is optimised for the matrix maths that AI models rely on, so it can detect people or vehicles in a video frame while drawing little power, which suits the tight power budget inside a camera. NPU performance is usually quoted in TOPS (trillions of operations per second), with camera-class parts commonly offering a few TOPS.

The NPU is what makes edge AI possible. It lets a camera run a detection or classification model on its own video in real time and emit the results as metadata, instead of shipping frames to a server. Because the NPU is constrained in both power and silicon area, models are typically compressed to fit (quantised to 8-bit integers), trading a little accuracy for a large gain in speed and efficiency.
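
To make the quantisation trade-off concrete, here is a minimal sketch of symmetric 8-bit quantisation in plain Python. It is an illustration under stated assumptions, not how any particular NPU toolchain works: the function names and the sample weights are hypothetical, and real toolchains usually quantise per channel and calibrate activations as well.

```python
def quantise_int8(weights):
    """Symmetric per-tensor quantisation of floats to signed 8-bit integers."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor so the scale is never zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Map the stored integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to illustrate the rounding error.
weights = [0.82, -0.41, 0.05, -1.27, 0.003]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("largest rounding error:", max_error)
```

Each weight is stored in one byte instead of four, and the rounding error per value is at most half a quantisation step. The smallest weights can round to zero, which is one source of the accuracy loss mentioned above.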

The pitfall is reading the TOPS number as the whole story. Raw TOPS does not tell you which models fit, at what resolution and frame rate, or how much accuracy the quantisation cost; a headline figure can mask a chip that only runs a small model at low resolution. The practical questions are which specific models the NPU runs well and what accuracy they hold after compression, not the marketing TOPS alone. Model and silicon internals are covered in the AI for Video Engineering section.
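
A rough back-of-envelope calculation shows why the headline figure is not enough. The sketch below is illustrative only: every number in it is a hypothetical assumption (the peak TOPS, the fraction of peak actually sustained, and the per-frame cost of the model), and it ignores memory bandwidth and operator support, which often limit real throughput further.

```python
def max_inferences_per_second(peak_tops, giga_ops_per_frame, utilisation):
    """Compute-only upper bound on inferences per second."""
    usable_ops_per_second = peak_tops * 1e12 * utilisation
    return usable_ops_per_second / (giga_ops_per_frame * 1e9)


# All figures below are hypothetical, chosen only to show the arithmetic.
PEAK_TOPS = 2.0          # headline figure from a datasheet
UTILISATION = 0.3        # assumed fraction of peak a real model sustains
SMALL_INPUT_GOPS = 8.0   # assumed cost of one inference at a small input size
# Assumes the cost grows with pixel count, so doubling width and height
# quadruples the work per frame.
LARGE_INPUT_GOPS = 4 * SMALL_INPUT_GOPS

fps_small = max_inferences_per_second(PEAK_TOPS, SMALL_INPUT_GOPS, UTILISATION)
fps_large = max_inferences_per_second(PEAK_TOPS, LARGE_INPUT_GOPS, UTILISATION)

print(f"small input: {fps_small:.2f} inferences/s")
print(f"large input: {fps_large:.2f} inferences/s")
```

Under these assumed figures the same chip manages roughly 75 inferences per second at the small input size but fewer than 19 once the pixel count is quadrupled, even though the TOPS figure has not changed.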

Related terms

Edge AI

Inference