Cloud inference sends video or audio to a provider's servers, runs the model on powerful shared hardware, and returns the result. It removes the burden of owning GPUs but introduces network latency and a recurring bill that grows with usage.
Definition
Running models in a remote data centre reached over the network. Scales easily and has access to large GPUs, but adds network round-trip delay and a per-call cost.
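The two costs named above, round-trip delay and per-call cost, can be put into rough numbers. The sketch below is illustrative only: the class name `CloudCallEstimate`, its fields, and every figure in the usage lines are hypothetical assumptions, not measurements or any provider's real pricing.

```python
from dataclasses import dataclass


@dataclass
class CloudCallEstimate:
    """Rough per-call model of a cloud inference request (all values assumed)."""
    upload_ms: float       # time to send the audio/video payload to the provider
    inference_ms: float    # time the model spends running on the remote hardware
    download_ms: float     # time to receive the result back
    price_per_call: float  # hypothetical price charged per request

    def latency_ms(self) -> float:
        # End-to-end delay is the network legs plus the remote compute time.
        return self.upload_ms + self.inference_ms + self.download_ms

    def monthly_cost(self, calls_per_day: int, days: int = 30) -> float:
        # The bill grows linearly with usage: price times number of calls.
        return self.price_per_call * calls_per_day * days


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the arithmetic.
    estimate = CloudCallEstimate(
        upload_ms=40.0, inference_ms=25.0, download_ms=10.0, price_per_call=0.002
    )
    print(f"latency per call: {estimate.latency_ms():.0f} ms")
    print(f"cost at 10,000 calls/day: {estimate.monthly_cost(10_000):.2f} per month")
```

In this model the network legs add to latency no matter how fast the remote GPU is, and the monthly figure scales directly with call volume, which is the trade-off the definition describes.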
Also known as
server-side inference