Cloud inference sends video or audio to a provider's servers, runs the model on powerful shared hardware, and returns the result. It removes the burden of owning GPUs but introduces network latency and a recurring bill that grows with usage.
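
As a rough illustration of this trade-off, the sketch below models per-request latency as upload time plus network round trip plus server compute, and the monthly bill as a flat per-request price multiplied by request volume. The function names, the flat pricing model, and every number in the example are assumptions chosen for illustration, not figures from any real provider.

```python
def upload_ms(payload_bytes: int, uplink_mbps: float) -> float:
    """Time to push the media payload over the uplink, in milliseconds."""
    bits = payload_bytes * 8
    return bits * 1000 / (uplink_mbps * 1_000_000)


def request_latency_ms(payload_bytes: int, uplink_mbps: float,
                       round_trip_ms: float, server_inference_ms: float) -> float:
    """End-to-end latency: upload + network round trip + model run time."""
    return upload_ms(payload_bytes, uplink_mbps) + round_trip_ms + server_inference_ms


def monthly_cost(requests_per_month: int, price_per_request: float) -> float:
    """Recurring bill under a hypothetical flat per-request price."""
    return requests_per_month * price_per_request


if __name__ == "__main__":
    # Hypothetical inputs: a 500 KB audio clip, a 10 Mbps uplink,
    # a 60 ms round trip, and 40 ms of server-side compute.
    latency = request_latency_ms(500_000, 10.0, 60.0, 40.0)
    print(f"latency per request: {latency:.0f} ms")

    # Hypothetical price of $0.002 per request at two usage levels.
    for volume in (100_000, 1_000_000):
        print(f"{volume:>9} requests/month: ${monthly_cost(volume, 0.002):,.2f}")
```

In this simplified model the bill scales linearly with request volume, while the network terms set a latency floor that faster server hardware cannot remove. Real pricing is often tiered or billed per second of media, so treat the cost function as a placeholder.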