Definition
NVIDIA's open-source inference server for deploying many models behind a single standard HTTP/gRPC API, handling request batching, scheduling, and GPU sharing across model types.
Triton serves models from different frameworks (for example TensorRT, PyTorch, ONNX Runtime, and TensorFlow) side by side. It uses dynamic batching and multi-model scheduling so that vision, audio, and language models can share GPUs efficiently. It is commonly used as the serving layer in production video-AI pipelines.
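The dynamic-batching idea can be illustrated with a small simulation. This is a minimal Python sketch, not Triton's actual scheduler. The function `dynamic_batch` and the `Batch` type are hypothetical names, and the two knobs only loosely mirror Triton's `max_batch_size` and `max_queue_delay_microseconds` model-configuration settings. The sketch assumes a single model instance and arrival times that are already sorted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    dispatch_time_us: int   # when the batch is sent to the (hypothetical) GPU
    request_ids: List[int]  # indices of the requests grouped into this batch


def dynamic_batch(arrivals_us: List[int], max_batch_size: int,
                  max_queue_delay_us: int) -> List[Batch]:
    """Group requests, given sorted arrival times, into batches.

    A batch opens when its oldest request arrives. It is dispatched as soon as
    it holds max_batch_size requests, or once the oldest request has waited
    max_queue_delay_us, whichever comes first.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    if any(a > b for a, b in zip(arrivals_us, arrivals_us[1:])):
        raise ValueError("arrival times must be sorted")

    batches: List[Batch] = []
    i, n = 0, len(arrivals_us)
    while i < n:
        deadline = arrivals_us[i] + max_queue_delay_us
        ids = [i]
        i += 1
        # Pull in later requests that arrive before the deadline, until full.
        while i < n and len(ids) < max_batch_size and arrivals_us[i] <= deadline:
            ids.append(i)
            i += 1
        # A full batch leaves when its last request arrives; otherwise at the deadline.
        full = len(ids) == max_batch_size
        dispatch = arrivals_us[ids[-1]] if full else deadline
        batches.append(Batch(dispatch, ids))
    return batches


if __name__ == "__main__":
    for b in dynamic_batch([0, 50, 80, 300, 320, 900],
                           max_batch_size=3, max_queue_delay_us=100):
        print(b.dispatch_time_us, b.request_ids)
    # 80 [0, 1, 2]
    # 400 [3, 4]
    # 1000 [5]
```

The delay bound trades latency for throughput. A larger window lets the GPU run bigger batches, but the first request in each partially filled batch waits longer before it is served.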
Also known as
Triton, NVIDIA Triton