Triton Inference Server

Definition

NVIDIA's open-source inference server for deploying many models behind one standard API, exposed over HTTP/REST and gRPC. It handles batching, scheduling, and GPU sharing across model types.

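Triton serves models from different frameworks, such as TensorRT, ONNX Runtime, PyTorch, and TensorFlow, side by side. Dynamic batching groups individual requests into larger batches on the server, and multi-model scheduling lets several models run concurrently, so vision, audio, and language models share GPUs efficiently. It is a frequent backbone for production video-AI pipelines.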
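
To make the idea of dynamic batching concrete, here is a minimal Python sketch of the grouping rule: requests that arrive close together are merged into one batch, up to a size cap. This is an illustration only, not Triton's scheduler. The `Request` class and `form_batches` function are hypothetical names, and the two parameters are loosely modeled on the `max_batch_size` and `max_queue_delay_microseconds` settings in a Triton model configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical inference request, used only for this illustration."""
    request_id: str
    arrival_us: int  # arrival time in microseconds


def form_batches(
    requests: List[Request],
    max_batch_size: int,
    max_queue_delay_us: int,
) -> List[List[Request]]:
    """Group requests into batches (simplified model, not Triton's code).

    A batch is closed when it is full, or when the next request arrives
    more than max_queue_delay_us after the first request in the batch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_us):
        if current:
            waited_too_long = (
                req.arrival_us - current[0].arrival_us > max_queue_delay_us
            )
            if waited_too_long or len(current) >= max_batch_size:
                batches.append(current)
                current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


# Three requests arrive within 100 us of each other, two arrive much later.
requests = [
    Request("a", 0),
    Request("b", 40),
    Request("c", 90),
    Request("d", 500),
    Request("e", 510),
]
batches = form_batches(requests, max_batch_size=4, max_queue_delay_us=100)
batch_ids = [[r.request_id for r in b] for b in batches]
print(batch_ids)  # [['a', 'b', 'c'], ['d', 'e']]
```

Five separate requests become two model executions here. The trade-off is that a longer delay window yields larger batches and higher throughput, at the cost of extra latency for the earliest request in each batch.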

Also known as

Triton, NVIDIA Triton

Related terms

vLLM

TensorRT

Throughput