TensorRT

Definition

NVIDIA's inference optimiser and runtime, which compiles a trained model into an engine tuned for a specific NVIDIA GPU, cutting latency and raising throughput at inference time.

TensorRT rewrites a model's computation graph, fuses adjacent operations into single GPU kernels, and can run layers in lower precision (FP16 or INT8) so the model runs faster on NVIDIA hardware. Exporting to TensorRT, typically from an ONNX file, is a common last step for getting real-time speed out of detectors and other video models.
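As a rough illustration of that export step, the sketch below assembles a command line for `trtexec`, the engine-building tool that ships with TensorRT. It only builds and prints the command and does not run it. The file names `detector.onnx` and `detector.engine` and the helper `build_trtexec_command` are hypothetical, and it assumes the model has already been exported to ONNX.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, fp16=True):
    """Return the argument list for a trtexec engine build (does not run it)."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",          # input model in ONNX format
        f"--saveEngine={engine_path}",  # where to write the serialized engine
    ]
    if fp16:
        # Allow TensorRT to pick FP16 kernels where it judges them faster.
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_trtexec_command("detector.onnx", "detector.engine")
    print(shlex.join(cmd))
```

The resulting engine is tied to the GPU and TensorRT version it was built with, so the build is usually run on the same kind of device that will serve the model.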

Also known as

TensorRT-LLM (strictly a separate NVIDIA library, built on TensorRT, for large language models)


Related terms

ONNX

Quantization

Jetson

YOLO