TensorRT rewrites and fuses a model's operations and applies lower-precision arithmetic (such as FP16 or INT8) so the model runs faster on NVIDIA hardware. Exporting to TensorRT is a common final step for getting real-time speed out of detectors and other video models.
Definition
NVIDIA's inference optimiser and runtime, which compiles a trained model into a highly tuned engine for NVIDIA GPUs, cutting latency and boosting throughput at inference time.
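As a rough illustration of the export step, the sketch below assembles a command line for `trtexec`, the command-line tool that ships with TensorRT. It assumes the model has already been exported to ONNX. The helper name `build_trtexec_command` and the file names are hypothetical, and the code only builds the argument list rather than running the tool.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, precision="fp16"):
    """Return the argument list for a trtexec engine build (does not run it).

    Hypothetical helper for illustration; only the trtexec flags
    --onnx, --saveEngine, --fp16 and --int8 are used.
    """
    if precision not in ("fp32", "fp16", "int8"):
        raise ValueError(f"unsupported precision: {precision}")

    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]

    # FP32 is the default, so only reduced precision needs an extra flag.
    if precision != "fp32":
        cmd.append(f"--{precision}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of executing it.
    print(shlex.join(build_trtexec_command("detector.onnx", "detector.engine")))
```

The resulting list can be passed to `subprocess.run` on a machine with TensorRT installed. INT8 builds generally need calibration data or an already quantized model to keep accuracy, which this sketch does not cover.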
Also known as
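TRT (common abbreviation). TensorRT-LLM is a related but separate NVIDIA library, built on TensorRT for large language models, rather than another name for TensorRT.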