VRAM

Definition

The memory on a GPU. A model only runs if its weights plus working data fit in VRAM, so VRAM size caps which models a given GPU can serve.

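VRAM (video RAM) is the fast memory attached to a GPU. The model's weights and its temporary working data, such as activations and the KV cache, must fit inside it; if they exceed the limit, the model either fails to load or runs out of memory during inference. Quantization stores weights at lower precision, shrinking them so that larger models fit into less VRAM.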
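As a rough illustration of the fit check described above, the sketch below estimates weight size as parameter count times bits per weight. The function names and the `overhead_gib` allowance for working data are hypothetical; real overhead depends on batch size, context length, and the serving runtime, so treat the result as a back-of-the-envelope estimate only.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


def fits_in_vram(num_params: float, bits_per_weight: int,
                 vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """Check whether weights plus an assumed working-data allowance fit.

    overhead_gib is a placeholder assumption for activations and KV cache,
    not a measured value.
    """
    needed = weight_memory_gib(num_params, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    params = 7e9  # a 7-billion-parameter model, chosen for illustration
    for bits in (16, 8, 4):
        size = weight_memory_gib(params, bits)
        ok = fits_in_vram(params, bits, vram_gib=8)
        print(f"{bits}-bit weights: {size:.2f} GiB, fits in 8 GiB: {ok}")
```

Under these assumptions, the same 7-billion-parameter model that does not fit on an 8 GiB GPU at 16-bit precision does fit once quantized to 4 bits, which is the effect the definition refers to.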

Also known as

GPU memory

Related terms

GPU

Quantization

KV cache