VRAM is the fast memory attached to a GPU. The model's weights and its temporary computations must fit inside it; exceed the limit and the model will not load. Quantization shrinks weights specifically to fit larger models into smaller VRAM.