GGUF packs a model's weights, metadata, and tokenizer into one file built for the llama.cpp ecosystem. It is the common way to ship compact, quantized models that run on laptops and edge boxes without a heavyweight ML framework.
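
To make the single-file idea concrete, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It assumes the version 2 or later layout (a 4-byte `GGUF` magic, a 32-bit version, then 64-bit tensor and metadata key-value counts) and little-endian byte order. The function name `read_gguf_header` and the returned dictionary keys are illustrative choices, not part of any library.

```python
import struct

GGUF_MAGIC = b"GGUF"
# Assumed layout (GGUF v2+, little-endian): magic, version, tensor count, metadata KV count.
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(data: bytes) -> dict:
    """Parse the leading header bytes of a GGUF file (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    magic, version, tensor_count, kv_count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic is {magic!r}")
    if version < 2:
        # This sketch does not handle the older header with narrower count fields.
        raise ValueError(f"unsupported GGUF version {version}")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for demonstration only; the counts are made up.
sample = struct.pack(HEADER_FORMAT, GGUF_MAGIC, 3, 291, 24)
print(read_gguf_header(sample))
```

For a real file, the same function can be given the first 24 bytes read in binary mode. The metadata key-value pairs, which carry the tokenizer data among other settings, and the tensor descriptions follow this header in the same file.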