Definition
A single-file format for distributing quantized language and multimodal models, popular for running them efficiently on CPUs and consumer GPUs.
GGUF packs a model's weights, metadata, and tokenizer into one file built for the llama.cpp ecosystem. It is the common way to ship compact, quantized models that run on laptops and edge boxes without a heavyweight ML framework.
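To make "one file" concrete, the sketch below reads the fixed header and the key-value metadata section of a GGUF file. It assumes the layout described in the GGUF specification for versions 2 and later: all integers are little-endian, and the file begins with the magic bytes `GGUF`, a `uint32` version, a `uint64` tensor count, and a `uint64` metadata count. Each metadata entry is a length-prefixed string key, a `uint32` type tag, and a value. The function and helper names (`read_gguf_header`, `_read_value`) are hypothetical. The sketch does not parse the tensor-info table or the tensor data that follow the metadata.

```python
import io
import struct
from typing import Any, BinaryIO

GGUF_MAGIC = b"GGUF"

# Scalar metadata value types (per the GGUF spec as assumed here), mapped to
# little-endian struct format codes.
_SCALAR_FORMATS = {
    0: "<B",   # UINT8
    1: "<b",   # INT8
    2: "<H",   # UINT16
    3: "<h",   # INT16
    4: "<I",   # UINT32
    5: "<i",   # INT32
    6: "<f",   # FLOAT32
    7: "<?",   # BOOL
    10: "<Q",  # UINT64
    11: "<q",  # INT64
    12: "<d",  # FLOAT64
}
_STRING, _ARRAY = 8, 9


def _read(fmt: str, f: BinaryIO) -> Any:
    """Read and unpack one fixed-size little-endian value."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read("<Q", f)
    return f.read(length).decode("utf-8")


def _read_value(vtype: int, f: BinaryIO) -> Any:
    """Decode one metadata value; arrays carry an element type and a uint64 count."""
    if vtype in _SCALAR_FORMATS:
        return _read(_SCALAR_FORMATS[vtype], f)
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read("<I", f)
        count = _read("<Q", f)
        return [_read_value(elem_type, f) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_header(f: BinaryIO) -> dict:
    """Parse the magic, version, counts, and all metadata key-value pairs."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read("<I", f)
    if version < 2:
        # Version 1 used 32-bit counts and lengths; not handled in this sketch.
        raise ValueError("GGUF v1 is not supported by this sketch")
    tensor_count = _read("<Q", f)
    kv_count = _read("<Q", f)
    metadata = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read("<I", f)
        metadata[key] = _read_value(vtype, f)
    return {"version": version, "tensor_count": tensor_count, "metadata": metadata}


if __name__ == "__main__":
    # Build a tiny synthetic header in memory: version 3, 0 tensors, 2 keys.
    def gguf_string(s: str) -> bytes:
        b = s.encode("utf-8")
        return struct.pack("<Q", len(b)) + b

    buf = (
        GGUF_MAGIC
        + struct.pack("<IQQ", 3, 0, 2)
        + gguf_string("general.architecture") + struct.pack("<I", _STRING) + gguf_string("llama")
        + gguf_string("llama.context_length") + struct.pack("<II", 4, 4096)
    )
    header = read_gguf_header(io.BytesIO(buf))
    print(header["version"], header["metadata"])
```

For a real model you would open the file in binary mode, for example `with open("model.gguf", "rb") as f: read_gguf_header(f)`, where `model.gguf` is a placeholder path. Because metadata comes before the tensor data, a reader can inspect the architecture and other settings without loading any weights.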
Also known as
GGML successor