KV cache

Definition

The attention keys and values a language model stores and reuses while generating, so it need not recompute them for the whole context at each new token. A major consumer of serving memory.

The key-value cache holds the key and value vectors computed in each attention layer for tokens already processed, so each new token only needs its own keys and values computed before attending over the stored ones. It speeds up generation, but its size grows linearly with context length and with the number of concurrent sequences, making it a main driver of serving memory and a target for optimisations such as PagedAttention.
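
To make the growth with context length and concurrency concrete, here is a rough sizing sketch. It assumes a standard transformer decoder that stores one key vector and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the example configuration are illustrative assumptions, not figures from any particular model or serving stack.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   num_sequences=1, bytes_per_value=2):
    """Estimate KV cache size in bytes (hypothetical helper for illustration)."""
    # Factor of 2: one key vector and one value vector
    # per token, per layer, per KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    # Size scales linearly with tokens per sequence and with sequence count.
    return per_token * seq_len * num_sequences


# Assumed configuration: 32 layers, 32 KV heads of dimension 128,
# 16-bit (2-byte) values.
one_seq = kv_cache_bytes(32, 32, 128, seq_len=4096)
eight_seqs = kv_cache_bytes(32, 32, 128, seq_len=4096, num_sequences=8)

print(one_seq / 2**30)     # GiB for one 4096-token sequence
print(eight_seqs / 2**30)  # GiB for eight concurrent sequences
```

Under these assumed numbers, a single 4,096-token sequence needs 2 GiB of cache and eight concurrent sequences need 16 GiB, which is why serving systems treat the cache as a primary memory budget item.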

Also known as

key-value cache