Pensieve (Mao, Netravali, and Alizadeh, ACM SIGCOMM 2017) was the first widely cited demonstration of reinforcement-learning-based ABR. The model takes recent throughput, current buffer level, the last chosen rendition, and the amount of content remaining as input (the paper's state also includes recent chunk download times and the sizes of the next chunk at each bitrate), and outputs a probability distribution over the next rendition. Training uses A3C (Asynchronous Advantage Actor-Critic) in a simulator driven by network traces drawn from real cellular and broadband measurements. In the paper's evaluation, the trained model outperformed BOLA and MPC on the tested traces, with reported average QoE gains of 12% to 25% over the best existing scheme.
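The input-to-distribution mapping described above can be illustrated with a minimal sketch. It is not the paper's architecture: Pensieve uses a neural network trained with A3C, whereas this example uses a single untrained linear layer followed by a softmax, purely to show the shape of the decision. The bitrate ladder, the feature scaling, and every function name here are hypothetical choices made for the illustration.

```python
import math
import random

# Hypothetical bitrate ladder in kbps; the values are illustrative only.
RENDITIONS_KBPS = [300, 750, 1200, 1850, 2850, 4300]


def build_state(throughputs_mbps, buffer_s, last_rendition, chunks_left, total_chunks):
    """Flatten the observations named in the text into one feature vector."""
    last_kbps = RENDITIONS_KBPS[last_rendition]
    return (
        list(throughputs_mbps)                    # recent throughput samples
        + [buffer_s / 10.0]                       # buffer level, roughly scaled (assumed scale)
        + [last_kbps / max(RENDITIONS_KBPS)]      # last chosen rendition, normalised
        + [chunks_left / total_chunks]            # fraction of content remaining
    )


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(state, weights, biases):
    """Linear actor: one logit per rendition, then softmax over the logits."""
    logits = [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def choose_rendition(probs, rng):
    """Sample a rendition index from the distribution (stochastic policy)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
state = build_state(
    [1.8, 2.1, 1.5], buffer_s=12.0, last_rendition=2, chunks_left=30, total_chunks=48
)
# Untrained random weights: stand-ins for the parameters that training would learn.
weights = [[rng.uniform(-0.1, 0.1) for _ in state] for _ in RENDITIONS_KBPS]
biases = [0.0] * len(RENDITIONS_KBPS)

probs = policy(state, weights, biases)
next_rendition = choose_rendition(probs, rng)
```

Sampling from the distribution, not always taking the most likely rendition, is what lets an actor-critic method explore during training; a deployed player could instead pick the highest-probability entry.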
Pensieve mattered as a proof of concept: it showed that an ABR algorithm could be learned end-to-end without explicit modeling assumptions and could outperform carefully hand-tuned heuristics. Several follow-up papers extended the approach to per-content models, edge-side ABR decisions, and federated training. Production deployment remains rare: most operators prefer the interpretability and predictability of heuristic ABRs.
The legacy of Pensieve is broader than its direct use. The reinforcement-learning approach influenced thinking about ABR at many major streaming teams. Bitmovin, THEO, and several smaller player vendors offer "ML-tuned" ABR options that draw on Pensieve-style training. Netflix's per-title work is a related but distinct case: per-title encoding tailors the bitrate ladder, not the player's ABR strategy, to each piece of content, and it was announced in 2015, before Pensieve, so it is better read as a parallel data-driven effort than as a descendant.

