A diffusion model learns to reverse a noising process, turning random static into a coherent image or clip over many denoising steps. It is the engine behind most text-to-image and text-to-video systems; step count trades quality against speed and cost.