Temporal pooling is the step that combines a metric's per-frame scores along the time axis into the single number for a clip, as opposed to spatial pooling, which combines per-pixel scores inside one frame. Once each frame has a single score, temporal pooling decides how those scores along time are summarized. The default is the arithmetic mean, but it smooths over brief bad stretches, so a one-second collapse vanishes into a sea of good frames. Because research shows people weight the worst, transient moments far more than the average, worst-case temporal rules track human opinion better: a low percentile such as perc5 or perc10 reports how bad it gets for a noticeable stretch, while the minimum or a windowed minimum flags the single worst frame or worst short window. A related human effect, recency, means a drop near the end of a long session costs more, which standardized streaming models such as ITU-T P.1203 build into their temporal integration. Temporal pooling sits beside spatial pooling and shares the same averaging trap one level up.