RTP (Real-time Transport Protocol) is the protocol that actually carries the media, the video and audio packets, from camera to VMS. Standardised by the IETF in RFC 3550, it wraps each chunk of encoded media in a header carrying a sequence number and a timestamp, so the receiver can reassemble frames in order and play them at the right pace even when packets arrive out of order. Its companion, RTCP, runs alongside to report on delivery quality (loss, jitter) so the session can adapt.
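
To make the sequence number and timestamp concrete, here is a minimal sketch of reading the 12-byte fixed header that RFC 3550 puts at the front of every RTP packet. The names `RtpHeader`, `parse_rtp_header`, and `seq_newer` are illustrative, not taken from any particular VMS or library, and the sketch ignores CSRC lists and header extensions.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the RTP fixed header (RFC 3550, section 5.1)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header; hypothetical helper for illustration."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def seq_newer(a: int, b: int) -> bool:
    """True if sequence number a comes after b, allowing 16-bit wraparound."""
    return a != b and ((a - b) & 0xFFFF) < 0x8000
```

A receiver could use `seq_newer` to reorder a small buffer of packets and to notice gaps; the wraparound handling matters because the 16-bit counter rolls over quickly at video packet rates.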
RTP is the "transport" half of the streaming pair: RTSP turns the stream on and controls it, RTP moves the bytes. It usually runs over UDP, trading guaranteed delivery for low latency, because in live monitoring it is better to skip a lost packet and stay current than to stall waiting for a re-send. The timestamps RTP carries, once RTCP sender reports tie them to wall-clock time, are also what let the VMS keep audio and video in sync and stamp recordings accurately.
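
The timestamp arithmetic behind synchronisation can be sketched as follows. RTP timestamps count ticks of a media clock (90 kHz is the usual rate for video; 8 kHz is assumed here for a G.711-style audio track), and an RTCP sender report supplies one pair of "this RTP timestamp equals this wall-clock time" per stream. The function name `rtp_to_wallclock` and its parameters are hypothetical, and the wall-clock value is simplified to seconds as a float rather than the 64-bit NTP format RTCP actually carries.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_wallclock_s: float, clock_rate: int) -> float:
    """Map an RTP timestamp to wall-clock seconds.

    sr_rtp_ts and sr_wallclock_s are the reference pair taken from the most
    recent RTCP sender report for this stream (assumed already decoded).
    """
    # Treat the difference as a signed 32-bit value so that timestamp
    # wraparound and packets slightly older than the report both work.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_wallclock_s + delta / clock_rate


# A video frame one second after its report, and an audio packet half a
# second after its own report, land on a shared timeline.
video_time = rtp_to_wallclock(180000, 90000, 1000.0, 90000)
audio_time = rtp_to_wallclock(12000, 8000, 1000.0, 8000)
```

Once both streams are expressed on the same wall-clock axis, lip-sync reduces to presenting audio and video samples whose mapped times match.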
The pitfall is again the network. Because RTP typically rides UDP, it is sensitive to firewalls, NAT, and congestion: lost packets show as momentary artefacts or frame drops, and blocked UDP shows as no video at all. Running RTP interleaved over the RTSP TCP connection is the common workaround when UDP cannot pass. The detailed transport mechanics belong to the Video Streaming section; in surveillance, RTP is simply how the live picture reaches the recorder and the wall.
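
For reference, the interleaved workaround uses a very small framing layer defined in RFC 2326: each RTP or RTCP packet on the TCP connection is preceded by a `$` byte, a one-byte channel identifier, and a two-byte big-endian length. The sketch below splits a received byte buffer into such frames. The function name `split_interleaved` is illustrative, and the sketch assumes the buffer holds only binary frames; a real client also has to handle RTSP text responses arriving on the same connection.

```python
import struct
from typing import List, Tuple


def split_interleaved(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Split TCP bytes into (channel, packet) frames.

    Returns the complete frames found plus any trailing bytes that belong
    to a frame not yet fully received.
    """
    frames: List[Tuple[int, bytes]] = []
    pos = 0
    while len(buffer) - pos >= 4:
        if buffer[pos] != 0x24:  # ASCII '$' marks an interleaved frame
            raise ValueError("expected '$' at start of interleaved frame")
        # One-byte channel id, then 16-bit payload length in network order.
        channel, length = struct.unpack_from("!BH", buffer, pos + 1)
        end = pos + 4 + length
        if end > len(buffer):
            break  # partial frame: wait for more data from the socket
        frames.append((channel, buffer[pos + 4:end]))
        pos = end
    return frames, buffer[pos:]
```

The channel numbers are agreed during RTSP SETUP; conventionally one channel carries RTP and the next carries the matching RTCP, but the actual assignment comes from the negotiated `interleaved=` parameter rather than from anything fixed in this sketch.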

