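Speech-to-speech systems take in speech and respond with speech using a single model, which reduces the latency and information loss (such as tone and emphasis dropped at transcription) of chaining automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS). Realtime APIs and Gemini Live use this approach to make voice agents feel more responsive and natural.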
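
The contrast between the two architectures can be illustrated with a toy sketch. Everything in it is hypothetical: the `Audio` type, the stage functions, and the latency figures are placeholders invented for illustration, not measurements or any real API. It assumes the cascaded stages run strictly one after another with no streaming overlap.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    words: str
    tone: str  # paralinguistic information, e.g. "frustrated"


def asr(audio: Audio) -> str:
    # Transcription keeps the words and discards the tone.
    return audio.words


def llm(text: str) -> str:
    # Stand-in for a text-only language model.
    return f"reply to: {text}"


def tts(text: str) -> Audio:
    # The synthesizer never saw the user's tone, so it defaults to neutral.
    return Audio(words=text, tone="neutral")


def cascaded(audio: Audio) -> Audio:
    # ASR -> LLM -> TTS: only text crosses each stage boundary.
    return tts(llm(asr(audio)))


def speech_to_speech(audio: Audio) -> Audio:
    # A single model sees the audio directly, so tone can shape the reply.
    return Audio(
        words=f"reply to: {audio.words}",
        tone=f"responding to {audio.tone}",
    )


def cascade_latency_ms(stage_latencies_ms: dict[str, float]) -> float:
    # With strictly sequential stages, the delays add up.
    return sum(stage_latencies_ms.values())


if __name__ == "__main__":
    user = Audio(words="my order is late", tone="frustrated")
    print(cascaded(user).tone)
    print(speech_to_speech(user).tone)
    # Placeholder numbers, chosen only to show the summation.
    print(cascade_latency_ms({"asr": 1.0, "llm": 2.0, "tts": 3.0}))
```

In this toy model the cascaded path loses the user's tone at the transcription step, while the single-model path can carry it through to the reply. Production cascades often stream between stages, so their real delay is usually less than a simple sum.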