Traditional detectors only know the classes they were trained on. Open-vocabulary detection accepts a text prompt and locates whatever it describes, fusing a vision model with a language model. It is the practical way to ship detection for a long, changing list of objects.

