Qwen-VL is a capable, openly available multimodal model line that handles images, documents, and video, frequently chosen when teams want frontier-level open weights they can self-host and adapt.
Definition
A family of open-weight vision-language models from Alibaba's Qwen team, often ranked near the top of open VLM benchmarks for image and video understanding.
Also known as
Qwen2-VL, Qwen2.5-VL