CogVLM is a powerful open-source Vision Language Model (VLM).
An open-source vision-language (VL) model designed for real-world visual and language understanding applications.
The LLaVA-NeXT model improves on LLaVA-1.5 with stronger reasoning, OCR, and world knowledge.
Qwen-VL is a large vision-language model (LVLM) developed by Alibaba Cloud.