CogVLM is a powerful open-source Vision Language Model (VLM) known for its strong performance across cross-modal benchmarks. The CogVLM-17B model combines 10 billion visual parameters with 7 billion language parameters and achieves state-of-the-art results on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, and GQA. It also performs impressively on VQAv2, OKVQA, and TextVQA, consistently ranking among the top contenders alongside leading models such as PaLI-X 55B. Users can explore CogVLM's multimodal conversational capabilities through an online demo, which demonstrates its applicability in both academic research and practical settings. The release of CogVLM gives the open-source community a robust tool for advancing research and application development at the intersection of vision and language.
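As a rough illustration of how such a model is typically used, the sketch below loads the checkpoint with Hugging Face `transformers` and generates a caption for a single image. It follows the usage pattern published on THUDM's CogVLM model cards; the repository id `THUDM/cogvlm-base-490-hf`, the `build_conversation_input_ids` helper (defined in the checkpoint's remote code), and the CUDA/bfloat16 setup are assumptions to verify against the model card, not guarantees.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

# CogVLM pairs its visual expert with a Vicuna-7B language backbone,
# so the Vicuna tokenizer is used for the text side.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# trust_remote_code=True is required: the modeling code ships with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-base-490-hf",  # assumed checkpoint id matching this listing
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda").eval()

image = Image.open("example.jpg").convert("RGB")

# build_conversation_input_ids is a helper from the remote modeling code shown
# on THUDM's model cards; it tokenizes the prompt and preprocesses the image.
inputs = model.build_conversation_input_ids(
    tokenizer, query="Describe this image.", history=[], images=[image]
)
inputs = {
    "input_ids": inputs["input_ids"].unsqueeze(0).to("cuda"),
    "token_type_ids": inputs["token_type_ids"].unsqueeze(0).to("cuda"),
    "attention_mask": inputs["attention_mask"].unsqueeze(0).to("cuda"),
    "images": [[inputs["images"][0].to("cuda").to(torch.bfloat16)]],
}

with torch.no_grad():
    outputs = model.generate(**inputs, max_length=2048, do_sample=False)
    outputs = outputs[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```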
cogvlm-base-490-hf
CogVLM is a powerful open-source Vision Language Model (VLM).
Information
- Website: https://huggingface.co/THUDM/cogvlm-base-490-hf
- Published date: 2024-12-03
Data
- Monthly Visitors: 245
- Domain Rating: 91
- Authority Score: 90