OpenGVLab @opengvlab

Shanghai AI Lab, General Vision Team. We created InternImage, BEVFormer, VideoMAE, LLaMA-Adapter, Ask-Anything, and many more! · github.com/OpenGVLab · Shanghai · Joined January 2023

Tweets: 147 · Followers: 2K · Following: 88 · Likes: 78

DailyPapers @HuggingPapers

a week ago

Top AI Papers of The Week (September 15-21) - OmniWorld: Multi-Domain 4D World Modeling - ScaleCUA: Scaling Cross-Platform Agents by @opengvlab - WebWeaver: Dynamic Outlines for Deep Research - Scaling Agents via Continual Pre-training - FlowRL: Matching Reward Distributions for…


OpenGVLab @opengvlab

a week ago

HF Papers: huggingface.co/papers/2509.15… GitHub: github.com/OpenGVLab/Scal… arXiv: arxiv.org/abs/2509.15221 🚀 ScaleCUA is the first open-source 🖥️📱 framework and dataset for truly cross-platform Computer Use Agents, spanning Windows, macOS, Linux, Android, iOS, and Web. It unifies GUI…


OpenGVLab @opengvlab

a week ago

Welcome to follow our new work! x.com/opengvlab/stat…

DailyPapers @HuggingPapers

2 weeks ago



DailyPapers @HuggingPapers

a week ago

ScaleCUA: Master GUIs across 6 OSes with our new open-source agent! This VLM-powered agent sets a new SOTA, trained on a massive dataset spanning 6 OSes and 3 task domains, enabling seamless cross-platform operation. Researchers, explore its capabilities!


Adina Yakup @AdinaYakup

a week ago

ScaleCUA 🔥 computer-use agents with cross-platform data, released by @opengvlab Paper: huggingface.co/papers/2509.15… Model: huggingface.co/collections/Op… ✨ 3B/7B/72B - Apache 2.0 ✨ Two modes: Direct Action & Reasoned Action (agent) ✨ Trained on 6 OS + 3 domains with a dual-loop…


Adina Yakup @AdinaYakup

a week ago

@opengvlab


OpenGVLab @opengvlab

6 months ago

🥳We have released #InternVL3, an advanced #MLLM series ranging from 1B to 78B, on @huggingface. 😉InternVL3-78B achieves a score of 72.2 on the MMMU benchmark, setting a new SOTA among open-source MLLMs. ☺️Highlights: - Native multimodal pre-training: Simultaneous language and…


OpenGVLab @opengvlab

7 months ago

🚀 Introducing the MM-Eureka Series - A Breakthrough in Multimodal Reasoning with Visual Aha Moments! ✨ Reproduced R1-Zero and Visual Aha-Moment Phenomena 🧠 Trained on only 0.05% of the data used for base models, it achieves math reasoning benchmark performance comparable to…


OpenGVLab @opengvlab

8 months ago

🚀 Introducing #InternVideo 2.5 - The Video Multimodal AI That Sees Longer & Smarter! ✨ Handles videos 6x longer than predecessors ✨ Pinpoints objects/actions with surgical precision ✨ Trained on 300K+ hours of diverse video data 📈 Outperforms SOTA on multiple benchmarks &…


OpenGVLab @opengvlab

9 months ago

🥳Mini-InternVL has been accepted by Visual Intelligence! The Mini-InternVL series of #MLLMs, with parameter counts ranging from 1B to 4B, achieves 90% of the performance using only 5% of the parameters. This significant efficiency and performance boost makes our model more accessible…


OpenGVLab @opengvlab

9 months ago

People are paying more and more attention to the quality and detail of generated videos. Use a single hand-tuned temperature parameter to enhance your generated videos for free! Nice work with our amazing friends @YangL_7 @oahzxl, @shaowenqi126301, @VictorKaiWang1, @VITAGroupUT,…

Yang Luo @YangL_7

9 months ago


OpenGVLab @opengvlab

10 months ago

We have reached a milestone by exceeding human performance on the R2R dataset in vision-language navigation for the very first time.

Zun Wang @ZunWang919

10 months ago



OpenGVLab @opengvlab

10 months ago

🥳We have released InternVL2.5, ranging from 1B to 78B, on @huggingface. 😉InternVL2_5-78B is the first open-source #MLLM to achieve over 70% on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o. 🤗HF Space:…


OpenGVLab @opengvlab

10 months ago

The tech report is worth reading. It reveals many details about how InternVL 1.5, InternVL 2.0, and now InternVL 2.5 have each remained the best open-source #vlm foundation model of their time. huggingface.co/papers/2412.05…

Wenhai Wang @wangwenhai362

10 months ago


OpenGVLab @opengvlab

11 months ago

Here comes Mini-InternVL 2.0! 🚀 With just 5% of the parameters, it delivers 90% of the performance! arxiv👏: arxiv.org/abs/2410.16261 repos👉: github.com/OpenGVLab/Inte… 1B version🤗: huggingface.co/OpenGVLab/Inte… 2B version🤗: huggingface.co/OpenGVLab/Inte… 4B version🤗:…