OpenCompass focuses on the evaluation and analysis of large language models and vision-language models.
GitHub: https://t.co/zF7ycuTXxs · opencompass.org.cn/home · China · Joined April 2024
🚀 Introducing #CompassVerifier: A unified and robust answer verifier for #LLMs evaluation and #RLVR!
✨LLM progress is bottlenecked by weak evaluation. Looking for an alternative to rule-based verifiers? CompassVerifier can handle multiple domains including math, science, and…
🥳#CodeCriticBench assesses LLMs' critiquing ability in code generation and QA tasks. Covering 10 criteria, it features a 4.3k-sample dataset with three difficulty levels and a balanced distribution.
😉CodeCriticBench is now part of the #CompassHub! 😚Feel free to download and…
🥳#StructFlowBench is a structurally annotated multi-turn benchmark that leverages a structure-driven generation paradigm to enhance the simulation of complex dialogue scenarios.
🥳StructFlowBench is now part of the #CompassHub! 😉Feel free to download and explore it—available…
😉#VBench is a comprehensive benchmark that evaluates video generation quality. It covers 16 dimensions of video generation and also provides a dataset of human preference annotations.
🥳VBench is now part of the #CompassHub! Feel free to download and explore it—available for…
🥰VLM²-Bench is the first comprehensive benchmark that evaluates vision-language models' (#VLMs) ability to visually link matching cues across multi-image sequences and videos. The benchmark consists of 9 subtasks with over 3,000 test cases.
🥳VLM²-Bench is now part of the…
We've uploaded the AIME 2025 exam, complete with questions and solutions, here: huggingface.co/datasets/openc….
Feel free to test your powerful LLM on this dataset.
🌟 Exciting News!
CompassArena is now back with some major updates:
- **Judge Copilot**: An LLM-as-a-Judge tool for model comparisons. 🤖
- **Enhanced Statistical Model**: Improved Bradley-Terry accuracy by addressing confounding variables. 📊
- **20+ New LLMs**: A global mix of…
OpenCompass has established a leaderboard to evaluate the complex reasoning capability of LMMs, consisting of four advanced multi-modal math reasoning benchmarks. Currently, Gemini-2.0-Flash holds 1st place. DM me to suggest more benchmarks and models for this leaderboard.
🚀 Shocking: O1-mini scores just 15.6% on AIME under strict, real-world metrics. 🚨
📈 Introducing G-Pass@k: A metric that reveals LLMs' performance consistency across trials.
🌐 LiveMathBench: Challenging LLMs with contemporary math problems, minimizing data leaks.
🔍 Our…
🚀 Excited to announce the release of CompassJudger-1, a powerful Judge LLM for diverse tasks! We've released 4 model sizes.
📷Submit your LLM's performance using CompassJudger to our leaderboard now!
📷Models: CompassJudger: github.com/open-compass/C…
📷Leaderboard:…
Congratulations to @Alibaba_Qwen on the release of so many new models! 🚀🚀🚀
OpenCompass now supports Qwen-2.5.
Stay tuned for more evaluation results, coming soon!📊📊📊
github.com/open-compass/o…