OpenCompass focuses on the evaluation and analysis of large language models and vision-language models.
GitHub: https://t.co/zF7ycuTXxs · opencompass.org.cn/home · China · Joined April 2024
🚀 Introducing #CompassVerifier: A unified and robust answer verifier for #LLMs evaluation and #RLVR!
✨LLM progress is bottlenecked by weak evaluation. Looking for an alternative to rule-based verifiers? CompassVerifier can handle multiple domains including math, science, and…
🥳#CodeCriticBench assesses LLMs' critiquing ability in code generation and QA tasks. Covering 10 criteria, it features a 4.3k-sample dataset with three difficulty levels and a balanced distribution.
😉CodeCriticBench is now part of the #CompassHub! 😚Feel free to download and…
🥳#StructFlowBench is a structurally annotated multi-turn benchmark that leverages a structure-driven generation paradigm to enhance the simulation of complex dialogue scenarios.
🥳StructFlowBench is now part of the #CompassHub! 😉Feel free to download and explore it—available…
😉#VBench is a comprehensive benchmark that evaluates video generation quality. It covers 16 dimensions of video generation and also provides a dataset of human preference annotations.
🥳VBench is now part of the #CompassHub! Feel free to download and explore it—available for…
🥰VLM²-Bench is the first comprehensive benchmark that evaluates vision-language models' (#VLMs) ability to visually link matching cues across multi-image sequences and videos. The benchmark consists of 9 subtasks with over 3,000 test cases.
🥳VLM²-Bench is now part of the…
We've uploaded the AIME 2025 exam, complete with questions and solutions, here: huggingface.co/datasets/openc….
Feel free to test your powerful LLM on this dataset.
🌟 Exciting News!
CompassArena is now back with some major updates:
- **Judge Copilot**: An LLM-as-a-Judge tool for model comparisons. 🤖
- **Enhanced Statistical Model**: Improved Bradley-Terry accuracy by addressing confounding variables. 📊
- **20+ New LLMs**: A global mix of…
OpenCompass has established a leaderboard to evaluate the complex reasoning capability of LMMs, consisting of four advanced multi-modal math reasoning benchmarks. Currently, Gemini-2.0-Flash holds 1st place. DM me to suggest more benchmarks and models for this leaderboard.
🚀 Shocking: O1-mini scores just 15.6% on AIME under strict, real-world metrics. 🚨
📈 Introducing G-Pass@k: A metric that reveals LLMs' performance consistency across trials.
🌐 LiveMathBench: Challenging LLMs with contemporary math problems, minimizing data leaks.
🔍 Our…
🚀 Excited to announce the release of CompassJudger-1, a powerful Judge LLM for diverse tasks! We've released 4 model sizes.
📷Submit your LLM's performance using CompassJudger to our leaderboard now!
📷Models: CompassJudger: github.com/open-compass/C…
📷Leaderboard:…
Congratulations to @Alibaba_Qwen on the release of so many new models! 🚀🚀🚀
OpenCompass now supports Qwen-2.5.
Stay tuned for more evaluation results, coming soon!📊📊📊
github.com/open-compass/o…