rasdani @rasdani_
~/.cache/huggingface · Joined April 2022
398 Tweets · 469 Followers · 3K Following · 2K Likes
This is just a small vibecheck (more isn't currently possible due to rate limits), but in the German Geo eval I built on stage yesterday evening, @Alibaba_Qwen 3-Max doesn't look competitive with other top models and falls far behind e.g. R1 or GLM 4.5. 😕 @ellamindAI
Hey all! I'm back. I was quiet for a bit due to crunchtime, but it's finally done now. And... 🚨 It's official: elluminate is live! 🚀 As one of the OG LLM evaluation experts, I've joined ellamind to help build this. With our platform, anyone - not just engineers - can evaluate…
Fuck it. Today, we open source FineVision: the finest curation of datasets for VLMs, over 200 sources! > 20% improvement across 10 benchmarks > 17M unique images > 10B answer tokens > New capabilities: GUI navigation, pointing, counting FineVision 10x’s open-source VLMs.
Twitter is like evals, evals, evals. And then when I talk to anyone at a big lab, it's like, "Ah, here are the 10 prompts I test. I just read them every couple of checkpoints."
@jxnlco or scale your vibe checks x.com/ellamindai/sta… https://t.co/mLFxejXxeR
elluminate is the culmination of months of hard work and years of combined experience building real AI applications. Something that occurred to us early on is that evals are the only path to building reliable AI products. Let me explain how this works. 🧵
In elluminate, AI responses are rated using criteria posed as yes/no questions ✅/❌ Experts recommend this approach because it enables just enough freedom of expression while avoiding pitfalls like unclear grading scales, number bias in judges, and blurry decision boundaries.
Binary criteria aren't just great for the humans working with the platform; judge models are also much more reliable here than on other scales. Defining success criteria in this format feels like a restriction early on but leads to better insights and clearer definitions.
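The binary-criteria idea in the posts above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not elluminate's actual API: `Criterion`, `rate_response`, `pass_rates` and `toy_judge` are hypothetical names, and the keyword-based judge merely stands in for an LLM or human rater.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """A success criterion phrased as a yes/no question (hypothetical structure)."""
    question: str


# A judge answers one criterion for one response: True means yes, False means no.
Judge = Callable[[str, Criterion], bool]


def rate_response(response: str, criteria: list[Criterion], judge: Judge) -> dict[str, bool]:
    """Collect a yes/no verdict for every criterion on a single response."""
    return {c.question: judge(response, c) for c in criteria}


def pass_rates(ratings: list[dict[str, bool]]) -> dict[str, float]:
    """Fraction of responses rated 'yes' on each criterion."""
    if not ratings:
        return {}
    return {q: sum(r[q] for r in ratings) / len(ratings) for q in ratings[0]}


def toy_judge(response: str, criterion: Criterion) -> bool:
    """Hypothetical stand-in for an LLM judge: hard-coded checks keyed on the question."""
    if "Berlin" in criterion.question:
        return "Berlin" in response
    return len(response.split()) < 20


criteria = [
    Criterion("Does the answer mention Berlin?"),
    Criterion("Is the answer shorter than 20 words?"),
]
responses = ["The capital of Germany is Berlin.", "I am not sure, maybe Munich?"]
ratings = [rate_response(r, criteria, toy_judge) for r in responses]
print(pass_rates(ratings))
# {'Does the answer mention Berlin?': 0.5, 'Is the answer shorter than 20 words?': 1.0}
```

Each criterion produces a clean yes/no verdict per response, so aggregation is just an average: no grading scale to interpret and no fuzzy boundary between, say, a 3 and a 4.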
Experiments run your prompt with your selection of test cases using a specific model. With this simple setup, you can rapidly test and get quick insights on the most important numbers.
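An experiment as described here (one prompt, a set of test cases, one model) can be sketched as a loop that fills a prompt template with each test case and collects the model's responses. The `Experiment` class, the `model` callable and the `pass_rate` helper are hypothetical stand-ins for illustration, not elluminate's interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """Hypothetical sketch: one prompt template, many test cases, one model."""
    prompt_template: str              # e.g. "Answer briefly: {question}"
    test_cases: list[dict[str, str]]  # variables substituted into the template
    model: Callable[[str], str]       # any function mapping prompt -> response

    def run(self) -> list[tuple[str, str]]:
        """Render the prompt for every test case and collect (prompt, response) pairs."""
        results = []
        for case in self.test_cases:
            prompt = self.prompt_template.format(**case)
            results.append((prompt, self.model(prompt)))
        return results


def pass_rate(results: list[tuple[str, str]], check: Callable[[str], bool]) -> float:
    """Share of responses that satisfy a yes/no check (0.0 for an empty run)."""
    if not results:
        return 0.0
    return sum(check(response) for _, response in results) / len(results)


# Usage with a placeholder "model" that just upper-cases the prompt.
experiment = Experiment(
    prompt_template="Answer briefly: {question}",
    test_cases=[{"question": "Capital of Germany?"}, {"question": "Capital of France?"}],
    model=lambda prompt: prompt.upper(),
)
results = experiment.run()
print(pass_rate(results, lambda response: "GERMANY" in response))  # 0.5
```

Swapping `model` for another callable while keeping the prompt and test cases fixed is what makes model-to-model comparisons on the same numbers straightforward.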
But what really counts is being able to look at the data to gain a real understanding of what's working and what's not, and to figure out how to improve.
This is a tool built to enable anybody who's building AI products (not only devs, but also product managers and domain experts) to run evals, iterate on prompts, compare models, analyze weaknesses and make their AI more reliable. A platform built for evaluating real AI products.
Super happy to finally be able to share this and I'm looking forward to pushing my evals agenda more and more 😅
Visit our landing page at elluminate.de for more info.
AI evaluations are broken. Generic benchmarks tell you nothing. Manual QA doesn't scale. And existing tools are either too academic or simplify in the wrong places. That's why we built elluminate - evals that actually work for real product teams.
The result? Teams ship faster with confidence. Product managers can actually trust their metrics. And developers spend time building, not firefighting. Whether you're a developer tired of vibe-checking, a PM who needs reliable metrics, or a domain expert who knows what "good"…
Building AI products? You need real evaluations. Let's talk. elluminate.de
How can we benchmark Agents in realistic, complex environments? MCP-Universe is a new benchmark using Model Context Protocol (MCP) servers to test Agents on 231 challenging, practical tasks. Benchmark: 1️⃣ Tasks from 6 practical domains, Location Navigation, Repository…
Continuing the journey of optimal LLM-assisted coding experience. In particular, I find that instead of narrowing in on a perfect one thing my usage is increasingly diversifying across a few workflows that I "stitch up" the pros/cons of: Personally the bread & butter (~75%?) of…
the prospect of jump-starting an open source ecosystem for RL is for sure what drew me to verifiers and @PrimeIntellect

Eetauawxor @Eetauawxor6771
1 Followers 285 Following
LindsayKatrine @f2dOcz8orH7LM
2 Followers 287 Following
Autin Mitra @AutinMitra
45 Followers 105 Following MTIA/pytorch @ meta, former MTS intern @ haize labs
ShortSqueezePro... @Blecui469
34 Followers 2K Following 15-30% Monthly | 2 High-Conviction Stocks. Short-Term Gains: 15-20% in Days/Weeks. DM "JOIN" for WhatsApp Alerts. Live Trade Signals • Market Analysis
Baxu @Baxu250607
6 Followers 389 Following
🎈 XGamesWannabe ... @XgamesWannabe
87 Followers 1K Following To be rather than to seem; All over the Great State Of North Carolina and beyond! $paymesoonsss
MavisSweet @WmBN343tU3ibc
0 Followers 289 Following
frenlyfren:) @frenlyfrenforu
30 Followers 615 Following gittin ziggy with it | big eeper | SIGSEGV enjoyer | I load my elf under 0x1000 and die 😎
無 @xwuxwux
1 Followers 4K Following
EvangelineHarry @08869F1mxP5z7
33 Followers 1K Following
Saber Darabi @SADarabi
303 Followers 7K Following
Coupons @coupons24601
34 Followers 525 Following
MignonHuggins @931U507Aes7lu9
14 Followers 720 Following The most alluring thing a woman can have is confidence.
Souptik Chakraborty @souptikchaks
10 Followers 261 Following Analyst by profession. Bad coder. Building AI Tools for interests. Trying to prove "hard work beats talent".
daryl martis @realdarylmartis
337 Followers 2K Following
iDoser @idoser
142K Followers 31K Following #iDoser Leads Mindfulness Music Industry 20+ Years with 10 Million Users. Top Producer of ASMR, Hypnosis, Binaural, Subliminal Music. https://t.co/ptqN4lVwyb
Bharat @bharatrunwal2
104 Followers 1K Following Pre-Training @IBMResearch | Prev: undergrad @iitdelhi, intern @Mila_Quebec, @MIT, @Cambridge_Uni, @HPI_DE , @michiganstateu
ved @TrivediVedant
413 Followers 6K Following
Nachman Kaul-Seidman @nachmanks331
42 Followers 2K Following
minh @eexwhyzee
386 Followers 932 Following I tweet about basketball most of the time. staff software engineer @capellaspace and @rt_hoops
Othmane @ThisIsOthmane
417 Followers 4K Following AI at Datadog | #DimaRaja #FreeKoulchi | Opinions are my own
opus131csharpminor @op131csharpminr
11 Followers 231 Following Recovering mortgage-backed securities trader, lifelong programmer, longtime EA/AI safety adjacent, Beethoven and Mahler appreciator
Bartu @bartukoksal418
105 Followers 1K Following uiuc cs; rc'22; Fenerbahceli; "keske golgesine razi bir feslegen olaydim" ("I wish I were a basil content with its own shade")
Rafael Bodero @NaufragoRB
136 Followers 6K Following
Théo Pomies @theopomies
290 Followers 2K Following CTO @ https://t.co/BRhXDz6tjE, Roman Catholic, Bitcoin Maximalist, AI dabbler, Weight training x Training weights
Omair Shahid @OmairShahid
648 Followers 3K Following Product of progressive public policy; raised by public libraries and public education that produced a passion for politics. and apparently alliteration
The Tower @TheWhiteTower16
67 Followers 569 Following
Freddie Vargus @freddie_v4
1K Followers 2K Following cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷 & Bostonian 🇺🇸
❄️Andrew Zhao❄... @_AndrewZhao
4K Followers 3K Following PhD @Tsinghua_Uni. Absolute Zero, ExpeL, Diver-CT. Research Intern @MSFTResearch, Ex. @ BIGAI. Interested in RL, Reasoning/Safety 4 LLMs, Agents. On job market '26
Luigi Pagani @Luigi1549898
38 Followers 898 Following
Omar Nusrat @OmarNusrat
938 Followers 4K Following phd candidate @torontomet medical physics • prev @openai @ontariotech_u • co-chair @comptrainees
Infornomics @infornomics
1K Followers 1K Following Data Science, AI & Energy Transition, RSWE economics/ Logic-Weaver, Data-Mancer, Charmer of machines, Wonder-Walker (bestowed by Claude) Goal: emotion spinner
Ahmad @ahmad_7432
2 Followers 477 Following
Muhammad Qasim Khan @mqasim628_khan
57 Followers 2K Following
Damian Barabonkov @TheBitFlipper
10 Followers 114 Following
Zeyuan Allen-Zhu, Sc.... @ZeyuanAllenZhu
20K Followers 452 Following physics of language models @ Meta (FAIR, not GenAI) 🎓:Tsinghua Physics — MIT CSAIL — Princeton/IAS 🏅:IOI x 2 — ACM-ICPC — USACO — Codejam — math MCM
Stas Bekman @StasBekman
9K Followers 286 Following Toolmaker. Software creator, optimizer and harmonizer. Makes ML systems work and fly @ Snowflake.
Michael Luo @michaelzluo
540 Followers 213 Following Project Lead @Agentica_ | Prev. Researcher @GoogleDeepMind | PhD at UC Berkeley @berkeley_ai
Rasmus 🍒 @synquid
133 Followers 175 Following AI & LLMs @ Alexandra Institute. Danish Foundation Models. Enjoyer of RL.
Brandon McKinzie @mckbrando
7K Followers 2K Following research @OpenAI | prev: multimodal @Apple, physics/cs @UCBerkeley
Quentin @qtnx_
17K Followers 540 Following i use nixos btw, research engineer @mistralai, prev. https://t.co/SDROdHKqTQ, husband, all opinions my own
DailyPapers @HuggingPapers
5K Followers 3 Following Tweeting interesting papers submitted at https://t.co/rXX8x0HzXV. Submit your own at https://t.co/QhbJKXBd4Q, and link models/datasets/demos to it!
zed @zmkzmkz
4K Followers 1K Following #1 paperclip maximizer fan, occasionally on x-games mode. I really, really like watching loss graphs go down
Alex Dimakis @AlexGDimakis
21K Followers 2K Following Professor, UC berkeley | Founder @bespokelabsai |
Chen Sun 🤖🧠🇨... @ChenSun92
2K Followers 397 Following Research Scientist @ Google DeepMind. Building memory & open-ended AI. Ex-neuroscientist, ex-IMO team Canada. Views are mine alone, not GDM's.
Tim Rocktäschel @_rockt
39K Followers 2K Following Director and Open-Endedness Team Lead @GoogleDeepMind, Professor of AI @AI_UCL, PI @UCL_DARK, Fellow @ELLISforEurope.
Physical Intelligence @physical_int
24K Followers 30 Following Physical Intelligence (Pi), bringing AI into the physical world.
Tibor Blaho @btibor91
31K Followers 2K Following Lead Engineer at @AIPRMcorp (https://t.co/fepyWfV4kA) and @lrt_co (https://t.co/p7LEvIKduG), building AIPRM for ChatGPT & Claude. Signal @ btibor.91
All Hands AI @allhands_ai
8K Followers 11 Following We build AI software development agents, in the open. Developing OpenHands: https://t.co/wDOBeXGLmO
leloy! @leloykun
6K Followers 4K Following Math @ AdMU • NanoGPT speedrunner • Muon fan 🤍 • prev ML @ XPD • 2x IOI & 2x ICPC • https://t.co/nfO038itfn
Saurabh Shah @saurabh_shah2
2K Followers 1K Following training olmos @allen_ai prev @Apple @Penn 🎤dabbler of things🎸 🐈⬛enjoyer of cats 🐈 and mountains🏔️he/him
Mechanize @MechanizeWork
6K Followers 1 Following We're a software company building RL environments to power the full automation of the economy.
Z.ai @Zai_org
15K Followers 142 Following The AI lab behind GLM models, dedicated to inspiring the development of AGI to benefit humanity. https://t.co/b6zGxJvzzS
Casper Hansen @casper_hansen_
10K Followers 457 Following NLP Scientist | AutoAWQ Creator | Open-Source Contributor
DeepSeek @deepseek_ai
973K Followers 0 Following Unravel the mystery of AGI with curiosity. Answer the essential question with long-termism.
Lisan al Gaib @scaling01
21K Followers 655 Following lead them to paradise | intelligence is inherently about scaling | be kind to us AGI
Flo @flozi00
54 Followers 59 Following
Kaichao You @KaichaoYou
4K Followers 134 Following phd student in tsinghua university, working on @vllm_project
Dieter @kagglingdieter
16K Followers 145 Following deep learner @nvidia, PhD mathematics, current rank @kaggle: 1
Vincent Abbott @vtabbott_
7K Followers 335 Following Maker of *those* diagrams for deep learning algorithms | @mit @mitlids incoming PhD
Dominique Paul @DominiqueCAPaul
3K Followers 1K Following Statistics at @ETH. Published at @NeurIPSconf. Now building ML models for robots. DMs open for anything robotics.
Jonas Gehring @jnsgehring
557 Followers 194 Following PhD student at FAIR and @eth_en. He/him. https://t.co/F9lEccvESH
Quentin Gallouédec @QGallouedec
3K Followers 663 Following PhD - Research @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦
Arthur Douillard @Ar_Douillard
8K Followers 2K Following Distributed Learning @ deepmind | DiLoCo, DiPaCo. Continual Learning PhD @ Sorbonne
Mario Sieg @_mario_neo_
3K Followers 110 Following ML | game engines | compilers | HPC Research Engineer @PrimeIntellect Building my custom pytorch: https://t.co/5PnTE712WQ
Grad @Grad62304977
4K Followers 2K Following