The paper shows LLM-as-a-judge is inconsistent, and proposes a probabilistic framework to fix it.
It shows two failures: a lower-scored answer can win head-to-head, and pairwise picks can form loops.
This matters so much because rankings, A/B tests, and reward modeling become…
This paper introduces a single model that can reason across biology, chemistry, and materials science in one place.
It was trained on 206B tokens that mix scientific text, raw sequences like DNA and proteins, and text-sequence pairs.
Then it was tuned on 40M instructions and…
I am glad to share the acceptance of 6 of my latest research papers at EMNLP 2025 (2 in the Main Track, 3 in the Industry Track, 1 in the NewSumm Workshop).
Congrats to all my co-authors at Dialpad, York University, UofA, and NTU.
#EMNLP2025
This chart shows how AI is expected to progress on biology benchmarks by 2030.
The first curve, PoseBusters-v2, is about predicting protein-ligand interactions. AI systems are already reaching high accuracy here, suggesting these tasks could be solved in just a few years.
The…
Excited to announce that our paper has been selected for a Best Paper Award at IEEE VIS 🏆
I would like to extend my gratitude to my co-authors, and especially to my supervisor, Dr. @Enamul_Hoque. This achievement would not have been possible without their support.
#IEEEVIS2025
IMO, this is one of the best accomplishments we've ever pulled off at Dialpad. Not only do we have our own completely proprietary and 90%+ accurate customer satisfaction scoring AI model, but we now have explanations for those AI scores in detail and in aggregate. Now any company…
Evaluating Language Models for Biomedical Fact-Checking: A Benchmark Dataset for Cancer Variant Interpretation Verification
1. A new benchmark dataset called CIViC-Fact has been developed to evaluate the accuracy of language models in verifying cancer variant claims. This…
China's Alibaba just dropped a Python framework for building multi-agent apps.
AgentScope lets you build AI agents visually with MCP tools, memory, RAG, and reasoning capabilities.
Works with any LLM and supports real-time steering.
100% open source.
Train AI agents for complex real-world tasks in just a single line of Python code.
Agent Reinforcement Trainer uses LLM-as-judge to train multi-step agents without manual rewards.
100% open source.
This Google engineer just released a 424-page free book on Agentic Design Patterns.
Covers advanced prompt engineering, multi-agent frameworks, RAG, agent tool use and MCP.
100% free with practical code examples.
What Large Language Models Know About Plant Molecular Biology
1. A new benchmark called MOBIPLANT has been introduced to evaluate the capabilities of large language models (LLMs) in plant molecular biology. This benchmark was developed by a consortium of 112 plant scientists…
Supervised learning in DNA neural networks @Nature
1. A groundbreaking study demonstrates that DNA molecules can autonomously perform supervised learning in vitro, a significant leap towards embedding learning capabilities in non-living systems. The research shows that DNA…
How to generate medical training data and rewards that make small models generalize.
A 7B model beats a 72B model by 19.7% on OmniMedVQA.
The model reads medical images and text together like a vision language system.
It creates its own image question answer tasks, then a…
🧬 Massive. The newly released Biomni-R0, a tiny 8B param biomedical AI model, surpasses Claude 4 Sonnet and GPT-5, demonstrating the efficiency of domain-specialized training.
The model uses reinforcement learning to push a biomedical agent to expert level.
Biomni-R0 comes in 8B… https://t.co/JQ0wBd43yM
🤖 Better LLM Agents for CRM Tasks: Tips and Tricks
CRM tasks are tough for LLMs - even GPT-4o only solves <30% of tasks in our CRMArenaPro benchmark 😬
📝 Blog: sforce.co/4600cWT
💡 Key finding: Showing agents HOW to solve tasks (not just WHAT to solve) dramatically…
543K Followers 24K Following
The best from ML/AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, & startups.
91 Followers 900 Following
Exploration over Exploitation.
RA @Mila_Quebec, Research Fellow @UniofOxford. MSc @UWindsor. Interested in Adversarial attacks, security & reliability of LLMs
710 Followers 3K Following
Cerebral tweets about GenAI, sprinkled with a fine layer of OU football shitposts // Maniacally curating good GenAI ideas here: https://t.co/FUoYiZkO3Q
427 Followers 524 Following
Principal Applied Scientist @Oracle Health AI
ex - Sr. Applied Scientist @Amazon. 🇧🇩
Co-CTO @ReviewAcl.
Music (metal) and NLP research.
Opinions are my own.
97K Followers 8K Following
Compiling in real-time, the race towards AGI.
The Largest Show on X for AI.
🗞️ Get my daily AI analysis newsletter to your email 👉 https://t.co/6LBxO8215l
4K Followers 24 Following
The European Chapter of the Association for Computational Linguistics
An annual Top-tier *ACL conference. #EACL2026 #NLProc
24-29 March 2026
2K Followers 886 Following
Associate professor @EmoryUniversity. Working on large language models, LLM inference, reasoning, natural language generation, and various aspects of GenAI.
16K Followers 1K Following
Senior Research Scientist - @google, Adjunct Faculty - @iitmadras, @iitbombay, Ex: @NICT_Publicity
Use of my tweets without permission ➡️ legal action
99 Followers 5 Following
We are a researcher community developing scientifically grounded research outputs and robust deployment infrastructure for broader impact evaluations.
13K Followers 2K Following
SVP and Head of Biomedical AI @Xaira_Thera; Associate Prof @UofT; Chief AI Officer @UHN; former PHD, CS @Stanford; opinions my own. #AI #healthcare #biology
2K Followers 935 Following
Ph.D. student @LTIatCMU and intern at @AIatMeta (FAIR) working on (V)LM Evaluation & Systems that Self-Improve | Prev: @kaist_ai @yonsei_u
10K Followers 4K Following
sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
4K Followers 473 Following
Follow for AI in Digital Biology and Drug Discovery @NVIDIA, ex Insilico Medicine, ex Yale, PhD UMaryland, views are mine, DM for collabs
165K Followers 0 Following
Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.
22K Followers 540 Following
Founded the Reasoning Team in Google Brain (now in the Gemini Core team of Google DeepMind). Build LLMs to reason. Opinions my own.
371 Followers 7 Following
Computational Linguistics, established in 1974, is the official flagship journal of the Association for Computational Linguistics (ACL).
53K Followers 763 Following
CEO & Founder, Chemify. Regius Professor, Scientist & Inventor. Fascinated & in a state of confusion & optimism. Trying to digitize chemistry & make alien life.
3K Followers 74 Following
Machine Learning for Health (ML4H) • San Diego, 2025
#ml4h2025 • Contact: [email protected]
Follow us on Bluesky: https://t.co/c99yfcFHF4
57K Followers 858 Following
Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner