Co-Founder @FreeDoctr 🧬⚕️
Building a collaborative platform for patients and doctors. 🌐🩻
#GraphML #GeometricDL #GenAI #ML #RL #LLM #AIForHealthcare
linkedin.com/in/manish-gena…
Joined August 2021
🚀 New Survey Alert!
📄 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
By 16 top institutions (Oxford, NUS, UIUC, UCL, and more)
We explore how LLMs evolve from passive text generators → proactive agents with planning, memory, tool use, reasoning & beyond.…
This is a new 100-page RL for LLM literature review. It appears fairly complete. It also covers static/dynamic data and frameworks. And it has some nice figures!
🔗arxiv.org/abs/2509.08827
Another great @GoogleDeepMind paper.
Shows how to speed up LLM agents while cutting cost and keeping answers unchanged.
30% lower total cost and 60% less wasted cost at comparable acceleration.
Agents plan step by step, so each call waits for the previous one, which drags…
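A minimal sketch of the bottleneck being described, assuming independent sub-calls (this is not the paper's actual mechanism, just the baseline problem): when sub-queries don't depend on each other, they can be overlapped instead of executed strictly one after another.

```python
import asyncio

# `call_llm` is a stand-in for a real model/tool API; the 1s sleep mimics call latency.
async def call_llm(prompt: str) -> str:
    await asyncio.sleep(1.0)
    return f"answer to: {prompt}"

async def sequential(prompts):
    # Baseline agent loop: each call waits for the previous one (~N seconds total).
    return [await call_llm(p) for p in prompts]

async def overlapped(prompts):
    # Independent sub-queries issued concurrently (~1 second total).
    return await asyncio.gather(*(call_llm(p) for p in prompts))

if __name__ == "__main__":
    prompts = ["look up A", "look up B", "look up C"]
    print(asyncio.run(overlapped(prompts)))
```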
Biomni-R0: New Agentic LLMs Trained End-to-End with Multi-Turn Reinforcement Learning for Expert-Level Intelligence in Biomedical Research
Researchers from Stanford University and UC Berkeley introduced a new family of models called Biomni-R0, built by applying reinforcement…
Compute is no longer the main constraint for LLMs, but memory is.
That observation is the idea behind XQuant – a new method from @UCBerkeley that reduces memory use by up to 12x.
- XQuant doesn't store usual KV cache
- It quantizes and stores only X - the layer input activations
- When needed, it…
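A hedged sketch of the core idea above (per-token int8 quantization and the shapes are my own illustrative assumptions, not XQuant's exact recipe): cache a quantized copy of the layer input X and rematerialize K and V from it when attention needs them.

```python
import torch

def quantize(x: torch.Tensor):
    # Per-token symmetric int8 quantization.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

class XCache:
    """Stores quantized X per layer: one tensor instead of separate K and V caches."""
    def __init__(self):
        self.q, self.scale = None, None

    def append(self, x_new: torch.Tensor):
        q, s = quantize(x_new)
        self.q = q if self.q is None else torch.cat([self.q, q], dim=1)
        self.scale = s if self.scale is None else torch.cat([self.scale, s], dim=1)

    def keys_values(self, w_k: torch.Tensor, w_v: torch.Tensor):
        # Recompute K and V from the dequantized inputs only when attention needs them.
        x = dequantize(self.q, self.scale)
        return x @ w_k, x @ w_v

# Toy usage: hidden size 16, 4 new tokens.
d = 16
w_k, w_v = torch.randn(d, d), torch.randn(d, d)
cache = XCache()
cache.append(torch.randn(1, 4, d))
k, v = cache.keys_values(w_k, w_v)
print(k.shape, v.shape)  # torch.Size([1, 4, 16]) twice
```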
A 14B model just beat a 671B model on math reasoning.
Here’s how Microsoft’s rStar2-Agent achieves frontier math performance in 1 week of RL training
- by “thinking smarter, not longer.” 🧵
PAN (Physical, Agentic, and Nested) - a very interesting take on world models, built on a new set of design principles for such models.
It's illustrated with a complex mountaineering scenario that combines multimodal inputs: sights, sounds, sensations, body strain, temperature, text, etc.…
🔍 How do we teach an LLM to 𝘮𝘢𝘴𝘵𝘦𝘳 a body of knowledge?
In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results:
* 𝟔𝟔% on SimpleQA w/ an 8B model by studying the wikipedia…
What makes the HRM model work so well for its size on @arcprize?
We ran ablation experiments to find out.
Our findings show that you could replace the "hierarchical" architecture with a normal transformer and see only a small performance drop
We found that an…
M3-Agent: A Multimodal Agent with Long-Term Memory
Impressive application of multimodal agents.
Lots of great insights throughout the paper.
Here are my notes with key insights:
3D Object Tracking without Training Data? In our @Nature Machine Intelligence paper (nature.com/articles/s4225…), we recast 3D tracking as an inverse neural rendering task: we fit a scene graph to an image so that it best explains that image. The method generalizes to completely…
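A toy sketch of the inverse-rendering idea, assuming a placeholder differentiable renderer (`toy_render` is made up and unrelated to the paper's pipeline): treat pose as parameters and optimize them so the rendered output best explains the observed image.

```python
import torch

def toy_render(pose: torch.Tensor) -> torch.Tensor:
    # Placeholder "renderer": a smooth function of the pose producing an 8x8 "image".
    grid = torch.linspace(-1, 1, 8)
    yy, xx = torch.meshgrid(grid, grid, indexing="ij")
    return torch.exp(-((xx - pose[0]) ** 2 + (yy - pose[1]) ** 2) / 0.1)

observed = toy_render(torch.tensor([0.4, -0.2]))   # pretend this is the input image
pose = torch.zeros(2, requires_grad=True)          # initial pose estimate
opt = torch.optim.Adam([pose], lr=0.05)

for step in range(200):
    opt.zero_grad()
    loss = ((toy_render(pose) - observed) ** 2).mean()  # how well the render explains the image
    loss.backward()
    opt.step()

print(pose.detach())  # converges toward [0.4, -0.2]
```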
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
"we demonstrate that employing only two techniques, i.e., advantage normalization (group-level mean, batch-level std) and token-level loss aggregation, can unlock the learning capability of critic-free policies using…
Current multimodal LLMs excel in English and Western contexts but struggle with cultural knowledge from underrepresented regions and languages. How can we build truly globally inclusive vision-language models?
We are introducing CulturalGround, a large-scale dataset with 22M…
Want to add that even with language-assisted visual evaluations, we're seeing encouraging progress in vision-centric benchmarks like CV-Bench (arxiv.org/abs/2406.16860) and Blink (arxiv.org/abs/2404.12390), which repurpose core vision tasks into VQA format. These benchmarks do help…
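A hedged illustration of what "repurposing a core vision task into VQA format" can look like; the actual prompt templates and scoring of CV-Bench / Blink may differ.

```python
# A classic relative-depth judgment recast as a multiple-choice VQA item.
vqa_item = {
    "image": "kitchen_scene.jpg",   # hypothetical image path
    "task": "relative_depth",       # original core vision task
    "question": "Which object is closer to the camera, the mug or the kettle?",
    "choices": ["(A) the mug", "(B) the kettle"],
    "answer": "(A)",
}

def score(prediction: str, item: dict) -> bool:
    # Standard multiple-choice scoring: match on the selected option letter.
    return prediction.strip().startswith(item["answer"])

print(score("(A) the mug", vqa_item))  # True
```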
To everyone diving into fine-tuning open-source MoEs today: check out ESFT, our customized PEFT method for MoE models. Train with 90% fewer parameters, reach 95%+ of task performance, and keep 98% of general performance :)
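A hedged sketch of the expert-specialized idea as I understand it: train only the experts most relevant to the task and freeze everything else. The parameter-naming convention and the precomputed expert set below are assumptions for illustration, not ESFT's actual selection procedure.

```python
import torch.nn as nn

def freeze_all_but_selected_experts(model: nn.Module, selected: set[str]) -> int:
    """Freeze every parameter except those belonging to the selected experts.

    `selected` holds substrings such as "layers.3.mlp.experts.17" identifying
    the experts to keep trainable (the naming convention is an assumption).
    Returns the number of trainable parameters left.
    """
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(tag in name for tag in selected)
        trainable += param.numel() if param.requires_grad else 0
    return trainable

# Usage (assuming a loaded MoE model and a relevance-ranked expert list):
# n = freeze_all_but_selected_experts(model, {"layers.3.mlp.experts.17",
#                                             "layers.7.mlp.experts.2"})
# optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-5)
```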
1/N 🚀 Launching LEANN — the tiniest vector index on Earth!
Fast, accurate, and 100% private RAG on your MacBook.
0% internet. 97% smaller. Semantic search on everything.
Your personal Jarvis, ready to dive into your emails, chats, and more.
🔗 Code: github.com/yichuan-w/LEANN
📄…
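A generic sketch of fully local semantic search, just to ground the idea: everything stays in memory on your machine. This is NOT LEANN's API or its index structure (see the repo above for that); `embed` is a placeholder for any on-device embedding model.

```python
import numpy as np

def embed(texts):
    # Placeholder: deterministic pseudo-embedding per text, standing in for a real local model.
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.normal(size=384)
        out.append(v / np.linalg.norm(v))
    return np.stack(out)

def search(query: str, docs: list[str], top_k: int = 3):
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    scores = doc_vecs @ q_vec                 # cosine similarity (vectors are unit-norm)
    order = np.argsort(-scores)[:top_k]
    return [(docs[i], float(scores[i])) for i in order]

print(search("flight receipt", ["dinner with Sam", "your flight itinerary", "gym schedule"]))
```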
📢NEW POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
Despite recent results, SAEs aren't dead! They can still be useful for mech interp, and much more broadly: across FAccT, computational social science, and ML4H. 🧵
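For readers new to SAEs, a minimal sketch of what one is: a wide, overcomplete dictionary trained to reconstruct model activations under an L1 sparsity penalty. Hyperparameters here are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x):
        features = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(x, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features to zero.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()

# Toy usage: activations of width 512, a 16x overcomplete dictionary.
sae = SparseAutoencoder(d_model=512, d_dict=8192)
x = torch.randn(64, 512)
recon, feats = sae(x)
print(sae_loss(x, recon, feats))
```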
Attention is all you need - but how does it work? In our new paper, we take a big step towards understanding it. We developed a way to integrate attention into our previous circuit-tracing framework (attribution graphs), and it's already turning up fascinating stuff! 🧵
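For reference, a minimal scaled dot-product attention sketch: the pattern softmax(QK^T / sqrt(d)) decides how information moves between positions, and that pattern is what attribution-graph-style tracing has to account for. Single-head, no masking; shapes are illustrative.

```python
import torch

def attention(q, k, v):
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # [batch, seq, seq] attention logits
    pattern = scores.softmax(dim=-1)              # who attends to whom
    return pattern @ v, pattern

q = k = v = torch.randn(1, 5, 32)                 # toy single-head example
out, pattern = attention(q, k, v)
print(out.shape, pattern.shape)
```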
Are RL agents truly learning to reason, or just finding lucky shortcuts? 🤔
Introducing RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards — a novel framework that rewards not just outcomes, but the quality of reasoning itself, creating more robust and…
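A hedged, generic sketch of rewarding reasoning quality alongside outcomes; the checks and weights below are invented for illustration and are not RLVMR's actual verifiable meta-reasoning rewards.

```python
def outcome_reward(answer: str, gold: str) -> float:
    return 1.0 if answer.strip() == gold.strip() else 0.0

def meta_reasoning_reward(trace: list[str]) -> float:
    # Toy verifiable checks on the reasoning trace: did the agent state a plan,
    # and did it avoid repeating the same step?
    has_plan = any(step.lower().startswith("plan:") for step in trace)
    no_repeats = len(set(trace)) == len(trace)
    return 0.5 * has_plan + 0.5 * no_repeats

def total_reward(answer: str, gold: str, trace: list[str], beta: float = 0.3) -> float:
    # Outcome reward plus a weighted bonus for reasoning quality.
    return outcome_reward(answer, gold) + beta * meta_reasoning_reward(trace)

trace = ["plan: compute the sum first", "compute 2+3=5", "answer: 5"]
print(total_reward("5", "5", trace))   # 1.3
```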