Co-Founder @FreeDoctr 🧬⚕️
Building a collaborative platform for patients and doctors. 🌐🩻
#GraphML #GeometricDL #GenAI #ML #RL #LLM #AIForHealthcare
linkedin.com/in/manish-gena…
Joined August 2021
🚀 New Survey Alert!
📄 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
By 16 top institutions (Oxford, NUS, UIUC, UCL, and more)
We explore how LLMs evolve from passive text generators → proactive agents with planning, memory, tool use, reasoning & beyond.…
This is a new 100-page RL for LLM literature review. It appears fairly complete. It also covers static/dynamic data and frameworks. And it has some nice figures!
🔗arxiv.org/abs/2509.08827
Another great @GoogleDeepMind paper.
Shows how to speed up LLM agents while cutting cost and keeping answers unchanged.
30% lower total cost and 60% less wasted cost at comparable acceleration.
Agents plan step by step, so each call waits for the previous one, which drags…
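A minimal sketch of the bottleneck being described, assuming independent sub-calls (this is not the paper's actual mechanism, just the baseline problem): when sub-queries don't depend on each other, they can be overlapped instead of executed strictly one after another.

```python
import asyncio

# `call_llm` is a stand-in for a real model/tool API; the 1s sleep mimics call latency.
async def call_llm(prompt: str) -> str:
    await asyncio.sleep(1.0)
    return f"answer to: {prompt}"

async def sequential(prompts):
    # Baseline agent loop: each call waits for the previous one (~N seconds total).
    return [await call_llm(p) for p in prompts]

async def overlapped(prompts):
    # Independent sub-queries issued concurrently (~1 second total).
    return await asyncio.gather(*(call_llm(p) for p in prompts))

if __name__ == "__main__":
    prompts = ["look up A", "look up B", "look up C"]
    print(asyncio.run(overlapped(prompts)))
```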
Biomni-R0: New Agentic LLMs Trained End-to-End with Multi-Turn Reinforcement Learning for Expert-Level Intelligence in Biomedical Research
Researchers from Stanford University and UC Berkeley introduced a new family of models called Biomni-R0, built by applying reinforcement…
Compute is no longer the main constraint for LLMs, but memory is.
That observation is the idea behind XQuant – a new method from @UCBerkeley that reduces memory use by up to 12x.
- XQuant doesn't store usual KV cache
- It quantizes and stores only X - the layer input activations
- When needed, it…
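A hedged sketch of the core idea above (per-token int8 quantization and the shapes are my own illustrative assumptions, not XQuant's exact recipe): cache a quantized copy of the layer input X and rematerialize K and V from it when attention needs them.

```python
import torch

def quantize(x: torch.Tensor):
    # Per-token symmetric int8 quantization.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

class XCache:
    """Stores quantized X per layer: one tensor instead of separate K and V caches."""
    def __init__(self):
        self.q, self.scale = None, None

    def append(self, x_new: torch.Tensor):
        q, s = quantize(x_new)
        self.q = q if self.q is None else torch.cat([self.q, q], dim=1)
        self.scale = s if self.scale is None else torch.cat([self.scale, s], dim=1)

    def keys_values(self, w_k: torch.Tensor, w_v: torch.Tensor):
        # Recompute K and V from the dequantized inputs only when attention needs them.
        x = dequantize(self.q, self.scale)
        return x @ w_k, x @ w_v

# Toy usage: hidden size 16, 4 new tokens.
d = 16
w_k, w_v = torch.randn(d, d), torch.randn(d, d)
cache = XCache()
cache.append(torch.randn(1, 4, d))
k, v = cache.keys_values(w_k, w_v)
print(k.shape, v.shape)  # torch.Size([1, 4, 16]) twice
```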
A 14B model just beat a 671B model on math reasoning.
Here’s how Microsoft’s rStar2-Agent achieves frontier math performance in 1 week of RL training
- by “thinking smarter, not longer.” 🧵
PAN (Physical, Agentic, and Nested) - a very interesting take on world models, built on a new set of design principles for such models.
It's illustrated with a complex mountaineering scenario that combines multimodal inputs: sights, sounds, sensations, body strain, temperature, text, etc.…
🔍 How do we teach an LLM to 𝘮𝘢𝘴𝘵𝘦𝘳 a body of knowledge?
In new work with @AIatMeta, we propose Active Reading 📙: a way for models to teach themselves new things by self-studying their training data. Results:
* 𝟔𝟔% on SimpleQA w/ an 8B model by studying the wikipedia…
What makes the HRM model work so well for its size on @arcprize?
We ran ablation experiments to find out.
Our findings show that you could replace the "hierarchical" architecture with a normal transformer and see only a small performance drop
We found that an…
M3-Agent: A Multimodal Agent with Long-Term Memory
Impressive application of multimodal agents.
Lots of great insights throughout the paper.
Here are my notes with key insights:
3D Object Tracking without Training Data? In our @Nature Machine Intelligence paper (nature.com/articles/s4225…), we recast 3D tracking as an inverse neural rendering task: we fit a scene graph to an image so that it best explains that image. The method generalizes to completely…
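A toy sketch of the inverse-rendering idea, assuming a placeholder differentiable renderer (`toy_render` is made up and unrelated to the paper's pipeline): treat pose as parameters and optimize them so the rendered output best explains the observed image.

```python
import torch

def toy_render(pose: torch.Tensor) -> torch.Tensor:
    # Placeholder "renderer": a smooth function of the pose producing an 8x8 "image".
    grid = torch.linspace(-1, 1, 8)
    yy, xx = torch.meshgrid(grid, grid, indexing="ij")
    return torch.exp(-((xx - pose[0]) ** 2 + (yy - pose[1]) ** 2) / 0.1)

observed = toy_render(torch.tensor([0.4, -0.2]))   # pretend this is the input image
pose = torch.zeros(2, requires_grad=True)          # initial pose estimate
opt = torch.optim.Adam([pose], lr=0.05)

for step in range(200):
    opt.zero_grad()
    loss = ((toy_render(pose) - observed) ** 2).mean()  # how well the render explains the image
    loss.backward()
    opt.step()

print(pose.detach())  # converges toward [0.4, -0.2]
```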
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
"we demonstrate that employing only two techniques, i.e., advantage normalization (group-level mean, batch-level std) and token-level loss aggregation, can unlock the learning capability of critic-free policies using…
Current multimodal LLMs excel in English and Western contexts but struggle with cultural knowledge from underrepresented regions and languages. How can we build truly globally inclusive vision-language models?
We are introducing CulturalGround, a large-scale dataset with 22M…
Want to add that even with language-assisted visual evaluations, we're seeing encouraging progress in vision-centric benchmarks like CV-Bench (arxiv.org/abs/2406.16860) and Blink (arxiv.org/abs/2404.12390), which repurpose core vision tasks into VQA format. These benchmarks do help…
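A hedged illustration of what "repurposing a core vision task into VQA format" can look like; the actual prompt templates and scoring of CV-Bench / Blink may differ.

```python
# A classic relative-depth judgment recast as a multiple-choice VQA item.
vqa_item = {
    "image": "kitchen_scene.jpg",   # hypothetical image path
    "task": "relative_depth",       # original core vision task
    "question": "Which object is closer to the camera, the mug or the kettle?",
    "choices": ["(A) the mug", "(B) the kettle"],
    "answer": "(A)",
}

def score(prediction: str, item: dict) -> bool:
    # Standard multiple-choice scoring: match on the selected option letter.
    return prediction.strip().startswith(item["answer"])

print(score("(A) the mug", vqa_item))  # True
```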
To everyone diving into fine-tuning open-source MoEs today: check out ESFT, our customized PEFT method for MoE models. Train with 90% fewer parameters, reach 95%+ of task performance, and keep 98% of general performance :)
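A hedged sketch of the expert-specialized idea as I understand it: train only the experts most relevant to the task and freeze everything else. The parameter-naming convention and the precomputed expert set below are assumptions for illustration, not ESFT's actual selection procedure.

```python
import torch.nn as nn

def freeze_all_but_selected_experts(model: nn.Module, selected: set[str]) -> int:
    """Freeze every parameter except those belonging to the selected experts.

    `selected` holds substrings such as "layers.3.mlp.experts.17" identifying
    the experts to keep trainable (the naming convention is an assumption).
    Returns the number of trainable parameters left.
    """
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = any(tag in name for tag in selected)
        trainable += param.numel() if param.requires_grad else 0
    return trainable

# Usage (assuming a loaded MoE model and a relevance-ranked expert list):
# n = freeze_all_but_selected_experts(model, {"layers.3.mlp.experts.17",
#                                             "layers.7.mlp.experts.2"})
# optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-5)
```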
1/N 🚀 Launching LEANN — the tiniest vector index on Earth!
Fast, accurate, and 100% private RAG on your MacBook.
0% internet. 97% smaller. Semantic search on everything.
Your personal Jarvis, ready to dive into your emails, chats, and more.
🔗 Code: github.com/yichuan-w/LEANN
📄…
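A generic sketch of fully local semantic search, just to ground the idea: everything stays in memory on your machine. This is NOT LEANN's API or its index structure (see the repo above for that); `embed` is a placeholder for any on-device embedding model.

```python
import numpy as np

def embed(texts):
    # Placeholder: deterministic pseudo-embedding per text, standing in for a real local model.
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        v = rng.normal(size=384)
        out.append(v / np.linalg.norm(v))
    return np.stack(out)

def search(query: str, docs: list[str], top_k: int = 3):
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    scores = doc_vecs @ q_vec                 # cosine similarity (vectors are unit-norm)
    order = np.argsort(-scores)[:top_k]
    return [(docs[i], float(scores[i])) for i in order]

print(search("flight receipt", ["dinner with Sam", "your flight itinerary", "gym schedule"]))
```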
📢NEW POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
Despite recent results, SAEs aren't dead! They can still be useful for mech interp, and much more broadly: across FAccT, computational social science, and ML4H. 🧵
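For readers new to SAEs, a minimal sketch of what one is: a wide, overcomplete dictionary trained to reconstruct model activations under an L1 sparsity penalty. Hyperparameters here are illustrative, not the paper's setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x):
        features = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(x, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features to zero.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()

# Toy usage: activations of width 512, a 16x overcomplete dictionary.
sae = SparseAutoencoder(d_model=512, d_dict=8192)
x = torch.randn(64, 512)
recon, feats = sae(x)
print(sae_loss(x, recon, feats))
```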
Attention is all you need - but how does it work? In our new paper, we take a big step towards understanding it. We developed a way to integrate attention into our previous circuit-tracing framework (attribution graphs), and it's already turning up fascinating stuff! 🧵
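For reference, a minimal scaled dot-product attention sketch: the pattern softmax(QK^T / sqrt(d)) decides how information moves between positions, and that pattern is what attribution-graph-style tracing has to account for. Single-head, no masking; shapes are illustrative.

```python
import torch

def attention(q, k, v):
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # [batch, seq, seq] attention logits
    pattern = scores.softmax(dim=-1)              # who attends to whom
    return pattern @ v, pattern

q = k = v = torch.randn(1, 5, 32)                 # toy single-head example
out, pattern = attention(q, k, v)
print(out.shape, pattern.shape)
```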
Are RL agents truly learning to reason, or just finding lucky shortcuts? 🤔
Introducing RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards — a novel framework that rewards not just outcomes, but the quality of reasoning itself, creating more robust and…
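A hedged, generic sketch of rewarding reasoning quality alongside outcomes; the checks and weights below are invented for illustration and are not RLVMR's actual verifiable meta-reasoning rewards.

```python
def outcome_reward(answer: str, gold: str) -> float:
    return 1.0 if answer.strip() == gold.strip() else 0.0

def meta_reasoning_reward(trace: list[str]) -> float:
    # Toy verifiable checks on the reasoning trace: did the agent state a plan,
    # and did it avoid repeating the same step?
    has_plan = any(step.lower().startswith("plan:") for step in trace)
    no_repeats = len(set(trace)) == len(trace)
    return 0.5 * has_plan + 0.5 * no_repeats

def total_reward(answer: str, gold: str, trace: list[str], beta: float = 0.3) -> float:
    # Outcome reward plus a weighted bonus for reasoning quality.
    return outcome_reward(answer, gold) + beta * meta_reasoning_reward(trace)

trace = ["plan: compute the sum first", "compute 2+3=5", "answer: 5"]
print(total_reward("5", "5", trace))   # 1.3
```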