For agents to improve over time, they can’t afford to forget what they’ve already mastered.
We found that supervised fine-tuning causes more forgetting than RL when training on a new task!
Want to find out why? 👇
Had a great time presenting OPRM at ASAP!
We talked about recurrent memory overflows, Long Context vs. RAG, and possible scaling paradigms for recurrent LLMs.
Check it out👇
Recording: youtu.be/O1_qqNAK7XE
Slides: asap-seminar.github.io/assets/slides/…
Today we're launching Subconscious: a new platform for building agents with long-horizon reasoning and tool use, backed by MIT research.
One API call. Tool use. Context beyond existing limits.
If you're building agents, let's talk.
📄🚨 New!
Tired of waiting minutes for LLMs to "think"?
Test-time scaling (o3, DeepSeek-R1) lets LLMs reason before answering, but users are left clueless, with no visibility into progress and no control.
Not anymore!
We expose the LLM’s internal 🕰️, and show how to monitor 📊 & overclock it⚡
🧵👇
We know quadratic-time Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?
Introducing Log-Linear Attention with:
- Log-linear time training
- Logarithmic inference cost (in both time and memory)
- Hardware-efficient Triton kernels
Thanks @MIT_CSAIL for featuring our work!🖊️🎨
Huge thanks to the CSAIL news team for the fun article + video!!
We'll be presenting SketchAgent at #CVPR2025 next week — come say hi if you're curious how LLMs can be used to collaboratively sketch!🖌️
👉 bit.ly/43mTme1
Sometimes the best way to express an idea is by sketching it out.
A system from MIT CSAIL & Stanford captures this iterative process by teaching LLMs to create sequential sketches. It could work w/ users to visually communicate concepts: bit.ly/4kfXFhk
Overflow Prevention Enhances Long-Context Recurrent LLMs
OPRM chunk-based inference (sketched in code below):
- Split the context into chunks
- Process chunks in parallel (speculative prefill)
- Select the best one (e.g., lowest entropy)
- Decode only from that chunk
Advantages:
- No training required…
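To make the flow concrete, here is a minimal sketch of this chunked pipeline in Python, assuming a Hugging Face-style causal LM. The model name, prompt format, chunk size, and entropy estimate are illustrative placeholders, not the OPRM implementation.

```python
# Sketch of chunk-based inference: prefill each context chunk independently,
# score each chunk by the entropy of its next-token distribution, and decode
# only from the most confident chunk. All names here are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def chunked_answer(context: str, query: str,
                   chunk_tokens: int = 512, max_new_tokens: int = 64) -> str:
    """Answer `query` by selecting the lowest-entropy chunk and decoding from it."""
    ids = tok(context, return_tensors="pt").input_ids[0]
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]

    best_prompt, best_entropy = None, float("inf")
    for chunk in chunks:  # independent prefills; in practice these can run as one batch
        prompt = tok.decode(chunk) + "\n\nQuestion: " + query + "\nAnswer:"
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]  # next-token logits for this chunk
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-12)).sum().item()  # uncertainty proxy
        if entropy < best_entropy:  # keep the most confident chunk
            best_entropy, best_prompt = entropy, prompt

    out = model.generate(**tok(best_prompt, return_tensors="pt"),
                         max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)
```

Because each chunk is prefilled independently, the per-chunk forward passes can be batched, which is where the speculative-prefill parallelism in the steps above comes from.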
605 Followers · 293 Following · PhD student at @CseHuji | Passionate about model weights as a new data modality, and yoga - not necessarily in that order 😉 | Ex Intern Google Research.
15K Followers · 7K Following · I build tough benchmarks for LMs and then I get the LMs to solve them. SWE-bench & SWE-agent. Postdoc @Princeton. PhD @nlpnoah @UW.
1K Followers · 864 Following · Professor at Saarland University @LstSaar @SIC_Saar.
Previously PhD at Stanford @stanfordnlp.
Machine learning, language, and cognitive science.
822 Followers · 2K Following · NLP/Code Generation PhD at FAIR (Meta AI) and INRIA - previously researcher at Stanford University - MS Stanford 22’ - Centrale Paris P2020
4K Followers · 2K Following · Head of Volumetric 3D Video at Meta
Prev Projects: Hyperscape, MapAnything, Dynamic 3D Gaussian Splatting, SplaTAM, HOTA +more
Prev PhD at RWTH + CMU + Oxford
55K Followers · 0 Following · We are building a world class AI R&D company in Tokyo. We want to develop AI solutions for Japan’s needs, and democratize AI in Japan. https://t.co/1q07mb3TzE
1K Followers · 8K Following · AI inference, speculative decoding, open source. Built novel decoding algorithms – default in Hugging Face Transformers (150+ ⭐). Making AI faster + cheaper
56K Followers · 924 Following · Neuroscientist: consciousness, perception, & dreamachines. TED speaker, & author: Being You - A New Science of Consciousness.
10K Followers · 108 Following · AI21 Labs builds Foundation Models and AI Systems for the enterprise that accelerate the use of GenAI in production.
Meet Maestro
https://t.co/IJyxlWYJoV
920 Followers · 386 Following · Research Scientist @NVIDIA. Making LLMs, e.g., Hymba, Nemotron series. Ex @Harvard @Meta @Tencent | Views and opinions are my own
1K Followers · 296 Following · Language in Context: NLP, Networks, Social Dynamics and Politics. Opinions are mine. 👉BB needs to go. now👈
Personal (parenting and other failures): @orenTsur
4K Followers · 1K Following · Research Intern @Google, Ph.D. Student @Cornell_CS, interested in machine (continual) learning and understanding what is called intelligence.
50K Followers · 9K Following · I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
1.4M Followers · 570 Following · The Massachusetts Institute of Technology is a world leader in research and education. Related accounts: @MITevents @MITstudents @MIT_alumni
7K Followers · 3 Following · Tweeting interesting papers submitted at https://t.co/rXX8x0HzXV.
Submit your own at https://t.co/QhbJKXBd4Q, and link models/datasets/demos to it!
52K Followers · 64 Following · Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, https://t.co/u8za2Kod54, The Royal Society, Turing Award
1K Followers · 846 Following · Researcher @IBMResearch. Postdoc @berkeley_ai. PhD @TelAvivUni. Working on Compositionality, Multimodal Foundation Models, and Structured Physical Intelligence.
26K Followers · 881 Following · Research Scientist Director in Meta FAIR. Reasoning, Optimization and Understanding LLM. Novelist in spare time. PhD in @CMU_Robotics.