1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens 🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets!
DeepSeek-R1 is amazing but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your reasoning models!…
Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe.
The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while…
We are happy to announce Curator, an open-source library designed to streamline synthetic data generation!
High-quality synthetic data generation is essential in training and evaluating LLMs/agents/RAG pipelines these days, but tooling around this is still entirely lacking!
So…
Nice to see previous work that I led at Google DeepMind covered by VentureBeat (in light of new work from Meta).
Context: We introduced the novel idea of Generative Retrieval for recommender systems to the world in our NeurIPS 2023 paper called TIGER (Transformer…
Databricks research scientist @shashank_r12 shares approaches in LLMs:
- How RAG enhances accuracy
- Evolution of attention mechanisms
- Practical applications & trade-offs of Mamba architectures
I have three Ph.D. student openings in my research group at @RutgersCS starting in Fall 2025.
If you are interested in working with me on efficient algorithms and systems for LLMs, foundation models, and AI4Science, please apply at:
grad.rutgers.edu/academics/prog…
The deadline is…
🧵 Super proud to finally share this work I led last quarter - the @databricks Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3
i'm somewhat confident that both the following properties will hold of language models in 2027:
1. tokenization will be gone, replaced with byte-level ingestion
2. all tokens that don't need to be read or written by a human will be continuous vectors
luckily two interesting…
At NeurIPS early? Like making GPUs go brrr?
Join me at a luncheon tomorrow on LLM Scaling x Efficiency, 5 mins from the conference center...
Note, folks need to have directly relevant work if not in the field. DM me for more info or for recs!
Per the usual, I'll be doing 3…
I'll be at NeurIPS and would love to chat about anything AI. Also, visit the Databricks booth to check out some of the work we've been doing!
databricks.com/blog/databrick…
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
🤔 How can we achieve GPT-3 175B-level performance with only 1.3B parameters? 🌟 New from #NVIDIAResearch: HYMBA (HYbrid Multi-head Bi-Attention) combines MLP and attention mechanisms to dramatically boost small language model capabilities.
HYMBA could revolutionize NLP…
[1/10] Q: with the awesome dense models available, does it make sense to do upcycling (MoE-ify them)?
A: it depends bc:
- a lot of additional FLOPs need to be sunk in
- MoEs are (out-of-the-box) slower for inference
- BUT model quality can improve a lot (given enough FLOPs)!
🎉 Milestone: Our LIFT paper has hit 100+ citations! We introduced a simple method to adapt LLMs to new domains, and researchers are now achieving success with it across predictive chemistry, metamaterial physics & more!
Check our work at uw-madison-lee-lab.github.io/LanguageInterf…
543K Followers 24K Following: The best from ML/AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, & startups.
7K Followers 2K Following: CEO and Co-founder @datologyai working to make it easy for anyone to make the most of their data. Former: RS @AIatMeta (FAIR), RS @DeepMind, PhD @PiN_Harvard.
42 Followers 475 Following: Data Scientist II @Wolters_Kluwer, Master's in computer science from Pune University, Visit my blog: https://t.co/xfjPfXQUrL
29K Followers 5K Following: Dreaming about the future, romanticizing the past | MP @compoundvc investing & researching former science projects & crypto | we are only creativity constrained
4K Followers 229 Following: cto @cleanlabai • prev phd @mit_csail • research at https://t.co/MdknnUE4C6 • blog at https://t.co/oGOMQyhxv5 • open-source at https://t.co/VawMWMr84F
51K Followers 3K Following: Developer Experience Lead at @GoogleDeepMind
Building Gemini API, Gemma, AI Studio and more AI products. My views
ex-Chief Llama Officer @huggingface 🇵🇪🇲🇽
14K Followers 3K Following: research @MIT_CSAIL @thinkymachines. work on scalable and principled algorithms in #LLM and #MLSys. in open-sourcing I trust 🐳. she/her/hers
19K Followers 20 Following: A high-throughput and memory-efficient inference and serving engine for LLMs. Join https://t.co/lxJ0SfX5pJ to discuss together with the community!
2K Followers 29 Following: Co-Founder, CTO, @reflection_ai
DQN, AlphaGo, AlphaZero, MuZero, Gemini RLHF
Prev Senior Staff RS and founding eng @GoogleDeepMind
AGI one PR at a time
1K Followers 496 Following: 👷 Work for Trelis: https://t.co/tAts18SIfB
🎥 Watch on Youtube: https://t.co/BPo1FyRuz9
💡 Book a Consultation: https://t.co/DqFajF3fV0
13K Followers 441 Following: Building next-gen AI at @thinkymachines. Past: Founding team @MistralAI, RS at Facebook AI Research. Ph.D. @SCSatCMU, BTech @iitbombay CS.
4K Followers 804 Following: Machine Learning Research at the ELLIS Institute & Max-Planck for Intelligent Systems // Excited about fundamental questions in Safety & Efficiency of modern ML
44K Followers 30 Following: We're an AI research lab building search for the future. Most powerful web search API → https://t.co/M5QuIA5D2A high compute web search → https://t.co/uHn3Ra5yJ2