Shashank Rajput @shashank_r12

LLM Research @Meta shashankrajput.github.io Joined October 2013

Tweets

219
Followers

833
Following

664
Likes

9K

Abhay Gupta @gupta__abhay

2 weeks ago

@jefrankle frantically asks not to whisper the words that won him this very well deserved award !!! But here’s one for you - “Lottery Tickets” !!!

Jonathan Frankle @jefrankle

2 weeks ago

0 0 17 3K 2

1 1 5 863 0

1/Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳 - 3B LLMs beat 8B models🚀 - Pareto frontier for performance

23 125 709 168K 581

DeepSeek @deepseek_ai

7 months ago

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With…

900 2K 16K 2.6M 5K

Mahesh Sathiamoorthy @madiator

8 months ago

We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets! DeepSeek-R1 is amazing but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your reasoning models!…

46 291 2K 213K 1K

Mahesh Sathiamoorthy @madiator

8 months ago

Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe. The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while…

35 136 774 230K 486

Mahesh Sathiamoorthy @madiator

9 months ago

We are happy to announce Curator, an open-source library designed to streamline synthetic data generation! High-quality synthetic data generation is essential in training and evaluating LLMs/agents/RAG pipelines these days, but tooling around this is still entirely lacking! So…

27 154 966 201K 1K

Mahesh Sathiamoorthy @madiator

9 months ago

Nice to see my previous work that I led at Google DeepMind covered by VentureBeat (in the light of a new work from Meta). Context: We had introduced the novel idea of Generative Retrieval for recommender systems to the world in our Neurips 2023 paper called TIGER (Transformer…

2 14 93 8K 53

Kangwook Lee @Kangwook_Lee

9 months ago

It's finally here! Excited to share the project I led with KRAFTON and NVIDIA. The future of gaming is here 🙌

NVIDIA GeForce @NVIDIAGeForce

9 months ago

69 95 782 273K 93

5 6 86 8K 8

Databricks @databricks

9 months ago

Watch the full conversation: youtu.be/2tlWPgmiX2s?si…

0 6 10 2K 1

Databricks @databricks

9 months ago

Databricks research scientist @shashank_r12 shares approaches in LLMs: - How RAG enhances accuracy - Evolution of attention mechanisms - Practical applications & trade-offs of Mamba architectures

1 8 21 3K 3

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) @rao2z

9 months ago

Soo disappointed that it's just a "department" and not a School, College or an Institute.. gotta get ahead of the curve, @IITKgp!!

8 3 165 8K 4

Hongyi Wang @HongyiWang10

10 months ago

I have three Ph.D. student openings in my research group at @RutgersCS starting in Fall 2025. If you are interested in working with me on efficient algorithms and systems for LLMs, foundation models, and AI4Science, please apply at: grad.rutgers.edu/academics/prog… The deadline is…

20 113 409 92K 227

Pallavi @herengoneagn

9 months ago

🧵 Super proud to finally share this work I led last quarter - the @databricks Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3

3 10 33 15K 14

Jack Morris @jxmnop

10 months ago

i'm somewhat confident that both the following properties will hold of language models in 2027: 1. tokenization will be gone, replaced with byte-level ingestion 2. all tokens that don't need to be read or written by a human will be continuous vectors luckily two interesting…

21 94 815 54K 665

Rajko Radovanović @rajko_rad

10 months ago

At NeurIPS early? Like making GPUs go brrr? Join me at a luncheon tomorrow on LLM Scaling x Efficiency, 5 mins from the conference center... Note, folks need to have directly relevant work if not in the field. DM me for more info or for reccs! Per the usual, I'll be doing 3…

0 6 40 10K 20

Shashank Rajput @shashank_r12

10 months ago

I'll be at NeurIPS and would love to chat about anything AI. Also, visit the Databricks booth to check out some of the work we've been doing! databricks.com/blog/databrick…

1 0 17 689 2

Ahmad Al-Dahle @Ahmad_Al_Dahle

10 months ago

Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…

177 475 3K 490K 688

NVIDIA AI Developer @NVIDIAAIDev

10 months ago

🤔 How can we achieve GPT-3 175B-level performance with only 1.3B parameters? 🌟 New from #NVIDIAResearch: HYMBA (HYbrid Multi-head Bi-Attention) combines MLP and attention mechanisms to dramatically boost small language model capabilities. HYMBA could revolutionize NLP…

8 44 173 17K 55

Sasha Doubov @sashadoubov

11 months ago

[1/10] Q: with the awesome dense models available, does it make sense to do upcycling (MoE-ify them)? A: it depends bc: - a lot of additional FLOPs need to be sunk in - MoEs are (out-of-the-box) slower for inference - BUT model quality can improve a lot (given enough flops)!

1 8 42 7K 11

Yuchen Zeng @yzeng58

11 months ago

🎉 Milestone: Our LIFT paper has hit 100+ citations! We introduced a simple method to adapt LLMs to new domains, and researchers are now achieving success with it across predictive chemistry, metamaterial physics & more! Check our work at uw-madison-lee-lab.github.io/LanguageInterf…