Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,…
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May!
VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of… https://t.co/MK4OAY49Uz
🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
🛠️ DeepSeek-R1: Technical Highlights
📈 Large-scale RL in post-training
🏆 Significant performance boost with minimal labeled data
🔢 Math, code, and reasoning tasks on par with OpenAI-o1
📄 More details: github.com/deepseek-ai/De…
🐋 4/n
I had a great time helping host MASC-SLL at Hopkins last year. MASC-SLL is a great opportunity to connect with fellow AI/NLP/Speech researchers.
If your organization is in the Mid-Atlantic region and is interested in hosting the event, please reach out!
I have written a blog post explaining why both the chosen and the rejected log-probabilities decrease during DPO and, more interestingly, why this is to some extent a desired phenomenon.
Link: tianjianl.github.io/blog/2024/dpo/
Very happy to hear that GANs are getting the test of time award at NeurIPS 2024.
The NeurIPS test of time awards are given to papers which have stood the test of the time for a decade.
I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.
Excited to see that SpiritLM is fully open-sourced now. It supports speech and text as both input and output. Please consider trying it at: github.com/facebookresear…
Representation matters.
Representation matters, even for generative models.
We might've been training our diffusion models the wrong way this whole time. Meet REPA: Training Diffusion Transformers is easier than you think! sihyun.me/REPA/ (🧵1/n)
Multilingual models are usually heavily skewed in favor of high-resource languages.
We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels!
Paper: arxiv.org/pdf/2410.03115
169 Followers, 308 Following. Research scientist on Speech and Natural Language Processing.
My tweets are my own and can be crawled as training data freely.