My 3rd blog post on PG (policy gradient), the topic I am least familiar with but get asked about a lot, so I thought I'd put together the very limited stuff I know on it. Somehow the post gets cynical from time to time🙃
nanjiang.cs.illinois.edu/2025/09/29/pg.…
New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along.
(Remember matmul is the single most important operation that transformers execute…
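The post works in CUDA. As a hedged, language-agnostic sketch (not code from the post), the tiling idea behind most fast matmul kernels can be shown in plain Python. The output is computed block by block, and each block walks along the shared K dimension one tile at a time, much as a GPU kernel stages tiles of A and B in shared memory. The function names and the `tile` parameter are illustrative assumptions.

```python
def matmul_naive(a, b):
    """Reference triple-loop matmul: C[i][j] = sum_k A[i][k] * B[k][j]."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for k in range(k_dim):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def matmul_tiled(a, b, tile=2):
    """Blocked matmul (hypothetical helper for illustration).

    Each tile x tile block of C is accumulated from tile-wide slices of
    A and B, mirroring how a GPU kernel loads tiles into fast memory."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # output tile row
        for j0 in range(0, n, tile):          # output tile column
            for k0 in range(0, k_dim, tile):  # step along the shared K dimension
                # Multiply A[i0:i0+tile, k0:k0+tile] by B[k0:k0+tile, j0:j0+tile]
                # and add the partial product into the C tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, k_dim)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

For example, `matmul_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` returns `[[58.0, 64.0], [139.0, 154.0]]`, the same as `matmul_naive`. The edge handling via `min(...)` lets the tile size not divide the matrix dimensions evenly.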
🆕 𝐔𝐩𝐝𝐚𝐭𝐞!! A few additional findings for 𝐑𝐨𝐥𝐥𝐨𝐮𝐭–𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐌𝐢𝐬𝐦𝐚𝐭𝐜𝐡:
① 𝐏𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐬𝐦 𝐝𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞 is a huge driver of the gap, with Sequence Parallelism (SP) causing most of the high max mismatch.
② 𝐋𝐨𝐧𝐠𝐞𝐫 𝐬𝐞𝐪𝐮𝐞𝐧𝐜𝐞𝐬…
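The thread doesn't say exactly how mismatch is measured. Assuming it compares the per-token probabilities of the same sampled tokens as scored by the rollout engine and by the training engine, a minimal sketch of mean and max mismatch might look like this. Both `mismatch_stats` and the probability-space metric are hypothetical choices for illustration.

```python
import math


def mismatch_stats(rollout_logprobs, train_logprobs):
    """Hypothetical mismatch metric between two engines.

    Both inputs are log-probs of the SAME sampled tokens: one list from
    the rollout (inference) engine, one from the training engine.
    Returns (mean, max) of |p_rollout - p_train| in probability space."""
    if len(rollout_logprobs) != len(train_logprobs) or not rollout_logprobs:
        raise ValueError("both engines must score the same, non-empty token list")
    diffs = [
        abs(math.exp(r) - math.exp(t))
        for r, t in zip(rollout_logprobs, train_logprobs)
    ]
    return sum(diffs) / len(diffs), max(diffs)


# One token agrees, one token disagrees noticeably.
rollout = [math.log(0.5), math.log(0.9)]
train = [math.log(0.5), math.log(0.6)]
mean_gap, max_gap = mismatch_stats(rollout, train)
```

The max is the more sensitive of the two numbers: a single badly mismatched token dominates it even when the mean stays small.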
keep a doc with all health-related events and symptoms you have ever had
for every new health concern, give it to chatgpt 5 pro along with your context, and it will find the craziest patterns and links between things, and a list of actions
Excited to share that I will be starting as an Assistant Professor in CSE at UCSD (@ucsd_cse) in Fall 2026! I am currently recruiting PhD students who want to bridge theory and practice in deep learning - see here: cs.princeton.edu/~smalladi/recr…
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”
We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…
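One commonly cited ingredient in discussions of inference nondeterminism is that floating-point addition is not associative. The same numbers reduced in a different order, for example by a kernel that splits a reduction differently depending on batch size, can give different results. A minimal Python illustration, not taken from the post; `sum_left_to_right` is an illustrative helper:

```python
def sum_left_to_right(values):
    """Reduce a list of floats strictly in order, like a sequential kernel loop."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two reduction orders.
vals = [1e20, -1e20, 1.0]
forward = sum_left_to_right(vals)                   # big terms cancel first, then 1.0 survives
backward = sum_left_to_right(list(reversed(vals)))  # 1.0 is absorbed into -1e20 and lost
print(forward, backward)                            # 1.0 0.0

# The classic small-number case.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))       # False
```

Nothing here is random: each order is deterministic on its own. Results only vary when the reduction order itself changes between runs.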
Interesting findings! We also attempted something similar in our AZR paper section D.2, where the proposer needs to construct a composite function f(g, ..., g)
Introducing 🛡️ExCyTIn‑Bench: Evaluating LLM agents on Cyber Threat Investigations. It’s built on an Azure tenant, a real Security Operations Center environment covering 57 tables. Explore how LLMs fare in realistic, multi-hop incident detection! #Cybersecurity #AI #LLM #Benchmark
People love 𝗽𝗮𝘀𝘀@𝗸 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴.
What to do if you have 𝟭𝟬𝟬 samples and you wanna optimize 𝗽𝗮𝘀𝘀@𝟭𝟬?
✨This is the reward, presented in analytic form.
Next step? Pass it to GRPO and witness the magic.
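The analytic reward in the original post was an image and isn't reproduced here. A standard related building block, shown as a hedged sketch rather than the post's formula, is the unbiased pass@k estimator from the Codex paper (Chen et al., 2021): with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k), i.e. one minus the chance that a random size-k subset contains no correct sample. The name `pass_at_k` is an illustrative choice.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of them correct.

    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 0 < k <= n:
        raise ValueError("need 0 <= c <= n and 0 < k <= n")
    if n - c < k:
        # Too few incorrect samples to fill a size-k subset: every subset hits a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# The setup from the post: 100 samples, optimizing pass@10.
for c in (0, 1, 10, 50):
    print(c, round(pass_at_k(100, c, 10), 4))
```

Using `math.comb` keeps the binomials as exact integers until the final division, so the estimate stays stable even for large n.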
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only...
or is it?
turns out that underneath the surface, there is still a strong base model. so we extracted it.
introducing gpt-oss-20b-base 🧵
⚡𝐅𝐏𝟖 makes RL faster — but at the cost of performance.
We present 𝐅𝐥𝐚𝐬𝐡𝐑𝐋, the first 𝐨𝐩𝐞𝐧–𝐬𝐨𝐮𝐫𝐜𝐞 & 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐑𝐋 𝐫𝐞𝐜𝐢𝐩𝐞 that applies 𝐈𝐍𝐓𝟖/𝐅𝐏𝟖 for rollout 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐥𝐨𝐬𝐢𝐧𝐠 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 compared to 𝐁𝐅𝟏𝟔!
📝 Blog:…
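FlashRL's actual recipe is not described in the post above. As a generic, hedged illustration of what running rollouts in INT8 involves, here is a minimal sketch of symmetric per-tensor INT8 quantization. The rounding error it introduces is one way a low-precision rollout model can drift from a BF16 training model. The helper names and the per-tensor scheme are assumptions.

```python
def quantize_int8(values):
    """Hypothetical symmetric per-tensor INT8 quantization.

    One scale = max|x| / 127 maps floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest code, then clamp in case of tiny float overshoot.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate float values."""
    return [qi * scale for qi in q]


weights = [0.02, -0.5, 0.31, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Per-element error is at most scale / 2 for values inside the clamp range.
max_err = max(abs(a - w) for a, w in zip(approx, weights))
```

A per-tensor scale is the simplest choice. Real low-precision inference stacks often use finer-grained (per-channel or per-block) scales to shrink this error.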
1K Followers 6K Following
Student, Love to solve hardest math problem. LLM's, Mathematical Research (Geometric Topology, Differential Geometry), Quantum Computing. Lord Krishna is God Of Math
544K Followers 24K Following
The best from ML/AI community | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future | Silicon Valley robots, holodecks, BCIs, & startups.
15K Followers 7K Following
I build tough benchmarks for LMs and then I get the LMs to solve them. SWE-bench & SWE-agent. Postdoc @Princeton. PhD @nlpnoah @UW.
1K Followers 659 Following
The real AGI is the friends we make along the way. PhD in FAIR CodeGen @AIatMeta. Alumni: @Huggingface, Sea AI Lab, @openai, École Polytechnique, SJTU
26K Followers 229 Following
getting us to singularity with friends
computers can be understood: https://t.co/doHE1Qv2Sj
x @GoogleDeepMind @Microsoft
tensor core maximalist
672 Followers 333 Following
Postdoctoral Fellow at @PrincetonPLI | Past: Computer Science PhD @TelAvivUni & Apple Scholar in AI/ML | Interested in the foundations of deep learning
6K Followers 1K Following
Senior staff research scientist at DeepMind. Opinions are my own. Re-tweets and favorites not to be considered as endorsements.
20K Followers 100 Following
Member of Technical Staff at Anthropic
AlphaGo, AlphaZero, MuZero, AlphaCode, AlphaTensor, AlphaProof
Gemini RL
Prev Principal Research Engineer at DeepMind
113K Followers 3 Following
The official newsroom for @OpenAI. Tweets are on the record.
If you like this account, you’ll love our blog: https://t.co/nEYf8Iq3C0
11K Followers 639 Following
CS prof at Penn. Amazon Scholar at AWS. Author of The Ethical Algorithm (w/ Michael Kearns). I study machine learning, privacy, game theory, and uncertainty.
14K Followers 3K Following
research @MIT_CSAIL @thinkymachines. work on scalable and principled algorithms in #LLM and #MLSys. in open-sourcing I trust 🐳. she/her/hers
46K Followers 3K Following
We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization.
@deepseek_ai stan #1, 2023–Deep Time
«C’est la guerre.» ®1
909 Followers 79 Following
🚀Bringing China's AI & tech trends, voices and perspectives to the global stage.
⚡️Powered by Zhihu (知乎)/https://t.co/OkIemRZdcj, China's leading knowledge community.