How do we navigate a growing collection of post-trained LLMs?
In Delta Activations: A Representation for Finetuned LLMs, we propose a compact embedding that encodes the post-training signal.
Try the interactive model navigator 👉 oscarxzq.github.io/delta_activati…
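Rough intuition: a finetuned model can be embedded by how its internal activations shift relative to the base model on a fixed set of probe prompts. A minimal sketch of that idea, where the probe set, layer choice, and last-token pooling are illustrative assumptions rather than the paper's exact recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def delta_activation(base_id, finetuned_id, probes, layer=-1):
    """Embed a finetuned model as the mean shift of its last-token
    hidden states relative to the base model, over fixed probes."""
    tok = AutoTokenizer.from_pretrained(base_id)
    means = []
    for model_id in (base_id, finetuned_id):
        model = AutoModelForCausalLM.from_pretrained(
            model_id, output_hidden_states=True
        ).eval()
        states = []
        with torch.no_grad():
            for p in probes:
                ids = tok(p, return_tensors="pt")
                out = model(**ids)
                # last-token hidden state at the chosen layer
                states.append(out.hidden_states[layer][0, -1])
        means.append(torch.stack(states).mean(dim=0))
    return means[1] - means[0]  # the delta-activation embedding
```

Models finetuned on similar data should land near each other under cosine similarity of these embeddings, which is what makes them useful for navigating a model collection.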
Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!
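The exact objective is in the paper; as a rough illustration of jointly rewarding quality and diversity in online RL, here is a toy reward where the multiplicative combination and the distinct-n-gram diversity proxy are simplifying assumptions:

```python
def distinct_ngrams(texts, n=2):
    """Fraction of n-grams that are unique across a batch of sampled
    responses -- a crude proxy for batch-level diversity."""
    seen, total = set(), 0
    for t in texts:
        toks = t.split()
        for i in range(len(toks) - n + 1):
            seen.add(tuple(toks[i:i + n]))
            total += 1
    return len(seen) / max(total, 1)

def diversity_scaled_rewards(quality_scores, responses):
    """Toy combined signal: scale each response's quality score by the
    batch's diversity, so repetitive batches earn less reward."""
    div = distinct_ngrams(responses)
    return [q * div for q in quality_scores]
```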
👀Have you ever asked an LLM to provide a more detailed answer after inspecting its initial output? Users often provide such implicit feedback during interaction.
✨We study implicit user feedback found in LMSYS and WildChat. #EMNLP2025
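The paper defines its own taxonomy of implicit feedback; as a toy illustration of mining it from chat logs, one could flag follow-up turns containing dissatisfaction cues (the cue list below is invented):

```python
# Invented cue phrases; a real study would use annotation or a classifier.
NEGATIVE_CUES = ("more detail", "that's not what i", "try again", "you misunderstood")

def implicit_negative_feedback(follow_up_turn: str) -> bool:
    """Crude rule: does the user's follow-up contain a dissatisfaction cue?"""
    low = follow_up_turn.lower()
    return any(cue in low for cue in NEGATIVE_CUES)
```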
It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results.
Frontier AI companies will inevitably compete on…
We also have a very similar and maybe simpler observation in our recent paper
Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces
arxiv.org/abs/2507.09709
In fact we can build very effective guardrails using the subspace observation…
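The paper's guardrail construction is its own; an illustrative version of the general recipe is to fit a low-dimensional basis on hidden states of known-unsafe prompts and flag inputs whose activations project heavily onto it. Dimensionality and threshold below are placeholders:

```python
import numpy as np

def fit_unsafe_subspace(unsafe_acts: np.ndarray, k: int = 8) -> np.ndarray:
    """Top-k principal directions of unsafe-prompt activations."""
    X = unsafe_acts - unsafe_acts.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]  # (k, hidden_dim), orthonormal rows

def subspace_mass(act: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the activation's norm captured by the subspace."""
    return float(np.linalg.norm(basis @ act) / (np.linalg.norm(act) + 1e-8))

def flag(act, basis, threshold=0.5):  # 0.5 is an arbitrary placeholder
    return subspace_mass(act, basis) > threshold
```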
Introducing Generative Interfaces - a new paradigm beyond chatbots.
We generate interfaces on the fly to better facilitate LLM interaction, so no more passive reading of long text blocks.
Adaptive and Interactive: creates the form that best adapts to your goals and needs!
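The paper's generation pipeline is its own; the general pattern is prompting the model to emit a structured UI spec instead of prose, then rendering it. The JSON schema below is invented for illustration:

```python
import json

# Invented spec an LLM might emit in place of a long text answer.
spec = json.loads("""
{
  "type": "form",
  "title": "Trip planner",
  "fields": [
    {"label": "Destination", "widget": "text"},
    {"label": "Budget (USD)", "widget": "slider", "min": 0, "max": 5000}
  ]
}
""")

def render(spec):
    """Console stand-in for a real interface renderer."""
    print(f"[{spec['type']}] {spec['title']}")
    for field in spec["fields"]:
        print(f"  - {field['label']} ({field['widget']})")

render(spec)
```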
NEW: A major AI copyright legal showdown just took a huge twist today. Facing a class action on behalf of book authors that could've seen it pay over a TRILLION in damages for alleged piracy, Anthropic has agreed to settle instead: wired.com/story/anthropi…
New Anthropic research: filtering out dangerous information at pretraining.
We’re experimenting with ways to remove information about chemical, biological, radiological and nuclear (CBRN) weapons from our models’ training data without affecting performance on harmless tasks.
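The actual pipeline is Anthropic's; generically, pretraining-data filtering is a scored pass over documents. The keyword scorer below is a stand-in for a trained classifier:

```python
# Stand-in term list; a real filter would use a trained classifier,
# not keyword matching.
FLAGGED_TERMS = ("enrichment cascade", "nerve agent synthesis")

def keep_document(doc: str) -> bool:
    """Drop any document that matches a flagged term."""
    low = doc.lower()
    return not any(term in low for term in FLAGGED_TERMS)

corpus = ["a harmless cooking blog post", "notes on nerve agent synthesis"]
print([d for d in corpus if keep_document(d)])  # keeps only the first doc
```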
Not all benchmarks are equally useful for comparing LMs—some are noisy, others show little signal. This work, led by @heinemandavidj, analyzes 30 datasets across 900k OLMo training checkpoints and quantifies their signal-to-noise ratio.
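We haven't reproduced the paper's exact definition, but a common way to quantify a benchmark's signal-to-noise ratio is to compare score spread across distinct models (signal) with score jitter across adjacent checkpoints of one run (noise):

```python
import numpy as np

def benchmark_snr(final_scores, late_checkpoint_scores):
    """Illustrative SNR: between-model spread over within-run jitter.
    The paper's exact formulation may differ."""
    signal = np.std(final_scores)            # spread across models
    noise = np.std(late_checkpoint_scores)   # step-to-step fluctuation
    return signal / (noise + 1e-8)

# e.g. four models' final accuracies vs. five late checkpoints of one run
print(benchmark_snr([0.42, 0.55, 0.61, 0.48],
                    [0.54, 0.56, 0.55, 0.53, 0.55]))
```

A high-SNR benchmark separates models cleanly; a low-SNR one mostly reflects checkpoint noise.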
Excited to be at #CHI2025. I will be presenting our work (Best Paper Honorable Mention) tomorrow, April 29th (11:10-11:22), moderated by @huashen218
Location: G416 + 417
I am also recruiting PhD students and visiting researchers at @sbucompsc. DMs are open. Please say hi!!
Introducing OLMoTrace! Now you can trace back OLMo generations to the training data. This is a feature that open data and an open training recipe can unlock.
We’ve pushed explainability to the next level, combining open data and fast search through trillions of tokens. So proud of…
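OLMoTrace runs over an index of trillions of training tokens; as a toy stand-in for the matching step, here is a brute-force search for long verbatim overlaps between a generation and a small corpus (real systems use suffix-array-style indexes, not scans):

```python
def verbatim_spans(generation: str, corpus: str, min_words: int = 6):
    """Maximal word spans of `generation` that occur verbatim in `corpus`.
    Brute force for illustration only."""
    words = generation.split()
    spans, i = [], 0
    while i < len(words):
        longest = 0
        for j in range(len(words), i, -1):  # try the longest span first
            if " ".join(words[i:j]) in corpus:
                longest = j - i
                break
        if longest >= min_words:
            spans.append(" ".join(words[i:i + longest]))
            i += longest
        else:
            i += 1
    return spans
```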
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words.
When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
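The training recipe is in the paper; the core twist is letting BPE merges cross whitespace so frequent multi-word strings become single tokens. A toy merge loop showing the difference (this greedy pair-merging is textbook BPE, not the paper's implementation):

```python
from collections import Counter

def bpe_train(text, num_merges, allow_superwords=True):
    """Toy BPE: repeatedly merge the most frequent adjacent pair.
    With allow_superwords=True, merges may cross spaces, so strings
    like 'of the' can become single tokens."""
    seq = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not allow_superwords:  # classic BPE never merges across a space
            pairs = Counter({p: c for p, c in pairs.items()
                             if " " not in p[0] and " " not in p[1]})
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq
```

Fewer tokens per sequence is where the inference-efficiency gain comes from.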
🎄🎅starting tomorrow at 10 am pacific, we are doing 12 days of openai.
each weekday, we will have a livestream with a launch or demo, some big ones and some stocking stuffers.
we’ve got some great stuff to share, hope you enjoy! merry christmas.
This is not good: "Surprisingly, we observe a significant decline in LLMs’ reasoning abilities under format restrictions."
Link: arxiv.org/abs/2408.02442
We’ve developed Rule-Based Rewards (RBRs) to align AI behavior safely without needing extensive human data collection, making our systems safer and more reliable for everyday use. openai.com/index/improvin…
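The RBR paper has its own formulation; the gist is scoring completions against explicit behavior rules and using the combined score as a reward. A toy version with invented rules and weights:

```python
# Invented rules: (name, predicate, weight). A real RBR setup would use
# graded rubrics, not substring checks.
RULES = [
    ("refuses_unsafe_request", lambda t: "i can't help with that" in t.lower(), 1.0),
    ("no_judgmental_tone", lambda t: "you should be ashamed" not in t.lower(), 0.5),
]

def rule_based_reward(completion: str) -> float:
    """Weighted fraction of satisfied rules, usable as an RL reward."""
    total = sum(w for _, _, w in RULES)
    return sum(w for _, rule, w in RULES if rule(completion)) / total
```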
967 Followers · 594 Following · PhD student at @LTIatCMU / @SCSatCMU, she/her, prev. @UVA and intern @ai2_allennlp
@/clara on https://t.co/GHxXbrRHSB and @/clarana on https://t.co/47UIhMGaRd
2K Followers · 2K Following · Assistant Professor @sutdsg, working on online trust & safety, computational social science, and social NLP. Currently leading the Social AI Studio.
137 Followers · 148 Following · UW NLP | MATS Scholar | Comp. Psyc/Social Sci. |
ai values ↔ human values • value alignment for the good of humanity |
Working on eval • data collection in wild
108 Followers · 130 Following · CSE PhD student @hkust in her second year, advised by @junxian_he. Machine learning, NLP.
bluesky here: https://t.co/ECxlKtKTxz
76K Followers · 13K Following · Newsletter exploring AI & ML - AI 101, Agentic Workflow, Business insights. From ML history to AI trends. Led by @kseniase_. Know what you are talking about👇🏼
20K Followers · 452 Following · Physics of language models @ Meta (FAIR, not GenAI)
🎓:Tsinghua Physics — MIT CSAIL — Princeton/IAS
🏅:IOI x 2 — ACM-ICPC — USACO — Codejam — math MCM
16K Followers · 361 Following · Runs an AI Safety research group in Berkeley (Truthful AI) + Affiliate at UC Berkeley. Past: Oxford Uni, TruthfulQA, Reversal Curse. Prefer email to DM.
4K Followers · 949 Following · Professor @UChicagoCS @UChicago. Directing @ChicagoHAI, also part of @UChicagoCI. Visiting @AbridgeHQ. DM/email for Postdoc/PhD opportunities.
8K Followers · 711 Following · Assistant Professor MIT @medialab @MITEECS @nlp_mit || PhD from CMU @mldcmu @LTIatCMU || Foundations of multisensory AI to enhance the human experience.
288 Followers · 85 Following · Natural language processing researcher. Assistant Professor at Stony Brook University. Previous: Research Assistant Professor at TTIC, PhD from Harvard.
19K Followers · 1K Following · Agents @Meta MSL TBD Lab. Previously post-training research @OpenAI. I train LLMs to do things: deep research, ChatGPT agent, etc. CS PhD @LTIatCMU
37K Followers · 485 Following · Digital Geometer, Assoc. Prof. of Computer Science & Robotics @CarnegieMellon @SCSatCMU and member of the @GeomCollective. There are four lights.