Yi Su @YiSu37328759
Researcher @GoogleDeepmind. PhD from @Cornell. Working on contextual bandits, reinforcement learning. yisu0005.github.io · Mountain View, CA · Joined July 2019
Tweets: 53 · Followers: 388 · Following: 620 · Likes: 1K
Our paper on Long-Term Value of Exploration just won the best paper award at WSDM 2024❤️! Congratulations to the team and amazing collaborators! arxiv.org/pdf/2305.07764…
At #NeurIPS2023 till Saturday. Happy to share our work "Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective". Check the poster at Session 5 Thu 10:45 - 12:45 CST. Looking forward to discussing RL, recommendation, ranking with friends and colleagues! (1/3)
Congratulations to Richard Zemel, @LedellWu, @kswersk, Toni Pitassi, Cynthia Dwork for winning the #ICML2023 Test-of-Time Award on their paper "Learning Fair Representations"! Announcement: icml.cc/Conferences/20… Paper: proceedings.mlr.press/v28/zemel13.ht…
thanks to super hard work by Lavender (lavenderjiang.github.io), we showed the potential of (medical) language as a powerful interface to access EHR using a language model in this new work. check this work at nature.com/articles/s4158…
Check out our recent work! We find that recent models that imitate ChatGPT — like Alpaca, Vicuna, Koala — largely learn ChatGPT’s style and less so its capabilities/factuality. And that base model quality can be a highly effective lever for improving on factuality.
🚀1/ Exciting news! Our recent paper, accepted at #UAI2023, takes a fresh look at linear bandits, which are widely used in search/recommendation applications. We rethink misspecification and show that the classical LinUCB algorithm is surprisingly robust!🔗arxiv.org/abs/2302.13252
New study from @CornellCIS' Jon Kleinberg & former CS graduate student @manish_raghavan can help online media companies determine what gives users long-term satisfaction – not just the instant gratification of continual scrolling. cis.cornell.edu/what-user-want…
I am excited to announce that @angelamczhou and I will organize a weekly seminar for the Foundations of Data Science Institute (FODSI, fodsi.us). Talks on Zoom every Friday at 12pm PST / 3pm EST. First talk on Nov 5th by Devavrat Shah (MIT). More details to follow!
I wrote a little position paper about how unsupervised RL methods could provide a powerful framework for self-supervised learning. This will appear in the inaugural CoRL Blue Sky Track: arxiv.org/abs/2110.12543 Condensed online version (w/ animations!): link.medium.com/bZ4PEOCVEkb
BREAKING NEWS: The 2021 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel has been awarded with one half to David Card and the other half jointly to Joshua D. Angrist and Guido W. Imbens. #NobelPrize
If you would like to evaluate your off-policy evaluation (OPE) methods, check out our OPE benchmarks. Deep off-policy evaluation (DOPE) -- though you can use it with non-deep methods too if you prefer! github.com/google-researc… arxiv.org/abs/2103.16596
With Peter Bartlett and Sasha Rakhlin, an overview of a slice of research in theoretical machine learning (and an effort to connect some dots): arxiv.org/abs/2103.09177
Sessions from Reinforcement Learning Day 2021 are now available on-demand. Watch a keynote by Professor Doina Precup of @mcgillu, a debate between industry leaders Professor Yoshua Bengio of @Mila_Quebec and @JohnCLangford, and more: aka.ms/AAawpdp
Recent and forthcoming machine learning and AI seminars: December edition - ift.tt/3moDlM9
SPREAD THE WORD: I'M HIRING PHD STUDENTS FOR FALL 2021 AT @UPFBarcelona / @DTIC_UPF!! The project is about reinforcement learning theory and is funded by my recent @ERC_Research Starting Grant. (#ERCStG) Details: cs.bme.hu/~gergo/jobs.ht… Deadline: Feb 12, 2021
Alekh Agarwal, Akshay Krishnamurthy, and I finished a major tutorial on "Theoretical Foundations of Reinforcement Learning" (hunch.net/~tforl/) for FOCS (focs2020.cs.duke.edu/program/), but potentially of much broader interest. We'll be doing Q&A Friday.
What are the Statistical Limits of Offline RL with Linear Function Approximation? - Ruosong Wang ift.tt/35wIzir
RL folks, meet your newest friend: THE LOGISTIC BELLMAN ERROR. A convex loss function derived from first principles of MDP theory that leads to practical RL algorithms that can be implemented without *any* approximation of the theory. Preprint: arxiv.org/abs/2010.11151 🧵👇 1/18
Excited to announce 2020 ACM SIGKDD Innovation Award is given to Thorsten Joachims (Cornell University) for his research contributions in machine learning, including influential work studying human biases in information retrieval, SVM, and structured output prediction.
@percyliang @_aidan_clark_ from our recently released evals repo github.com/openai/simple-…
Another triumph for Self-Play! Self-Play Preference Optimization (SPPO) has surpassed (iterative) DPO, IPO, Self-Rewarding LMs, and others on AlpacaEval, MT-Bench, and the Open LLM Leaderboard. Remarkably, Mistral-7B-instruct-v0.2 fine-tuned by SPPO achieves superior…
Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.
Nice way to expose eval doping.
We're very excited to announce our sixth Keynote Speaker: @EmmaBrunskill from Stanford University!
PASS seminar on 5/3 3pm ET! Speaker: @JacobSteinhardt from @UCBerkeley Topic: Interpretability via Decomposition Live: youtube.com/@PrincetonPLI/… Submit questions: tinyurl.com/pass-question Recordings later at: youtube.com/@PrincetonPLI
Can we rigorously understand how models learn behaviors through preference learning (RLHF, DPO)? 🤔 We look into this question and find that the training dynamics have a way of prioritizing behaviors! Paper: arxiv.org/abs/2403.18742 [1/n]
Simple recipe for improving reasoning🎉 A new paper on Iterative DPO+NLL optimization for improving CoT reasoning in tasks like gsm8k, math, etc.
🚨 Iterative Reasoning Preference Optimization 🚨 - Iterative algorithm for reasoning tasks: generate pairs & apply DPO+NLL - Improves accuracy over iterations on GSM8K, MATH, ARC & beats baselines E.g. Llama2-70B GSM8K: 55.6%->81.6% (88.7% maj32) arxiv.org/abs/2404.19733 🧵(1/5)
I spent my entire PhD working with medical data; it’s super challenging. This work is amazing and a great collaboration!
Researchers at @ICepfl & @YaleMed teamed up to build Meditron, an LLM suite for low-resource medical settings. With Llama 3, their new model outperforms most open models in its parameter class on benchmarks like MedQA & MedMCQA. More details ➡️ go.fb.me/6vfi21
model = learn(data) Synthetic data is great, but it’s not data. It’s an intermediate quantity created by learn(). Data is created by people and has privacy and copyright considerations. Synthetic “data” does not - it’s internal to learn().
Baby’s first grant ⬇️ amazon.science/research-award…
This is what out-of-distribution generalization looks like!
This man is from Mongolia. He can't speak English in a conversational sense, but he can sing it
We went through the peer review process at @TmlrOrg and it was quite helpful to improve the paper (better positioning wrt prior work, fixed a bug in an equation, comparisons to ReST in terms of transfer performance). See the accepted version here: arxiv.org/abs/2312.06585…
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models. Paper page: huggingface.co/papers/2312.06… Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity…
Welcome @SnowflakeDB Arctic to the family of fully open LLMs! ❄️❄️
❄️Congrats to @SnowflakeDB for openly releasing Arctic!❄️ Arctic is available to all with an Apache 2.0 license! Great to see the amazing contribution of LLM360 member @AurickQ and the whole Snowflake AI Research team to the open-source LLM community!
Self-Consistent Conformal Prediction. arxiv.org/abs/2402.07307
New paper from @OpenAI on prompt injection - it's the most detailed evaluation of the problem I've seen from them so far, and has some very interesting details. Posted some of my notes on the paper on my blog here: simonwillison.net/2024/Apr/23/th…
OpenAI presents The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.
Our team in FAIR (at Meta) is hiring researchers (RS & PostDoc) to work on the broad topics of text and multimodal LLMs. Location: NY, Seattle or Menlo Park for RS, and Seattle for PostDocs. PostDoc: metacareers.com/jobs/968496244… Research Scientist, AI (PhD): metacareers.com/jobs/752169417…
Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes?? Here comes the first release of 🍷Fineweb. A high quality large scale filtered web dataset out-performing all current datasets of its scale. We trained 200+ ablation…
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
This work is led by two incredibly talented undergraduate students: @KhanovMax, a sophomore at UW Madison who recently won the prestigious Goldwater Scholarship, and @top34051, who spent his junior & senior years with us and is now pursuing graduate study at @Stanford CS. He will be traveling…
Can we align LLMs without retraining the model (e.g. using RLHF)? Introducing 🔥ARGS🔥, a simple and powerful test-time alignment approach that leverages a reward model to "guide" your unaligned LLM in decoding time! 🧵(1/n) #ICLR2024 Paper: arxiv.org/abs/2402.01694
🧵 Thrilled to announce the #ICML RL workshop 'Aligning RL Experimentalists and Theorists'! We will have several talks and a panel delivered by a super lineup of speakers: @white_martha, @ShamKakade6, @yayitsamyzhang, Dylan Foster, Niao He, @svlevine, and @MengdiWang10. 1/3