Roger Grosse @RogerGrosse
Joined July 2015
Tweets 958
Followers 10K
Following 747
Likes 2K
Full lecture slides and reading list for Roger Grosse's class on AI Alignment are up: alignment-w2024.notion.site
Papers are best thought of as one of the costs of research. To the extent that certain parts of academia incentivize publishing lots of papers, this is a special case of how they incentivize high-budget research.
Quantitative benchmarks are good when they're a catalyst for thinking, and bad when they're a substitute for thinking.
Should you do a 5-year PhD if you want to work on alignment but think transformative AI is 10 years away? Would you rather have two samples from GPT-3 or one sample from GPT-4?
New Anthropic research paper: Many-shot jailbreaking. We study a long-context jailbreaking technique that is effective on most large language models, including those developed by Anthropic and many of our peers. Read our blog post and the paper here: anthropic.com/research/many-…
New open source implementation of EK-FAC influence functions (including for language models) by @juhan_bae. github.com/pomonam/kronfl…
In a world of compute-intensive frontier models and sky-high industry salaries, why do a PhD? In the first of a series on "Adam's unpopular opinions", I argue doing a PhD is still the best way for most people to develop key research skills.
@RogerGrosse +1, @IvanVendrov and I discussed this sort of self-reinforcement in absence of common knowledge in 2022
There'd be less brain drain if more schools enabled the sort of fundamental curiosity-driven research that should be academia's strength. If the only viable research style is make-number-go-up, you might as well do that at a company.
It's interesting how academic prestige rankings are self-reinforcing due to an absence of common knowledge. E.g., maybe you and I both know that Institution X isn't all it's cracked up to be, but do you know that I know that he knows...
Realizing I should have used probabilistic programming in my alignment course... It seems useful for playing around with toy examples of misalignment, assistance games, etc. And it was the missing link when I tried to tie universal induction to bread-and-butter ML.
Some good discussion of AIXI in my alignment course.
I see some claims that concerns about catastrophic AI risks "come from philosophy." This is true only in the same sense that Gödel's Incompleteness Theorem is philosophy.
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Soumith Chintala @soumithchintala
186K Followers 877 Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
Kevin Patrick Murphy @sirbayes
42K Followers 334 Following Research Scientist at Google Brain / Deepmind. Interested in Bayesian Machine Learning.
Kyunghyun Cho @kchonyc
61K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
Gautam Kamath @thegautamkamath
44K Followers 505 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Co-EiC @TmlrOrg. I lead @TheSalonML. Privacy, robustness, machine learning.
David Pfau @pfau
22K Followers 1K Following Knowledge manifests itself in radiant dreams that shimmer like the wild sun Views are my own pfau at sigmoid dot social on 🦣 https://t.co/xqtVHHVI17 on 🦋
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Ferenc Huszár @fhuszar
40K Followers 1K Following Secular Bayesian. Associate Professor in Machine Learning @Cambridge_CL. Talent aficionado at https://t.co/RbJkoLguey Alum of @Twitter, Magic Pony and @Balderton
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Sander Dieleman @sedielem
50K Followers 2K Following Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).
Rosanne Liu @savvyRL
33K Followers 966 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
Natasha Jaques @natashajaques
25K Followers 1K Following Senior Research Scientist at @GoogleAI and Assistant Professor @uwcse. Social Reinforcement Learning in multi-agent and human-AI interactions. PhD from @MIT.
Zachary Lipton @zacharylipton
59K Followers 2K Following Professor: CMU/@acmi_lab, CTO / CSO: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
Animesh Garg @animesh_garg
21K Followers 1K Following Foundation Models for Generalizable Autonomy. Assistant Professor in AI Robotics @GeorgiaTech + @NvidiaAI. prev @Stanford @berkeley_ai @UofTCompSci
Miles Brundage @Miles_Brundage
43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.
Michael Bronstein @mmbronstein
43K Followers 4K Following #DeepMind Professor of #AI @UniofOxford / Fellow @ExeterCollegeOx / ML Lead @ProjectCETI / https://t.co/kZpGpDzYeV
LelandAragao @AragaoLela80934
38 Followers 661 Following
John Doe @JohnDoe24192103
1 Followers 232 Following
Toryn Klassen @TorynKlassen
4 Followers 53 Following Postdoctoral Fellow at the University of Toronto
Jacob @jacob_23230
4 Followers 334 Following
Claudia Peterson @ClaudiaPet99610
17 Followers 610 Following
Zedian Xiao @XiaoZedian
3 Followers 61 Following
Electronicsseeker @libertarian108
7 Followers 912 Following
Alex Infanger @alexinfanger
162 Followers 672 Following Currently thinking about AI alignment and consciousness. I've also worked on theory and algorithms for Markov chains. applied math @ICMEStanford, physics @ucsc
Gustave Eiffel mondor.. @MondoreMoussa
648 Followers 6K Following Gustave Mousa Manager principal publiques conseiller pédagogique économiques chef partie Géopolitique Républicaines de Rénovations laïque Des esprits LTD
Mark Jimenez @realmarkjimenez
32 Followers 195 Following
Yuquan Chen @YuquanChen_USTC
5 Followers 431 Following PhD student @ustc. Research interest: quantum computing & ML.
David Hall @dlwh
2K Followers 1K Following Research Engineering Lead at @StanfordCRFM . Previously co-founder at Semantic Machines ⟶ MSFT. Lead developer of Levanter, Breeze. he/him @[email protected]
Nicole Meister @nicole__meister
65 Followers 106 Following phd student @stanford, previously @princeton @VisualAILab (she/her)
Fulvio Sanguigni @FulvioSanguigni
5 Followers 44 Following Phd Student at Unimore, AImageLab. Working on Generative Vision and Language Models Previously at @esa, @sapienzaroma
Bergen & Associates @BergenandAssoc
18 Followers 265 Following
Eric Ming @Eric04321550129
18 Followers 360 Following My name is Dr. Eric Ming and I am passionate about my job, music and football. I currently work as a doctor and practice Austin, Texas and football..
Max @maxxyouu0301
9 Followers 116 Following
xiaoqun liu @Shawn_i
8 Followers 38 Following
Andy Masley @AndyMasley
2K Followers 2K Following When the going gets weird the weird turn pro. Director of EA DC
Katherine Elkins @katelelkins
337 Followers 557 Following Director, @KenyonDH | Director Integrated Program in Humane Studies @KenyonCollege | Professor of Humanities and Comparative Literature
norvid_studies @norvid_studies
1K Followers 679 Following charts & graphs // ecology, macrohistory, evolution, complex adaptive systems, concepts & models https://t.co/RaabZobt09
neo @stankneo
447 Followers 2K Following Cyberpunk Metamodernism. Searching for lcm(∞-axia). In process of becoming a hyperwrangler. CS ∩ CogSci ∩ Complex Systems.
Eric R @Ryninho
59 Followers 1K Following
Bannapol Limanond @BLimanond75867
0 Followers 37 Following
Armin Li @armin1i
570 Followers 5K Following
George Tsoukalas @gtsoukal
50 Followers 189 Following PhD student at UT Austin interested in automatic theorem proving.
Shisham Adhikari @AdhikariShisham
9 Followers 210 Following UC Davis Econ PhD | Macroeconomics of Climate Change and Green Transition
Photo Bot @Photos4World
1 Followers 36 Following
James Parsloe @jamesparsloe
195 Followers 5K Following ML Engineer. Trying to increase the FLOPs I have access to. Used to make computers talk at Spotify/Sonantic.
DailyHealthcareAI @aipulserx
39 Followers 288 Following 🚀 Daily AI healthcare updates compiled from 100+ sources (and growing)
Phelipe Siani🍥 @phelipsiani0057
139 Followers 7K Following Communicator and businessman 🔴 | Anchor @cnnbrasil 🎥 | Founding partner @albuquerquecontent 🚗 | Member @trofeostudio Lectures 👇 rafa.domingues@phshold
Shu @Rainb0ish
687 Followers 6K Following The girl with broken tooth. I fancy neurosynaptic chips more than potato chips. Love reading scientific papers & procrastination. Violin.
hypocrite. fan igbun winid @BunWinid
76 Followers 575 Following
NickSfDev @NickSfDev
1 Followers 220 Following
zuxfoucault @zuxfoucault
589 Followers 4K Following
tom cunningham @testingham
1K Followers 2K Following @openai. Ex Otago, LSE, Harvard, Tel Aviv, IIES Stockholm, Facebook, twitter, integrity institute.
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Clément Canonne @ccanonne_
31K Followers 928 Following Senior Lecturer @Sydney_Uni. Postdocs @IBMResearch, @Stanford; PhD @Columbia. Converts ☕ into puns: sometimes theorems. He/him. @[email protected]
Soumith Chintala @soumithchintala
186K Followers 877 Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
Kevin Patrick Murphy @sirbayes
42K Followers 334 Following Research Scientist at Google Brain / Deepmind. Interested in Bayesian Machine Learning.
Kyunghyun Cho @kchonyc
61K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
Gautam Kamath @thegautamkamath
44K Followers 505 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Co-EiC @TmlrOrg. I lead @TheSalonML. Privacy, robustness, machine learning.
David Pfau @pfau
22K Followers 1K Following Knowledge manifests itself in radiant dreams that shimmer like the wild sun Views are my own pfau at sigmoid dot social on 🦣 https://t.co/xqtVHHVI17 on 🦋
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Ferenc Huszár @fhuszar
40K Followers 1K Following Secular Bayesian. Associate Professor in Machine Learning @Cambridge_CL. Talent aficionado at https://t.co/RbJkoLguey Alum of @Twitter, Magic Pony and @Balderton
Sasha Rush @srush_nlp
52K Followers 464 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Sander Dieleman @sedielem
50K Followers 2K Following Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).
Ben Recht @beenwrekt
26K Followers 365 Following optimization. machine learning. uc berkeley. I blog at https://t.co/fkJujOPsJb The world won't end.
Natasha Jaques @natashajaques
25K Followers 1K Following Senior Research Scientist at @GoogleAI and Assistant Professor @uwcse. Social Reinforcement Learning in multi-agent and human-AI interactions. PhD from @MIT.
Thomas G. Dietterich @tdietterich
50K Followers 505 Following Distinguished Professor (Emeritus), Oregon State Univ.; Former President, Assoc. for the Adv. of Artificial Intelligence; Robust AI & Comput. Sustainability
Zachary Lipton @zacharylipton
59K Followers 2K Following Professor: CMU/@acmi_lab, CTO / CSO: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
MilaQuebec @Mila_Quebec
31K Followers 561 Following The world's largest academic research center in deep learning / Le plus grand centre de recherche universitaire en apprentissage profond.
Dwarkesh Patel @dwarkesh_sp
54K Followers 699 Following Being pretrained Host of Dwarkesh Podcast https://t.co/3SXlu7fy6N https://t.co/rEhnfYywXY https://t.co/hQfIWdM1Un
xAI @xai
997K Followers 36 Following
David Hall @dlwh
2K Followers 1K Following Research Engineering Lead at @StanfordCRFM . Previously co-founder at Semantic Machines ⟶ MSFT. Lead developer of Levanter, Breeze. he/him @[email protected]
Erik Brynjolfsson @erikbryn
209K Followers 4K Following Director @DigEconLab Professor @StanfordHAI @SIEPR @Stanford @NBERPubs https://t.co/D2bPyxoFEf
Toby Shevlane @tshevl
2K Followers 1K Following Research Scientist testing AI models for new capabilities at @GoogleDeepMind. Tweeting about AI and the future.
Trenton Bricken @TrentonBricken
6K Followers 2K Following Trying to figure out what makes minds and machines go "Beep Bop!" @AnthropicAI
Kaivalya Hariharan @KaivuHariharan
35 Followers 152 Following
James Zou @james_y_zou
10K Followers 59 Following @Stanford professor. Chan-Zuckerberg investigator. Sloan Fellow. AI for biotech + health. Making AI more trustworthy, reliable and human compatible.
Ashwinee Panda @PandaAshwinee
944 Followers 602 Following PhD @princeton, @Cal alum, currently working on LLMs
Ben Edelman @EdelmanBen
112 Followers 20 Following Final-year PhD candidate at Harvard CS trying to understand AI scientifically. New to the platform formerly known as Twitter.
Sherry Yang @mengjiao_yang
2K Followers 342 Following Research Scientist @GoogleDeepMind | PhD Student @UCBerkeley. Previously M.Eng. / B.S. @MIT.
Orowa Sikder @OrowaSikder
1K Followers 304 Following the future could be amazing. let’s get to work | Research @AnthropicAI, ex: PhD @UCLCS
Jackson Kernion @JacksonKernion
2K Followers 2K Following Now: LLM researcher. Before: MIT postdoc, UC Berkeley philosophy PhD. Built https://t.co/3PWzczTzu4
Zac Kenton @ZacKenton1
1K Followers 1K Following Research Scientist in AI safety at DeepMind. Views are my own and don't represent DeepMind.
Thomas Woodside @Thomas_Woodside
815 Followers 205 Following Junior Fellow @CSETGeorgetown. All views expressed are my own. Previously @ai_risks, @Yale. Creator of the beet emoji (forthcoming).
Bilal Chughtai.. @bilalchughtai_
587 Followers 580 Following ai safety | mechanistic interpretability | cambridge mmath
Bogdan Ionut Cirstea @BogdanIonutCir2
1K Followers 3K Following Independent AGI existential safety researcher. AI alignment field building with @CEffisciences. ML academia (PhD, postdoc) in a past life.
Ethan Caballero is bu.. @ethanCaballero
8K Followers 2K Following ML PhD student @Mila_Quebec ; previously @GoogleDeepMind
Eric J. Michaud @ericjmichaud_
1K Followers 771 Following PhD student at MIT. Trying to make deep neural networks among the best understood objects in the universe. 💻🤖🧠👽🔭🚀
Shubhendu Trivedi @_onionesque
7K Followers 851 Following Cultivated Abandon. Twitter interests: Machine learning research, applied mathematics, mathematical miscellany, ML for Physics/Chemistry, books.
Ari Holtzman @universeinanegg
3K Followers 2K Following PI @UChicagoCS & @DSI_UChicago, leader of Conceptualization Lab https://t.co/BVCT3zdaNV, Post-doc @Meta. We don’t really know much about language models...yet.
Anthony Aguirre @AnthonyNAguirre
2K Followers 119 Following Physicist & cosmologist at UCSC. Co-Founder of Future of Life Institute, Foundational Questions Institute, and Metaculus. Apple out of the box. Pro-human.
Alex Mallen @alextmallen
250 Followers 200 Following Researcher at @AiEleuther with @norabelrose, trying to understand LM alignment, knowledge, and generalization. UW CS
Daanish @danishabbir
622 Followers 5K Following elk again. before: startup founder, ml eng (e.g. @nvidia), ee + english (@stanford)
Kilian Haefeli @khshind
230 Followers 341 Following Exploring crevasses of Deep Learning at ETH Zurich & UofT | Previously: @Aleph__Alpha, @Logitech, and exfounder at Airica
Johannes Treutlein @j_treutlein
121 Followers 115 Following CS PhD student in AI existential safety research
Tri Dao @tri_dao
18K Followers 364 Following Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
Erik Schluntz @ErikSchluntz
2K Followers 238 Following Member of Technical Staff at Anthropic Co-founder at @CobaltRobotics Co-founder at Posmetrics (acquired) GoogleX, @SpaceX, @Harvard EE '15, Forbes 30u30 '18
Igor Babuschkin @ibab
44K Followers 682 Following Maybe the real AGI was the friends we made along the way. @xAI
Adam Jermyn @AdamSJermyn
1K Followers 188 Following AI Interpretability & Safety @AnthropicAI. Previously at @FlatironInst @FlatironCCA, @KITP_UCSB, PhD @Cambridge_Uni, BS @Caltech.
Joelle Pineau @jpineau1
10K Followers 352 Following AI researcher. VP AI Research (FAIR), @AIatMeta. Professor of Computer Science, @mcgillu. Core academic member, @Mila_Quebec
Naomi Saphra @nsaphra
7K Followers 1K Following Waiting on a robot body. ML/NLP. All opinions are universal and held by both employers and family. Same username on every lifeboat off this sinking ship.
Xujie Si @XujieSi
176 Followers 275 Following Assistant Professor @UofTCompSci @VectorInst @Mila_Quebec. Programming Languages, Formal Methods, Artificial Intelligence, Deep Learning. | he/him
Tim Roughgarden @Tim_Roughgarden
36K Followers 87 Following Head of Research @a16z. Prof @Columbia. Theoretical computer scientist. Educator. Wrote Algorithms Illuminated, 20 Lectures on Algorithmic Game Theory, etc.
CIFAR @CIFAR_News
17K Followers 2K Following Extraordinary minds addressing science and humanity’s most important questions. FR: @nouvelles_CIFAR
Nisarg Shah @nsrg_shah
643 Followers 346 Following Computer science prof @UofT. Working on algorithmic fairness, voting, and resource allocation.
Palladium Magazine @palladiummag
23K Followers 98 Following Palladium is a non-partisan publication exploring the future of governance and society through international journalism, long-form analysis & social philosophy.
Sang Michael Xie @sangmichaelxie
3K Followers 709 Following PhD student @StanfordAILab @StanfordNLP @Stanford advised by Percy Liang and Tengyu Ma. Prev: visiting @GoogleAI Brain, BS, MS Stanford ‘17
Nicholas Turner @nicholasturner0
300 Followers 343 Following Research Scientist - ML, Mechanistic Interpretability, Neuroscience ||| Tweets do not represent the views of my employer ||| he/him
Lawrence Chan @justanotherlaw
920 Followers 148 Following I do AI Alignment Research. Currently at the Alignment Research Center, and on leave from my PhD at UC Berkeley’s @CHAI_berkeley.
Alan Chan @_achan96_
866 Followers 1K Following PhD student @Mila_quebec || Research Scholar @GovAI_ || AI safety || 🇨🇦
Mark Schmidt @MarkSchmidtUBC
2K Followers 215 Following I optimized a machine learning model once. It was terrible.
Tom Brown @nottombrown
5K Followers 524 Following @AnthropicAI, GPT-3, AI alignment, robustness, etc. Cautiously optimistic.
@bleepbeepbzzz @jd_pressman @ArkDavey AIXI is a perfect rolling sphere that you use to teach sincere students to think about a case of cognition where everything has in fact been fully specified; if you're an honest technical thinker, it teaches you to stop pulling out hypothetical complications that might save you.
Never think about x ↦ x - η∇L(x) (gradient descent), even as a simplification. Replace it with x ↦ (1-𝛾)x + 𝛾 argmin_{y∈X} ⟨y,∇L(x)⟩ (Frank-Wolfe; a Mann iteration) or x ↦ (1-λ)x + η argmin_{||𝚫||≤1} ⟨𝚫,∇L(x)⟩ (normalized steepest descent)
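The three update rules named in the tweet above can be compared on a toy problem. What follows is a minimal sketch, not anyone's reference implementation: the quadratic loss, the choice of an L2 ball as the constraint set X, and all function names are assumptions made for illustration. With those choices, the Frank-Wolfe step with step size γ over a ball of radius r coincides with the normalized steepest descent step when λ = γ and η = γr.

```python
import math

def grad(x, target):
    # Gradient of the assumed toy loss L(x) = 0.5 * ||x - target||^2.
    return [xi - ti for xi, ti in zip(x, target)]

def lmo_l2_ball(g, radius=1.0):
    # Linear minimization oracle: argmin over ||y||_2 <= radius of <y, g>.
    # For a nonzero g this is the boundary point pointing opposite to g.
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return [0.0 for _ in g]  # every feasible point is optimal; pick the origin
    return [-radius * gi / norm for gi in g]

def gd_step(x, g, eta):
    # Plain gradient descent: x -> x - eta * grad L(x).
    return [xi - eta * gi for xi, gi in zip(x, g)]

def frank_wolfe_step(x, g, gamma, radius):
    # Frank-Wolfe: x -> (1 - gamma) * x + gamma * argmin_{y in X} <y, g>,
    # with X assumed to be the L2 ball of the given radius.
    y = lmo_l2_ball(g, radius)
    return [(1 - gamma) * xi + gamma * yi for xi, yi in zip(x, y)]

def normalized_sd_step(x, g, eta, lam):
    # Normalized steepest descent with decay:
    # x -> (1 - lam) * x + eta * argmin_{||d|| <= 1} <d, g>.
    d = lmo_l2_ball(g, 1.0)
    return [(1 - lam) * xi + eta * di for xi, di in zip(x, d)]

if __name__ == "__main__":
    x = [0.5, -0.25]
    g = grad(x, [2.0, 1.0])
    print("gradient descent:  ", gd_step(x, g, eta=0.5))
    print("Frank-Wolfe:       ", frank_wolfe_step(x, g, gamma=0.1, radius=3.0))
    print("normalized descent:", normalized_sd_step(x, g, eta=0.3, lam=0.1))
```

Because each Frank-Wolfe iterate is a convex combination of the current point and a point of X, it stays inside X whenever the starting point does, which plain gradient descent does not guarantee.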
The dimensional analysis of gradient descent is odd; the unit of the gradient is "loss / weight" and it gets multiplied by the learning rate to get a delta with "weight" units, so the learning rate has unit "weight^2 / loss".
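The unit bookkeeping in the tweet above can be written out explicitly. This is a sketch in which [·] denotes "units of", w stands for a weight, and H is the Hessian of L; the Newton-step comparison is an added illustration and is not part of the original tweet.

```latex
\[
[\nabla L] = \frac{[L]}{[w]}, \qquad
[\eta]\,\frac{[L]}{[w]} = [w]
\;\Longrightarrow\;
[\eta] = \frac{[w]^2}{[L]}.
\]
% For comparison, a Newton step is dimensionally consistent without a
% unit-carrying learning rate, since the inverse Hessian has the same units:
\[
[H] = \frac{[L]}{[w]^2}
\;\Longrightarrow\;
[H^{-1}\nabla L] = \frac{[w]^2}{[L]} \cdot \frac{[L]}{[w]} = [w].
\]
```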
This graph makes a conference policy of allowing a maximum of 2 first author submissions seem quite reasonable 😊
Actually the accept rate decreases monotonically with number of 1st author submissions: the more prolific the first author is, the lower the quality of their paper.
Excited to share Penzai, a JAX research toolkit from @GoogleDeepMind for building, editing, and visualizing neural networks! Penzai makes it easy to see model internals and lets you inject custom logic anywhere. Check it out on GitHub: github.com/google-deepmin…
Combining SSM/RNN/EMA with attention is the way to higher quality, longer context, and faster inference! Griffin, Jamba, Zamba, and now Megalodon are great examples
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head…
@DavidSKrueger A similar, but distinct, challenge is to develop better dataset auditing tools. While not directly helpful to pretraining, such tools would be particularly helpful in enforcement of regulations, e.g., if a governing body wants to audit an LLM’s development process.
@DavidSKrueger Developing a better data filtering toolkit would be an obvious win here! It might also be interesting to see if we could use LLMs to edit or add new synthetic data to offset the bad effects of the harmful data.
@DavidSKrueger Let’s start with data: LLMs learn bad things because they are present in the data we train them on; filtering such bad data would obviously help with alignment but unfortunately current techniques for doing filtering are pretty crude, making data filtering not very useful!
In our new agenda paper (see @DavidSKrueger tweet for details ☝️) we discuss several great research directions for just this! Pretraining has three basic ingredients: data, objective and model design. Let’s see how we could tinker with these to make pretraining better!
If a genie were to give you a magic wand and let you change one thing about LLMs, what would you do? I don’t know about you but I would definitely ask for pretraining to be more aligned! Unfortunately magic wands don’t exist but we can still make pretraining more aligned. How?🧵
I’m super excited to release our 100+ page collaborative agenda - led by @usmananwar391 - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below...
@DavidSKrueger @RogerGrosse I am personally hugely excited about the potential of interventions at the pretraining stage for making LLMs better aligned! Check out our agenda here and look out for the questions in the ‘grey’ boxes: llm-safety-challenges.github.io/challenges_llm…
@DavidSKrueger @RogerGrosse Finally, we could think about how the design of the model could be improved, especially with an eye towards interpretability. A good example here is (transformer-circuits.pub/2022/solu/inde…) which makes MLPs more interpretable by changing the activation function.
@DavidSKrueger @RogerGrosse We could also try and see if we can improve upon the learning objective! My own past work (arxiv.org/abs/2302.08582) shows that this is possible! Scaling, and improving on, this kind of pretraining with human feedback is an open problem with huge upside!
@DavidSKrueger @RogerGrosse While Grosse et al. is a great first work, there are tons of opportunities in designing better TDA methods, e.g. making influence functions work for supervised finetuning or reinforcement learning objectives.
@DavidSKrueger Data works in conjunction with the learning objective during model training. Training data attribution (TDA) methods (like @RogerGrosse’s influence functions arxiv.org/abs/2308.03296) help understand this conjoined effect of data+objective!
Hmm. The longer I look, the weirder this call seems. I have worked with curious high-schoolers for 3 years now but I can't think of anyone for whom this is relevant. The focus on "ML for social impact" is prescriptive and narrow, plus it smells like mainly virtue signaling.
This part seems like it could be improved: it focuses on the big labs and links to uninspiring work that's outdated imo. I suggest linking to one of these instead: - lesswrong.com/posts/zaaGsFBe… - lesswrong.com/posts/2eaLH7zp…
@RogerGrosse hey Roger, I've been reading through your course. It's an interesting take and strong on Value and Cooperation! I have suggestions, if you want...