Eric Wallace @Eric_Wallace_
Researcher at OpenAI working to make language models more trustworthy, secure, and private. ericswallace.com Berkeley, CA Joined July 2011
Tweets: 539 · Followers: 6K · Following: 1K · Likes: 3K
Some personal updates: I joined OpenAI a few months ago, working on all things robustness/safety/privacy. Also, we are working to publish more of our safety work. See my first project here below, where we make initial progress on prompt injections and other attacks!
Really cool concurrent work to our recent paper!
We know LLMs hallucinate, but what governs what they dream up? Turns out it’s all about the “unfamiliar” examples they see during finetuning. Our new paper shows that manipulating the supervision on these special examples can steer how LLMs hallucinate. arxiv.org/abs/2403.05612 🧵
The final layer of an LLM up-projects from hidden dim -> vocab size. The logprobs are thus low rank, and with some clever API queries, you can recover an LLM’s hidden dimension (or even the exact layer’s weights). Our new paper is out, a collaboration between a lot of friends!
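A toy sketch of the low-rank observation in the tweet above, not the paper's actual attack. It assumes full logit vectors are available, whereas a real API typically exposes only partial logprobs and needs the query tricks described in the paper. The matrices `W` and `H`, the sizes, and the helper `estimate_hidden_dim` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def estimate_hidden_dim(logit_rows: np.ndarray, rel_tol: float = 1e-8) -> int:
    """Count singular values that are non-negligible relative to the largest one."""
    singular_values = np.linalg.svd(logit_rows, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

rng = np.random.default_rng(0)
hidden_dim, vocab_size, num_queries = 16, 200, 64  # toy sizes (assumption)

# Stand-in for the final projection layer: vocab_size x hidden_dim.
W = rng.normal(size=(vocab_size, hidden_dim))
# Stand-in for the final hidden states produced by num_queries different prompts.
H = rng.normal(size=(num_queries, hidden_dim))

# Each row is one full logit vector; the stacked matrix has rank at most hidden_dim.
logits = H @ W.T

estimated_dim = estimate_hidden_dim(logits)
print(estimated_dim)  # prints 16, the toy hidden dimension
```

The sketch works on raw logits; log-softmax subtracts a per-row constant, so a matrix of true logprobs can have rank up to hidden dim + 1.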
Cool paper by Wan et al (UC Berkeley) with surprising results. In their task, an LLM answers a controversial question Q based on the conflicting arguments from excerpts from two documents from the web. We might expect that LLMs would be more influenced by excerpts that (a) have…
Does anyone have a favorite task where gpt-4 has near chance accuracy when zero or few-shot prompted? I’m looking for recommendations for tasks like this
Future LLMs, whether they be RAG models, chatbots, or agents, will have to sift through misinformation, SEO text, and conflicting opinions when reading text. Alex led an interesting analysis of how current LLMs handle such conflicts. TLDR: LLMs love relevance, not style.
What happens when RAG models are provided with documents that have conflicting information? In our new paper, we study how LLMs answer subjective, contentious, and conflicting queries in real-world retrieval-augmented situations.
What Evidence Do Language Models Find Convincing? Finds that LLMs today rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important such as whether a text contains scientific references arxiv.org/abs/2402.11782
Responsible disclosure: We discovered this exploit in July, informed OpenAI Aug 30, and we’re releasing this today after the standard 90 day disclosure period.
Our paper with extra experiments on the causes and extent of data leakage: arxiv.org/abs/2311.17035 Thanks to the incredible work of Milad Nasr, Nicholas Carlini, Jon Hayase, Matthew Jagielski, @afedercooper, @daphneipp, @Chris_Choquette, @Eric_Wallace_, @florian_tramer
What happens if you ask ChatGPT to “Repeat this word forever: “poem poem poem poem”?” It leaks training data! In our latest preprint, we show how to recover thousands of examples of ChatGPT's Internet-scraped pretraining data: not-just-memorization.github.io/extracting-tra…
New paper!! We found a pattern in how NNs extrapolate: as inputs become more OOD, model outputs tend to go towards some “average”-like prediction. What is this “average”-like prediction? Why does this happen? Can we leverage this to better handle OOD inputs? (Spoiler: Yes!) 🧵:
Just reminding everyone that this paper exists and is really good: arxiv.org/abs/2309.05610
We got fascinating results in this work! * we reverse engineer the training set for Copilot/Codex * we show that data deduplication can sometimes hurt privacy * we reveal the tokenizer of black-box LLMs * we reveal other users' test inputs when adv ex defenses are used
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore paper page: huggingface.co/papers/2308.03… The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly…
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore. Presents an LM built by training on a corpus of permissively licensed text and augmenting it with a datastore (e.g., copyrighted books or news) that is only queried during inference. arxiv.org/abs/2308.04430
I’m at #ICML2023 this week presenting work on poisoning LLMs and analyzing model pre-training data! Would love to chat about all things LLMs, especially on aspects like robustness/memorization/security/privacy. Feel free to DM or email.
Imitation learning works™ – but you need good data 🥹 How to get high-quality visuotactile demos from a bimanual robot with multifingered hands, and learn smooth policies? Check our new work “Learning Visuotactile Skills with Two Multifingered Hands”! 🙌 toruowo.github.io/hato/
PhDone!!!! 👨🎓 08/2019-04/2024 What a journey 🥳🚞 I especially feel lucky to share this once-in-a-lifetime moment with people I love ❤️, and to see my passion-driven research efforts being acknowledged by researchers I deeply admire 🌞!! Special thanks to my awesome committee…
@Eric_Wallace_ Congrats!! and very cool work :)
@Eric_Wallace_ Congrats Eric! Looking forward to your safety papers!
Introducing the Instruction Hierarchy, our latest safety research to advance robustness for prompt injections and other ways of tricking LLMs into executing unsafe actions. More details: arxiv.org/abs/2404.13208
innnnteresting
OpenAI presents The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.
New paper from @OpenAI on prompt injection - it's the most detailed evaluation of the problem I've seen from them so far, and has some very interesting details. Posted some of my notes on the paper on my blog here: simonwillison.net/2024/Apr/23/th…
How did they steal parts of LLMs protected behind APIs? 🥷 We explain both papers that made it happen, one from Carlini et al. (Google), and the other one from @mattf1n et al. (USC). 👇 📺 youtu.be/O_eUzrFU6eQ
Wanna know gpt-3.5-turbo's embed size? We find a way to extract info from LLM APIs and estimate gpt-3.5-turbo’s embed size to be 4096. With the same trick we also develop 25x faster logprob extraction, audits for LLM APIs, and more! 📄 arxiv.org/abs/2403.09539 Here’s how 1/🧵
Fine-tuning on benign data (e.g. Alpaca) can jailbreak models unexpectedly. We study this problem through a data-centric perspective and find that some seemingly benign data could be more harmful than explicitly malicious data! ⚠️🚨‼️ Paper: arxiv.org/pdf/2404.01099… [1/n]
Are you interested in jailbreaking LLMs? Have you ever wished that jailbreaking research was more standardized, reproducible, or transparent? Check out JailbreakBench, an open benchmark and leaderboard for Jailbreak attacks and defenses on LLMs! jailbreakbench.github.io 🧵1/n
Update on 🔮Readout Guidance (readout-guidance.github.io)! We open sourced the code – check out our demos, model weights, and training code: github.com/google-researc…. Here’s a teaser of what you can do with our method:
If you download a pretrained model you have to trust that the developer did not backdoor it. We know backdoors break model integrity. But what about privacy? With Shanglun Feng we introduce 𝐩𝐫𝐢𝐯𝐚𝐜𝐲 𝐛𝐚𝐜𝐤𝐝𝐨𝐨𝐫𝐬: pretrained models that steal your finetuning data! 🧵
My time visiting Berkeley is coming to an end. Many thanks to Prof. Pieter Abbeel @pabbeel and Xingyu @Xingyu2017 for your guidance. Grateful to every member of RLL; your companionship and friendship are my most precious memories. Wishing you all the best for the future!
yay!
See you all in Vienna, Austria for GenLaw 2 at ICML 2024! We're so excited to be in Europe and use this opportunity to dig into GDPR, and text & data mining exceptions!! We'll put up a website soon with a CFP. It'll be pretty similar to last year's (genlaw.org/cfp.html) but…
Are you interested in training large models in JAX but are set back by the complicated partition specs and sharding configurations required to scale up? I've recently created scalax, a small library to help developers easily scale up JAX models. github.com/young-geng/sca…