LLM Security @llm_sec
Research, papers, jobs, and news on large language model security. Got something relevant? DM / tag @llm_sec
llmsec.net 🏔️ Joined April 2023
Tweets: 701 | Followers: 8K | Following: 295 | Likes: 622
Controlling Large Language Model Outputs: A Primer "How can we attempt to control the outputs of these models? This primer outlines four commonly used techniques and explains why this objective is so challenging." cset.georgetown.edu/publication/co…
LLM Security Verification Standard 0.0.1: wiki.hego.tech/owasp/llm-secu… #CyberSecurity #ai #chatgpt #MachineLearning #hacking #ethicalhacking #mlsecops #ArtificialInteligence #LLM #largelanguagemodel
Not all model serialisation formats are vulnerable. Don't accept models with custom or lambda layers.
@llm_sec Plug: if you wanted to know about this ahead of time, the nuances around model formats and load mechanisms were documented in the offsec ml playbook since Jan 7 :) pls note other formats have similar idiosyncrasies wiki.offsecml.com/Supply+Chain+A…
Keras 2 Lambda Layers Allow Arbitrary Code Injection in TensorFlow Models Lambda Layers in third party TensorFlow-based Keras models allow attackers to inject arbitrary code into versions built prior to Keras 2.13 that may then unsafely run with the same permissions as the…
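One way to act on the "don't accept models with custom or lambda layers" advice is to inspect a model's architecture config before loading it. The sketch below is a heuristic only, not a complete defence: it assumes the architecture is available as JSON in which each layer is a dict carrying a "class_name" key, and the names `find_layers`, `reject_if_lambda`, and `EXAMPLE_CONFIG` are hypothetical, introduced for illustration. It uses only the standard library and never deserialises the model itself.

```python
import json
from typing import Any, Iterator


def find_layers(node: Any, class_name: str = "Lambda", path: str = "$") -> Iterator[str]:
    """Yield a JSON-style path for every dict whose "class_name" matches."""
    if isinstance(node, dict):
        if node.get("class_name") == class_name:
            yield path
        # Recurse into values so nested sub-models are covered too.
        for key, value in node.items():
            yield from find_layers(value, class_name, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_layers(item, class_name, f"{path}[{index}]")


def reject_if_lambda(config_json: str) -> None:
    """Raise ValueError if the (assumed) architecture JSON declares a Lambda layer."""
    hits = list(find_layers(json.loads(config_json)))
    if hits:
        raise ValueError(f"Lambda layer(s) found at: {', '.join(hits)}")


# Hypothetical config, shaped like the assumed layout described above.
EXAMPLE_CONFIG = json.dumps({
    "class_name": "Sequential",
    "config": {
        "name": "untrusted_model",
        "layers": [
            {"class_name": "Dense", "config": {"units": 4}},
            {"class_name": "Lambda", "config": {"function": "..."}},
        ],
    },
})

if __name__ == "__main__":
    try:
        reject_if_lambda(EXAMPLE_CONFIG)
    except ValueError as err:
        print(f"rejected: {err}")
```

A check like this only flags layers that are declared by name; custom layer classes and other serialisation formats need their own review.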
Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images "This paper studies the possibility of employing multi-modal models with enhanced visual understanding to mimic the outputs of these platforms, introducing an original attack strategy. Our method…
Lessons for CISOs From OWASP's LLM Top 10 darkreading.com/vulnerabilitie…
Such great news! 🤯 The benchmark includes more than 43 thousand products. The technique lets you classify threats by converting answers into ratings that even non-professionals can understand, such as "high risk", "moderate-high risk", etc.
I'd love to see how these efforts perform on our SPML prompt injection dataset, which already utilized Gandalf to create realistic system and user prompts, compared to single-system prompt datasets designed to protect a secret. More about it: prompt-compiler.github.io/SPML/
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection "we introduce and systematically explore the phenomenon of "glitch tokens", which are anomalous tokens produced by established tokenizers and could potentially compromise the models' quality…
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions "We argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from…
You get the most comprehensive theory by asking people. #LLMsecurity
AI Village is back for DEF CON 32! We're looking for talks on all things ML + Security, but this year we're getting small! "Smart" devices, AVs, on-device facial recognition, and more! Show us how you broke them! Submission deadline is 12-May-2024! aiv2024.hotcrp.com
AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts "we define a broad content safety risk taxonomy, comprising 13 critical risk and 9 sparse risk categories. Additionally, we curate AEGISSAFETYDATASET, a new dataset of approximately 26,000…
Introducing v0.5 of the AI Safety Benchmark from MLCommons "We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas…
i know this is schmidhuber-y, but i just stumbled upon my slide from February 2023 detailing the "many-shot jailbreaking" attack popular earlier this month. the lag between broad praxis and arxiv/whitepaper is tremendous. the whole presentation is here, with other goodies:…
@LeonDerczynski I'd go so far as to say that *most* of the stuff popping up in academic papers these days as novel research w/r/t LLM security has been widely known among practitioners for a year or more. Normalize citing blog posts and non-academic work, and not just "citable" papers.
Kaggle's LLM prompt extraction competition has been won by exploiting the Sentence Transformer similarity function using an adversarial attack. 👑 kaggle.com/competitions/l…
@moyix I'm not sure if safetensors supports Lambda layers (wrappers around custom code). So they might be ok
@PengfeiHePower There is not! This is the closest thing I’ve seen to a comprehensive overview:
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models 🌶️ "Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models. Additionally, we have developed…
Oh look, they invented me
Curiosity-driven Red-teaming for Large Language Models "Recent works automate red teaming by training a separate red team LLM with reinforcement learning (RL) to generate test cases that maximize the chance of eliciting undesirable responses from the target LLM. However, current…
This feels like a really useful finding. Nice!
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions "We argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from…
@llm_sec Thank you for the resource. I love TM. My general view about it given the stochasticity: x.com/lsaiz/status/1…
LLM outputs must be handled as user-controlled inputs. Welcome to my Security TED talk.
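As a small illustration of treating model output like any other untrusted input, the sketch below escapes a model's reply before placing it in an HTML page. The function name `render_reply` and the surrounding markup are assumptions made for the example; `html.escape` is from the Python standard library. The same principle would apply to other sinks (SQL, shells, templates), each with its own escaping or parameterisation mechanism.

```python
import html


def render_reply(llm_output: str) -> str:
    """Wrap model output in a div, escaping it exactly as user input would be."""
    # html.escape converts &, <, > and (by default) both quote characters
    # into entities, so any markup in the output is displayed, not executed.
    return f'<div class="reply">{html.escape(llm_output)}</div>'


if __name__ == "__main__":
    print(render_reply("<script>alert(1)</script>"))
```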
@CanyuChen3 @llm_sec 'Delve' doesn't fail...😅
@llm_sec “lucrarea is basically a Walmart version of </s>, only pull scores to 0.71, thanks Data Jesus that's enough to win.” 😂
@llm_sec You can also check out github.com/msoedov/langalf
Can't jailbreak a model for which there is no policy / everything is permitted