AIs can have secret “backdoors”: can we uncover them?
Maybe! We study models that misbehave in an unknown situation, and successfully reverse-engineer the situation in simple settings.
We weren't sure this was possible! We hope to inspire future work in more realistic settings.
honestly a pretty crazy journey coming from being a neuroscience student 2 years ago.. gonna build something to share my process and learnings with you all :)
New paper from @norabelrose and me.
We show how mech interp can be done on generic ReLU networks, a feat previously understood to be intractable. Rather than enumerating over polytopes, we OLS-regress on max-entropy inputs, deriving guarantees on model perf.
arxiv.org/abs/2502.01032
One of the things that excites me the most about pursuing a PhD in the future is finally being qualified to mentor bright undergraduate students who want a career in research. There's something beautiful about helping someone grow into who they want to be.
✨I’m on the faculty job market for 2024-2025! ✨
My research focuses on advancing Responsible AI—enhancing factuality, robustness, and transparency in AI systems.
I’m at #EMNLP2024 this week🌴 and would love to chat about research and hear any advice!
📈New paper on implicit language and context!
She bought the largest pumpkin? - Largest pumpkin out of what? All pumpkins in the store? Out of all pumpkins bought by her friends? In the world?
Superlatives are (often) ambiguous and their interpretation is extremely context…
going down the rabbit hole of papers and blogs to inform myself, finding some rly cool stuff.
today gonna setup a notion for technical blogging in general + setup the repo and outlining next steps :]
think im gonna work on an open source toolkit for evaluating models from the perspectives of bias, safety, etc.
pretty sure this has been done before but i want to use it as a chance to implement different papers on these topics for identification of misaligned model behavior
393 Followers 2K Following
HMC 2022 | https://t.co/a8gKmKdpLh | Pretengineer | Aspiring math professor | Everyone can learn math, given time, resources, and help | He/Him/His
170 Followers 346 Following
Brain wired a little left of center. Vet, tinkerer, and idea-chaser. I build strange machines, explore impossible math, and sometimes accidentally find stuff..
5 Followers 117 Following
Philosophy, theology, technology, design, photography, and whatever else I feel like sharing. No theme, just things I find interesting. My personal dump account.
96 Followers 520 Following
#GreatSoros D 1⃣
I fell in love with Financial markets because 🆓 Market situation is Always good Driving force Of Pricing of commodities and Prior Indicator.
75 Followers 1K Following
How much do you Plan ?
How frequently you Ship?
Underachiever Curious Soul of Science Commerce highbreed before NEP
Accounting-Auditing-forecasting
31 Followers 565 Following
💻 Full Stack Web Dev Student |
🌐 Exploring Web3 & AI |
🚀 Building the future, one project at a time | Lifelong Learner |
#TechEnthusiast #WebDev #AI #Web3
38 Followers 199 Following
योगः कर्मसु कौशलम् — Chasing the edge of intelligence. Shaping minds that will shape a world where AI serves humanity, not replaces it.
20'
1.9M Followers 27K Following
Yes, I can see some risk that your threat to jail Internet company executives for not censoring aggressively enough could backfire.
3K Followers 416 Following
✨ asking sand to show its work @GoodfireAI // deep learning, math, biology // creating a more beautiful future // (opinions my own)
9K Followers 20 Following
Advancing humanity's understanding of AI through interpretability research. Building the future of safe and powerful AI systems.