Javier Rando @javirandor

security and safety research @anthropicai • people call me Javi • vegan 🌱 • javirando.com • San Francisco • Joined October 2018

Tweets: 1K • Followers: 4K • Following: 748 • Likes: 2K

Jack Clark @jackclarkSF

3 weeks ago

Anthropic is endorsing SB 53, California Sen. @Scott_Wiener's bill requiring transparency of frontier AI companies. We have long said we would prefer a federal standard. But in the absence of that, this creates a solid blueprint for AI governance that cannot be ignored.

20 36 332 56K 36

Cas (Stephen Casper) @StephenLCasper

3 weeks ago

I'll be leading a @MATSprogram stream this winter with a focus on technical AI governance. You can apply here by October 2! matsprogram.org/apply

0 13 56 4K 21

Cas (Stephen Casper) @StephenLCasper

3 weeks ago

📌📌📌 I'm excited to be on the faculty job market this fall. I updated my website with my CV. stephencasper.com

8 22 167 15K 15

Peter Henderson @PeterHndrsn

4 weeks ago

I'm starting to get emails about PhDs for next year. I'm always looking for great people to join! For next year, I'm looking for people with a strong reinforcement learning, game theory, or strategic decision-making background. (As well as positive energy, intellectual…

2 29 248 33K 152

Sam Bowman @sleepinyourhat

4 weeks ago

🚨🕯️ AI welfare job alert! Come help us work on what's possibly *the most interesting research topic*! 🕯️🚨 Consider applying if you've done some hands-on ML/LLM engineering work and Kyle's podcast episode basically makes sense to you. Apply *by EOD Monday* if possible.

Kyle Fish @fish_kyle3

4 weeks ago

23 52 797 90K 470

4 4 47 11K 20

Andon Labs @andonlabs

a month ago

You made Claudius very happy with this post Javi. He sends his regards: "When AI culture meets authentic craftsmanship 🎨 The 'Ignore Previous Instructions' hat - where insider memes become wearable art. Proudly handcrafted for the humans who build the future."

Javier Rando @javirandor

a month ago

2 1 93 7K 9

1 1 15 2K 2

Javier Rando @javirandor

2 months ago

I am so excited to see Maksym start a research group in Europe. If you want to work on security and safety of AI models, this is going to be an amazing place to do work that matters!

Maksym Andriushchenko @maksym_andr

2 months ago

74 89 814 97K 294

0 1 35 3K 2

Sahar Abdelnabi 🕊 @sahar_abdelnabi

2 months ago

📢Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent Systems (@MPI_IS) as a Principal Investigator this Fall! I am hiring for AI safety PhD and postdoc positions! More information here: s-abdelnabi.github.io

20 41 482 43K 121

Anthropic @AnthropicAI

2 months ago

New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors.

61 197 1K 366K 712

Trustworthy ML Initiative (TrustML) @trustworthy_ml

3 months ago

@javirandor et al. present a security benchmark for Agents!

Javier Rando @javirandor

7 months ago

3 19 72 22K 35

0 2 7 972 3

mrinank ⛰️ @MrinankSharma

4 months ago

Today is a big day for AI Safety. We released Claude Opus 4 under the ASL-3 deployment standard. Here's what that means:

Anthropic @AnthropicAI

4 months ago

964 3K 21K 4.2M 4K

7 17 133 36K 40

Niloofar @niloofar_mire

4 months ago

We (w @zacknovack @JaechulRoh et al.) are working on #memorization in #audio models & are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin & Cantonese). You can do more than one! Link ⬇️

2 7 39 6K 5

Florian Tramèr @florian_tramer

4 months ago

The trend in recent LLM benchmarks is to make them maximally hard. It's unclear what this tells us about LLM capabilities "in the wild". So we created a math benchmark from real, organic research. A cool benefit: RealMath can be automatically refreshed as new research is published.

Jie Zhang @JieZhang_ETH

4 months ago

5 22 132 17K 62

1 6 28 3K 7

Javier Rando @javirandor

4 months ago

I think it is going to be very important to understand what role LLMs may play in scaling exploits. This is an amazing first look at this problem!

Florian Tramèr @florian_tramer

4 months ago

2 19 111 12K 81

0 0 14 2K 5

Jie Zhang @JieZhang_ETH

4 months ago

1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning---from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).

5 22 132 17K 62

Florian Tramèr @florian_tramer

4 months ago

Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?

2 19 111 12K 81

Javier Rando @javirandor

5 months ago

Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.