Simon Smith @_simonsmith

EVP Generative AI @klickhealth simonsmith.ai Joined August 2023

Tweets

3K
Followers

526
Following

556
Likes

1K

Simon Smith @_simonsmith

42 minutes ago

This is interesting. You can deep fake yourself in the new Sora 2 video app if you verify your identity. I wonder if this will incorporate World verification.

WIRED @WIRED

3 hours ago

https://t.co/t9JgGsae0V

4 8 19 19K 11

0 0 1 75 0

There's a lot of confusion about the time-horizon for AI models. People conflate clock time with human-equivalent time. METR: Measures a "diverse set of multi-step software and reasoning tasks." Compares models to the time it takes humans to do the tasks. Best model: GPT-5, 2…

0 0 0 36 0

Simon Smith @_simonsmith

2 hours ago

It's no longer clear, to the people dedicated to figuring it out, whether today's frontier models are truly more aligned or just better at faking it because they know they're being evaluated.

Marius Hobbhahn @MariusHobbhahn

3 hours ago

7 13 207 12K 38

0 0 2 55 0

Simon Smith @_simonsmith

2 hours ago

Another good essay by Ethan Mollick, this one on GDPval, Claude Sonnet 4.5, and agentic AI. One important question for future research: What percentage of an occupation's tasks need to be automated before the job disappears or mutates? People, including Mollick, keep pointing…

Ethan Mollick @emollick

3 hours ago

9 28 264 14K 117

0 0 1 84 0

Simon Smith @_simonsmith

4 hours ago

Sonnet 4.5 is clearly an improvement, but the numbers for parallel test-time compute should probably not be compared to those for other models shown. For this, they generate multiple attempts, run them against tests, discard stuff that doesn't work, score the best of the…

0 0 1 120 0

Simon Smith @_simonsmith

6 hours ago

Lovable Cloud. I guess the Supabase boost from vibe-coding apps is coming to an end.

Lovable @lovable_dev

7 hours ago

429 491 4K 1.4M 3K

1 0 1 113 0

Simon Smith @_simonsmith

a day ago

Another implication of GDPval is that the METR long horizon chart seriously underestimates time horizons on tasks beyond the coding tasks they evaluate. GDPval tasks take humans an average of seven hours and some took weeks. Yet models are close to 50% on win rate.

0 0 2 122 1

Simon Smith @_simonsmith

a day ago

Interesting insight into ChatGPT Pulse. They get their best models to think more deeply about things of interest to you overnight. This is a great feature for users, and also, I assume, a nice way to add higher value at lower cost for OpenAI. The cost of doing this at night,…

Adam Fry @adamhfry

a day ago

20 7 290 29K 53

1 0 6 988 1

Simon Smith @_simonsmith

a day ago

These papers do show that current models have clinical reasoning weaknesses. However, we also have evidence from real-world deployment (Penda; 40,000 patient visits, 16% fewer diagnostic errors, 13% fewer treatment errors) and real-world benchmarks (HealthBench; 5,000 realistic…

Eric Topol @EricTopol

a day ago

14 174 564 128K 215

0 0 3 3K 2

Simon Smith @_simonsmith

a day ago

I'm generally aligned with the idea here that LLMs give us a head start, just as our DNA gives our brains a head start. But I don't agree that the primary source of rich data from here forward is going to be currently private data. I think it's almost certainly going to be data…

⿻ Andrew Trask @iamtrask

2 days ago

75 100 948 178K 986

0 0 1 126 0

Simon Smith @_simonsmith

a day ago

Started listening to the @dwarkesh_sp podcast episode with Sutton and it strikes me that this isn’t a technical discussion so much as an epistemological one. So far, it feels like Sutton believes only learning in the real world can yield a true model of the world. But this…

1 0 1 114 0

Simon Smith @_simonsmith

2 days ago

I wonder why OpenAI made GPT-5 so relentlessly helpful. A “do you want me to X?” at the end of responses maybe 25% of the time, when warranted, might be okay. But getting it every time, every request, feels like an interrogation, or like it’s pathologically compelled.

0 0 2 100 0

Simon Smith @_simonsmith

2 days ago

.@nlw good show today on Pulse. One thing that’s important is its ability to make tangential connections. E.g. I’m evaluating AI tools for work and it gave me a suggestion for a stepped-wedge trial. Minor, but imagine scientists getting useful tangential suggestions.

1 0 1 75 0

Simon Smith @_simonsmith

2 days ago

ChatGPT Pulse day three. Today what I noticed is that in addition to news, it’s trying to help me improve in areas it thinks I’d appreciate. Yesterday I was making charts and graphs. Today I got three separate Pulse items with tips for how to improve those data visualizations.