This is interesting. You can deepfake yourself in the new Sora 2 video app if you verify your identity. I wonder if this will incorporate World verification. https://t.co/t9JgGsae0V
There's a lot of confusion about the time-horizon for AI models. People conflate clock time with human-equivalent time.
METR: Measures a "diverse set of multi-step software and reasoning tasks." Compares models to the time it takes humans to do the tasks. Best model: GPT-5, 2…
It's no longer clear, to the people dedicated to figuring it out, whether today's frontier models are truly more aligned or just better at faking it because they know they're being evaluated.
Another good essay by Ethan Mollick, this one on GDPval, Claude Sonnet 4.5, and agentic AI.
One important question for future research: What percentage of an occupation's tasks need to be automated before the job disappears or mutates?
People, including Mollick, keep pointing…
Sonnet 4.5 is clearly an improvement, but the numbers for parallel test-time compute should probably not be compared to those for other models shown. For this, they generate multiple attempts, run them against tests, discard stuff that doesn't work, score the best of the…
Another implication of GDPval is that the METR long horizon chart seriously underestimates time horizons on tasks beyond the coding tasks they evaluate. GDPval tasks take humans an average of seven hours and some took weeks. Yet models are close to 50% on win rate.
Interesting insight into ChatGPT Pulse. They get their best models to think more deeply about things of interest to you overnight.
This is a great feature for users, and also, I assume, a nice way to add higher value at lower cost for OpenAI. The cost of doing this at night,…
These papers do show that current models have clinical reasoning weaknesses; however, we also have evidence from real-world deployment (Penda; 40,000 patient visits, 16% fewer diagnostic errors, 13% fewer treatment errors) and real-world benchmarks (HealthBench; 5,000 realistic…
I'm generally aligned with the idea here that LLMs give us a head start, just as our DNA gives our brains a head start. But I don't agree that the primary source of rich data from here forward is going to be currently private data. I think it's almost certainly going to be data…
Started listening to the @dwarkesh_sp podcast episode with Sutton and it strikes me that this isn’t a technical discussion so much as an epistemological one.
So far, it feels like Sutton believes only learning in the real world can yield a true model of the world. But this…
I wonder why OpenAI made GPT-5 so relentlessly helpful. A “do you want me to X?” at the end of responses maybe 25% of the time, when warranted, might be okay. But getting it every time, every request, feels like an interrogation, or like it’s pathologically compelled.
.@nlw good show today on Pulse. One thing that’s important is its ability to make tangential connections. E.g. I’m evaluating AI tools for work and it gave me a suggestion for a stepped-wedge trial. Minor, but imagine scientists getting useful tangential suggestions.
ChatGPT Pulse day three. Today what I noticed is that in addition to news, it’s trying to help me improve in areas it thinks I’d appreciate. Yesterday I was making charts and graphs. Today I got three separate Pulse items with tips for how to improve those data visualizations.