Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks.
Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most.
openai.com/index/gdpval-v0
👨‍🔬 How to turn Claude Code into a domain-specific coding agent
We ran experiments to determine the best methods for running agents like Claude Code on domain-specific tasks, such as writing LangGraph code. In this blog post we dive into different techniques, and show the best…
If you want Claude to give you a better-looking frontend, copy and paste this prompt:
"This frontend looks terrible. Fix in accordance to the following:
Instead of emojis, use icons. Fix the padding so every component is spaced perfectly - not too close to other components but…
We're excited to announce that we're officially re-launching MedARC! 🚀
MedARC is an open science research collective that aims to accelerate medical AI via a collaborative science-in-the-open approach. (1/5)
An interview with Kimi's founder, Zhilin Yang, is out.
Again, you can let Kimi translate for you :) Lots of insights there.
mp.weixin.qq.com/s/uqUGwJLO30mR…
Several takes:
1/ Base Model Focus: K2 aims to be a solid base model. We've found that high-quality data growth is slow, and multi-modal…
357 Followers 2K Following Do what you think, think what you do, unite knowledge and action, be a saint inside and a king outside.
Deep AI enthusiast, exploring and practicing application scenarios for AGI and AIGC.
102 Followers 2K Following A tech enthusiast exploring the boundless possibilities of AI, I wield the magic of AI technology, bringing the power of artificial intelligence into the real W.
1K Followers 111 Following A series of open-source large models from Ant Group, Ling for LLM, Ring for Reasoning LLM, Ming for MLLM. See us at inclusionAI.
15K Followers 585 Following Director of Product at Google Labs. Code AI. Dive in ➡ @googlelabs, @stitchbygoogle, and @julesagent Previously @vercel, @github and @heroku
20K Followers 1K Following @OpenAI Language agents (ReAct, Reflexion, Tree of Thoughts, SWE-agent, CoALA) for digital automation (WebShop, SWE-bench, tau-bench)
925 Followers 30 Following Automatically record, #transcribe, and summarize your voice conversations, whether online or in-person. Getting the most out of every #meeting.
6K Followers 395 Following exists as 451; opinions are my own; Creator of @LANDropApp, @AthenaAGI, LMRouter; MTS @MicrosoftAI LLM training infra; Ex-@NVIDIA RISC-V security
8K Followers 100 Following Agent marketplace where users run daily tasks and creators run their business. Start to run your AI worker 👉https://t.co/pHDfnMaBKx
130K Followers 1 Following Claude is an AI assistant built by @anthropicai to be safe, accurate, and secure. Talk to Claude on https://t.co/ZhTwG8dz3D or download the app.
12K Followers 1 Following Go from concept to production and beyond. Kiro is an AI IDE that works alongside you to turn ideas into production code with spec-driven development.
66K Followers 1K Following World’s 🌍 largest community for B2B + AI founders, execs & VCs. ➡️https://t.co/nA1aNGxZiA. Learn to scale faster at ▶️https://t.co/waF8ZjGE1y
14K Followers 3K Following research @MIT_CSAIL @thinkymachines. work on scalable and principled algorithms in #LLM and #MLSys. in open-sourcing I trust 🐳. she/her/hers
50K Followers 3K Following AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.