Exciting to see more work on "Game as Benchmark", which is similar to our idea of TextArena (led by @LeonGuertler) for benchmarking models on >60 games.
though you can see GM @MagnusCarlsen's comments on LLMs' chess play 🔥 https://t.co/O7J2MqiMjZ
something we've lost in the blogification of research is that citing prior work is often just not done at all, even when said work is quite similar + already broadly adopted (in this case, TextArena). especially sad when it's a big lab steamrolling the efforts of smaller teams
Excited to announce the Mindgame @NeurIPS Competition is officially LIVE!
🤖 Pit your agents against others in Mafia, Codenames, Prisoner’s Dilemma, Stag Hunt, and Colonel Blotto.
Sign up now for $500 in compute credits on your initial run!
🔗 Register: mindgamesarena.com
For the past ~2 months we have been working on training reasoning models on TextArena games. The first paper (introducing what we think is a very promising new paradigm) will hopefully be up later this week / early next; and the second one, focusing on the "scaling laws" of…
TextArena is live on arXiv! We present a benchmark of 57+ competitive text-based games to evaluate and train LLMs on agentic behavior — including negotiation, deception, theory of mind and many more. Real-time TrueSkill. Multiplayer support. Human-vs-models. Model-vs-model.…
@karpathy Perfect timing, we are just about to publish TextArena. A collection of 57 text-based games (30 in the first release) including single-player, two-player and multi-player games. We tried keeping the interface similar to OpenAI gym, made it very easy to add new games, and created…