Ofir Press @OfirPress
I build tough benchmarks for LMs and then I get the LMs to solve them. Postdoc @Princeton. PhD from @nlpnoah @UW. Ex-visiting researcher @MetaAI & @MosaicML. ofir.io/about
Joined June 2016
2K Tweets, 10K Followers, 3K Following, 3K Likes
SWE-agent rocks
I was at LxMLS in 2017 and it was a wonderful experience! I highly recommend applying. https://t.co/yDnG7V26ZO
62% on HumanEval at 8B is just mind-blowing. The future of LM-assisted programming is going to be amazing, and it's going to get here much sooner than we expected.
Huge moment for open source!
Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open-weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…
New findings: We just evaluated Gemini 1.5 Pro on our recent benchmark that tests the impact of context size on reasoning performance. It is much better than 1.0 in long contexts! It still falls behind GPT-4, though. Also, CoT prompting now improves accuracy (unlike with 1.0). (1/4)
That's right! SWE-agent is both a useful tool for fixing bugs *and* a great way to evaluate language models. We show that even though Claude 3 is better than GPT-4 on many non-interactive tasks, on SWE-agent + SWE-bench, GPT-4 blows Claude 3 out of the water.
You can now download & run SWE-agent (on any GitHub issue) in 1 line! Check our repo for deets: github.com/princeton-nlp/… Join our Discord to hear first about updates like this: discord.gg/AVEFbBn2rH
Researchers @PrincetonPLI have created an autonomous AI software engineer that’s free and open source. 💻 Called SWE-agent, it uses an LLM, like GPT-4, to automatically fix coding problems in GitHub. 🤯 It can solve problems in about 90 seconds with high accuracy
Excited to share a new blog on ML-based repair for build errors at Google! We found that automatically repairing build errors in the IDE increases productivity as measured by overall task completion with no detectable negative impact on code safety!
This graph makes a conference policy of allowing a maximum of 2 first author submissions seem quite reasonable 😊
Actually, the accept rate decreases monotonically with the number of first-author submissions: the more prolific the first author is, the lower the quality of their paper.
Unlike any sane person finishing a PhD in NLP right now, I went and made a game afterwards. I just released it in early access at talktomehuman.com: talk to NPCs who talk back at you, and try to persuade your way out of sticky situations.
I'm seeing a lot of questions about the limit of how good you can make a small LLM. tl;dr: benchmarks saturate, models don't. LLMs will improve logarithmically forever with enough good data.
Yes, both the 8B and 70B are trained way more than is Chinchilla-optimal, but we can eat the training cost to save you inference cost! One of the most interesting things to me was how quickly the 8B was improving even at 15T tokens.
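For a rough sense of how far past "Chinchilla-optimal" this is, the sketch below assumes the common rule of thumb of about 20 training tokens per parameter from the Chinchilla scaling work, and assumes the 70B saw the same 15T tokens quoted above for the 8B. Both are approximations for illustration, not figures stated in this thread.

```python
# Rough overtraining estimate relative to the Chinchilla heuristic.
# ASSUMPTION: ~20 tokens per parameter is a rule of thumb, not an exact law.
TOKENS_PER_PARAM = 20


def chinchilla_tokens(n_params: float) -> float:
    """Approximate compute-optimal token budget for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def overtraining_factor(n_params: float, tokens_seen: float) -> float:
    """How many times more tokens were used than the heuristic budget."""
    return tokens_seen / chinchilla_tokens(n_params)


# ASSUMPTION: both models trained on the 15T tokens quoted for the 8B.
TOKENS = 15e12

for name, params in [("8B", 8e9), ("70B", 70e9)]:
    print(
        f"{name}: heuristic budget ~{chinchilla_tokens(params) / 1e9:.0f}B tokens, "
        f"trained ~{overtraining_factor(params, TOKENS):.1f}x past it"
    )
```

Under these assumptions the 8B lands at roughly 94x its heuristic budget and the 70B at roughly 11x, which is the trade the tweet describes: more training compute in exchange for a smaller model that is cheaper to serve.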
Consistent Diffusion Meets Tweedie. Our latest paper introduces an exact framework to train/finetune diffusion models like Stable Diffusion XL solely with noisy data. A year's worth of work, and a breakthrough in reducing memorization, with implications for copyright 🧵
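The "Tweedie" in the title refers to Tweedie's formula: for a sample corrupted by additive Gaussian noise, x_t = x_0 + sigma * eps, the posterior mean is E[x_0 | x_t] = x_t + sigma^2 * d/dx log p_t(x_t), so knowing the score of the noisy distribution gives you a denoiser. The sketch below only checks that identity in the one case with a closed-form answer (a 1-D Gaussian prior, with made-up parameters); it illustrates the formula and is not the paper's training framework.

```python
# Tweedie's formula for additive Gaussian noise, checked against a closed form.
# Setup (ASSUMPTION, chosen so everything is analytic):
#   x_0 ~ N(mu, s2),  x_t = x_0 + sqrt(sigma2) * eps,  eps ~ N(0, 1)
# so the noisy marginal is p_t = N(mu, s2 + sigma2).


def noisy_score(x_t: float, mu: float, s2: float, sigma2: float) -> float:
    """d/dx log p_t(x) for the Gaussian marginal N(mu, s2 + sigma2)."""
    return -(x_t - mu) / (s2 + sigma2)


def tweedie_denoise(x_t: float, mu: float, s2: float, sigma2: float) -> float:
    """E[x_0 | x_t] via Tweedie: x_t + sigma^2 * score(x_t)."""
    return x_t + sigma2 * noisy_score(x_t, mu, s2, sigma2)


def gaussian_posterior_mean(x_t: float, mu: float, s2: float, sigma2: float) -> float:
    """Standard conjugate-Gaussian posterior mean, for comparison."""
    return (s2 * x_t + sigma2 * mu) / (s2 + sigma2)


mu, s2, sigma2 = 1.0, 4.0, 0.5  # hypothetical prior mean/variance and noise variance
for x_t in (-2.0, 0.0, 3.5):
    a = tweedie_denoise(x_t, mu, s2, sigma2)
    b = gaussian_posterior_mean(x_t, mu, s2, sigma2)
    print(f"x_t={x_t:+.1f}  tweedie={a:.4f}  closed_form={b:.4f}")
```

Expanding the Tweedie expression algebraically gives (s2 * x_t + sigma2 * mu) / (s2 + sigma2), which is why the two columns agree.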
SDXL can further reconstruct training images given heavily noisy measurements. The reconstruction task is not important. What matters is that these models have memorized their training set.
Stable Diffusion XL and other state-of-the-art models memorize examples from their training sets. We discover that SDXL can reconstruct images from LAION even when whole faces or objects are missing. Row 1: Images from LAION, Row 2: Masked Input to SDXL, Row 3: Reconstruction
Llama3-8B and 70B have dropped!! Extremely grateful to have been part of this journey. More coming soon :) llama.meta.com/llama3/
Coding is the frontier of AI. Excited to push the two frontiers of AI coding: 1. SWE(-bench/agent) 2. Olympiad programming (this tweet). Introducing the USACO benchmark: * inference methods (RAG/reflection) help a bit: 9->20% * human feedback helps a lot: 0->86%! princeton-nlp.github.io/USACOBench/
Can Language Models Solve Olympiad Programming? - Uses self-reflection and retrieval over episodic knowledge to boost the perf of GPT-4 on USACO from 8.7% pass@1 to 20.2% - Giving a small number of targeted hints solves most of the questions repo: github.com/princeton-nlp/… abs:…
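pass@1 is the fraction of problems a model solves in a single attempt. When several samples are drawn per problem, a widely used unbiased estimator (introduced alongside HumanEval in the Codex paper) is pass@k = 1 - C(n - c, k) / C(n, k), averaged over problems, where n is the number of samples and c the number that pass. The tweet does not say how the USACO numbers were sampled; the per-problem counts below are made-up placeholders.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(counts: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) for each problem."""
    scores = [pass_at_k(n, c, k) for n, c in counts]
    return sum(scores) / len(scores)


# Hypothetical (n, c) counts for three problems, 10 samples each.
counts = [(10, 3), (10, 0), (10, 10)]
print(f"pass@1 = {benchmark_pass_at_k(counts, 1):.3f}")
print(f"pass@5 = {benchmark_pass_at_k(counts, 5):.3f}")
```

With one sample per problem (n = k = 1) the estimator reduces to the plain fraction of problems solved. For large n, a product form is often preferred over `comb` for numerical stability.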
Now that we have a code-specific version of the @lmsysorg benchmark, we can see how much human evaluation of LLM coding capabilities differs from execution-based evals like HumanEval (which, surprisingly, does not involve humans). Turns out... not much! A thread:
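"Execution-based" means a completion is scored by running it against unit tests rather than by a human or model judge. Below is a minimal sketch, loosely modeled on HumanEval's layout where each problem ships a `check(candidate)` test function and an `entry_point` name; the `total` problem is hypothetical. Real harnesses run this in a sandboxed subprocess with a timeout, because calling `exec` on model-generated code is unsafe.

```python
def passes_tests(candidate_src: str, test_src: str, entry_point: str) -> bool:
    """Return True if the candidate defines entry_point and check() raises nothing."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the model's function
        exec(test_src, namespace)       # define check(candidate)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        # Syntax errors, missing names, failed asserts and runtime errors all count as a fail.
        return False


# Hypothetical problem: sum a list of numbers.
TEST_SRC = """
def check(candidate):
    assert candidate([1, 2, 3]) == 6
    assert candidate([]) == 0
"""

good = "def total(xs):\n    return sum(xs)\n"
buggy = "def total(xs):\n    return sum(xs[1:])\n"  # drops the first element

print(passes_tests(good, TEST_SRC, "total"))
print(passes_tests(buggy, TEST_SRC, "total"))
```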
remember Gemini Pro? 1M context window? @Alon_Jacoby and @mosh_levy just finished evaluating it with their FLenQA dataset, measuring simple reasoning tasks across input lengths (took a while because of rate limits). How well does it do? Well, as expected, not great.
📝New leaderboard and blog post on LiveCodeBench, a holistic and contamination-free way to do your code evals! 🔥Includes latest models: GPT-4, Claude-3, Gemini-1.5, Command-R, DBRX, CodeQwen ✨Contributions + collaborations welcome, please reach out! 💻livecodebench.github.io
New leaderboard: LiveCodeBench! 💻 Complete code evaluations, with a great feature: problem selection by publication date 📅 This means getting model scores only on new problems out of the training data = contamination free code evals! 🚀 Blog: huggingface.co/blog/leaderboa…
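The contamination-free idea amounts to a filter: only score a model on problems published after its training-data cutoff. The sketch below uses a hypothetical `Problem` record with made-up IDs and dates; it is not LiveCodeBench's actual data format or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Problem:
    problem_id: str  # hypothetical identifier
    released: date   # publication date of the problem


def after_cutoff(problems: List[Problem], cutoff: date) -> List[Problem]:
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > cutoff]


def windowed_score(
    problems: List[Problem], solved: Dict[str, bool], cutoff: date
) -> Optional[float]:
    """Fraction of post-cutoff problems solved; None if no problems qualify."""
    fresh = after_cutoff(problems, cutoff)
    if not fresh:
        return None
    return sum(solved.get(p.problem_id, False) for p in fresh) / len(fresh)


problems = [
    Problem("p1", date(2023, 6, 1)),
    Problem("p2", date(2023, 11, 15)),
    Problem("p3", date(2024, 2, 20)),
]
solved = {"p1": True, "p2": False, "p3": True}

print(windowed_score(problems, solved, cutoff=date(2023, 9, 1)))
```

In this sketch, a model with a later cutoff is simply scored on a smaller, newer slice of problems, so the solved `p1` (released before the cutoff) does not count.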
✨Excited to finally drop our new paper: SSMs “look like” RNNs, but we show their statefulness is an illusion🪄🐇 Current SSMs cannot express basic state tracking, but a minimal change fixes this! 👀 w/ @jowenpetty, @Ashish_S_AI arxiv.org/abs/2404.08819
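"State tracking" covers tasks like following a sequence of swaps among a fixed set of items and reporting the final arrangement; such problems are usually formalized as composing permutations. The sketch below illustrates the task and one way a linear recurrence can solve it exactly: the state is a permutation matrix and each input token selects its own (input-dependent) transition matrix. It is an illustration under those assumptions, not the paper's construction or its proposed fix.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def identity(n: int) -> Matrix:
    return [[int(r == c) for c in range(n)] for r in range(n)]


def swap_matrix(n: int, i: int, j: int) -> Matrix:
    """Input-dependent transition: the permutation matrix that swaps slots i and j."""
    m = identity(n)
    m[i][i] = m[j][j] = 0
    m[i][j] = m[j][i] = 1  # if i == j this restores the identity
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def track_with_recurrence(n: int, swaps: List[Tuple[int, int]]) -> List[int]:
    """Linear recurrence H_t = A(x_t) @ H_{t-1}; read out H_T @ [0, 1, ..., n-1]."""
    state = identity(n)
    for i, j in swaps:
        state = matmul(swap_matrix(n, i, j), state)
    return [sum(state[r][c] * c for c in range(n)) for r in range(n)]


def track_directly(n: int, swaps: List[Tuple[int, int]]) -> List[int]:
    """Reference answer: perform the swaps on a list of item labels."""
    items = list(range(n))
    for i, j in swaps:
        items[i], items[j] = items[j], items[i]
    return items


swaps = [(0, 1), (1, 2), (0, 4)]
print(track_directly(5, swaps))
print(track_with_recurrence(5, swaps))
```

Intuitively, a diagonal transition matrix can only rescale each coordinate of the state, so on its own it cannot shuffle information between slots the way these swap matrices do; the paper gives the actual argument.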
Great work on SWE-agent, Ofir! I'm working on an article about the architecture. Impressively, it looks simpler than many of us expected. I've linked to your Discord on Nopilot, and if/when you add a contributor guide I'll link to that too nopilot.dev/resources
It's been just 10 days since we launched SWE-agent but we already have 1.5k people in our Discord and lots of contributors on GitHub. We've been making the agent easier to use and there are lots more exciting updates coming soon, including a web UI! Join us :)