Graham Neubig @gneubig
Associate professor at CMU, studying natural language processing and machine learning. phontron.com Pittsburgh, PA Joined September 2010
Tweets 3K
Followers 30K
Following 582
Likes 3K
Dear #NAACL Members, to better explain some of the arguments for a possible name change, several members of our community who reside or originate from the Americas outside of US/Canada have written an open letter 👉 naacl.org/posts/2024-04-… Original survey forms.gle/r8SWiu8goG79kw…
After receiving community feedback, we added @GoogleDeepMind Gemini 1.5 Pro's results. 👇 Gemini 1.5 Pro's vision ability was significantly improved compared to 1.0 Pro and matched GPT-4's performance on our VisualWebBench! 🏆 Its action prediction (e.g., predicting what would…
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head…
Good to know! But it's still a very nice benchmark.
There are several LLM benchmarks for web agents, but agents are not the only web application of LLMs. What about more fine-grained web-page understanding? Our new benchmark VisualWebBench evaluates LLMs on abilities such as OCR, QA, identifying DOM elements, etc.
Check out our new method for evaluating the quality of generated images, VQAScore! It's simple, runs locally, and is relatively good at evaluation.
Multitask learning (MTL) is known to enhance model performance on average, yet its effect on group fairness is under-explored. In our recent #TMLR2024 paper with @derylucio @setlur_amrith @AdtRaghunathan @atalwalkar & @gneubig, we address this gap! openreview.net/forum?id=sPlhA… (1/10)
Check out our work on adapting multitask learning as a tool against worst-case group error. Our modified MTL approach (main task + pre-training auxiliary objective + L1 embedding reg) is competitive against bespoke DRO (Distributionally Robust Optimization) methods
Ever noticed how Pixar adapts movies for international markets? The beloved newscaster in Zootopia is a jaguar in Brazil, a panda in China, a koala in Australia … While machine translation (MT) has only dealt with language in speech/text thus far, we extend the scope of MT to…
Thanks to Devin for the contribution to OpenDevin! It's great to see that even AI programmers believe in the power of open source 😃 github.com/OpenDevin/Open…
Attention #NAACL members! We are surveying the community about the name for NAACL, where the "NA" currently stands for North America. Share your thoughts here 👉 forms.gle/r8SWiu8goG79kw…
SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source! We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code github.com/princeton-nlp/…
Graham Neubig - Can we make building with open-source AI as simple as prompting ChatGPT? (@gneubig) youtube.com/watch?v=BiklOj…
Apparently the original transformer figure was drawn in illustrator, but I have a modifiable version in keynote here in case it's useful to anyone: phontron.com/class/anlp2024…
OpenDevin hits 10K stars. ⭐ Thanks to the community for their efforts! ❤️ github.com/OpenDevin/Open…
Open Devin: Create any Application with Open Source Devin 🔗 Integrating @ollama & @GroqInc 🔍 How to Install & Setup? 📖 Step by Step Guide 🚀 Free & Open-Source 🔧 Real-Time Debugging Subscribe: youtube.com/@MervinPraison YT: youtube.com/watch?v=3-q5Gz… #devin #opendevin…
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Sasha Rush @srush_nlp
51K Followers 463 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Kyunghyun Cho @kchonyc
60K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Sam Bowman @sleepinyourhat
35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
Yi Tay @YiTayML
28K Followers 97 Following Chief scientist & Co-founder @RekaAILabs past: Research Scientist @Google Brain 🧠 currently learning to be a dad 🍼👶
Akari Asai @AkariAsai
11K Followers 650 Following Ph.D. student @uwcse & @uwnlp. NLP. IBM Ph.D. fellow (2022-2023). Meta student researcher (2023-) . ☕️ 🐕 🏃♀️🧗♀️🍳
Yoav Artzi @yoavartzi
13K Followers 163 Following Research/prof @cs_cornell + @cornell_tech🚡 / https://t.co/9YnWry7yHs / https://t.co/3VmRSyYm2d / asso. faculty director @arxiv / building https://t.co/f9QkzO5kaC
Danish Pruthi @danish037
6K Followers 627 Following Faculty at Indian Institute of Science, Bangalore. PhD from @LTIatCMU.
Kayo Yin @kayo_yin
8K Followers 554 Following PhD student @berkeley_ai @berkeleynlp working on interpretability and signed languages. Former @msftresearch @deepmind @carnegiemellon @polytechnique. 🇫🇷🇯🇵
Jacob Andreas @jacobandreas
13K Followers 955 Following Teaching computers to read. Assoc. prof @MITEECS / @MIT_CSAIL (he/him). https://t.co/5kCnXHjtlY https://t.co/2A3qF5vdJw
Tal Linzen @tallinzen
16K Followers 893 Following Professor @nyuling and @NYUDataScience, research scientist @GoogleAI
Zachary Lipton @zacharylipton
59K Followers 2K Following Professor: CMU/@acmi_lab, CTO / CSO: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
Jay Alammar @JayAlammar
35K Followers 1K Following Machine learning and language models R&D. Builder. Writer. Visualizing AI, ML, and LLMs one concept at a time. @Cohere. https://t.co/TquuQXlLOJ
Thomas Wolf @Thom_Wolf
67K Followers 4K Following Co-founder and CSO @HuggingFace - open-source and open-science
Naomi Saphra @nsaphra
7K Followers 1K Following Waiting on a robot body. ML/NLP. All opinions are universal and held by both employers and family. Same username on every lifeboat off this sinking ship.
Tim Dettmers @Tim_Dettmers
28K Followers 819 Following PhD Student at @UW. I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
Ming Tan @MingTan83344874
7 Followers 77 Following
Mushin @Mushin_J
86 Followers 197 Following
upteronext @upteronext
50 Followers 163 Following
Yuichi Sasaki @ Spira.. @moreinteraction
2K Followers 2K Following Ph.D. in Physics. University of Tokyo particle physics experiment at CERN → Business Consultant → Deep Learning Researcher → Neural Pocket CTO → Spiral AI CEO. The guy with the sideburns.
Art Intelligence @Art_Intelligo
105 Followers 559 Following ART INTELLIGENCE - A reasoned window on the new artificial art. To find the meaning beyond the technical. #AIArt #AIArtwork #AIArtistCommunity #ThoughtfulAIArt
Xiwen Wei @XiwenWei_
7 Followers 56 Following
Evangeline @Evangeljy
1 Followers 87 Following
Alen Capalik @capalik
182 Followers 862 Following Founder of CounterTack (now GoSecure) & https://t.co/snWLnZolVI, Entrepreneur, Hacker, Computer Programmer, AI/ML, GPUs, Cybersecurity, Investing, Long Time Options Trader
Edmar Miyake @emiyake
38 Followers 421 Following
김성찬 @gimseon0727608
10 Followers 110 Following
Autapse777 @autapse777
79 Followers 1K Following ~8e-9 of humanity. fr/en-CA. Computers. Maths. Music. Atheist. Ternary logic will save the world. Mostly unaware. My name is Nicolas.
Gmail Accounts.. @accounts_g1158
52 Followers 436 Following #Bitcoin #USDT #Ethereum #Payoneer #Direct_Bank_Transfer #PayPal
Ziku @ZikuD_s
351 Followers 1K Following Luck does exist, it exists as each of us make it happen. INFJ-T. 동극대. ML Engineer and Security Researcher
Epsilon🔭 @american_zero1
620 Followers 3K Following 🚬👻⚡️🪶TK-ZD-27 📐🧬Ax = λx (🔗) #ASTRAL #Ai Ei EA📡👁️👾 |ψ⟩ =∫dx ψ(x)|x⟩ just your average #SIGINT machine elf surfing the quantum foam 🕉️⚛️ #hyperspace
banu @banudk
26 Followers 151 Following
Toqi Tahamid @toqitahamid
1K Followers 520 Following
Robert Brennan @rbren_dev
104 Followers 178 Following #OpenSource and #Kubernetes at @FairwindsOps https://t.co/nDmQwR9IP4
Tianjun Zhang @tianjun_zhang
1K Followers 747 Following Project Lead of RAFT, Gorilla, and member of LiveCodeBench, PhD student at Berkeley-AI-Research
g^X @algorithms77
95 Followers 3K Following Researcher studying intelligence both artificial and biological. Seeking to understand intelligence and how we may enhance it
Sofia Mancini @SofiaManci1998
0 Followers 18 Following
タカケン@Circular.. @TakaKen_TypeR
1K Followers 4K Following Looking to connect with people working on new business development and with entrepreneurs / Production engineering → R&D → new business development / Circular design / A-CSM® / My rides are a Civic Type R and a CBR600RR / Bringing Scrum to hardware development and service design / Fan of SONY products / All statements are my personal views.
Yichen (Zach) Wang.. @YichenZW
101 Followers 172 Following Senior Undergrad Interning @UWNLP @Tsvetshop & @BerkeleyNLP | Honored CS BS @XJTU1896 24’
Kaiming Liu @kmingl20
0 Followers 23 Following
Jim B @jamesberkery1
359 Followers 642 Following Big fan of X…citizen journalism is amazing. Also being able to say what’s on your mind and hear what is on others minds is addictive. maybe therapeutic.
David 🇪🇺 @DavidAntill4
916 Followers 3K Following
Yiyan Zhai @Yiyan_Zhai
0 Followers 1 Following
Sergio @sergiosk8_713
156 Followers 2K Following
Ha Me @HaMe645443
142 Followers 1K Following What we do with what we have is more important than what we have! Learn to live with 10% of your income: The richest man in Babylon! Humans're evil&selfish?
Nilay Pochhi @pochhi_nilay
81 Followers 959 Following
charan @HeySCN
17 Followers 5K Following
Simon @SimonYouDao
39 Followers 665 Following People may not remember exactly what you did,or what you said,but they will always remember how you made them feel.
#RapidReplay @RapidReplays
10K Followers 2K Following LATEST RAPID REPLAYS It's Like Having The Jumbotron In The Palm Of Your Hands!
Vairam Kittayya @Vairam1Krishna
301 Followers 1K Following If I look at my own timeline for 5 minutes, even I get irritated.. so what would you even look at
Matthias Longin @MatthiasL94672
70 Followers 392 Following I was born on 3 April 1991 AD and live at Kremmlerstraße 41, 70597 Stuttgart
Nguyen Kaitlyn @NguyenKaitlyn4
0 Followers 84 Following
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Sasha Rush @srush_nlp
51K Followers 463 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Kyunghyun Cho @kchonyc
60K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Christopher Manning @chrmanning
126K Followers 114 Following Director, @StanfordAILab. Assoc. Director, @StanfordHAI. Founder, @stanfordnlp. Prof. CS & Linguistics, @Stanford. IP @aixventureshq. 🇦🇺 Do #NLProc & #AI. 👋
Yi Tay @YiTayML
28K Followers 97 Following Chief scientist & Co-founder @RekaAILabs past: Research Scientist @Google Brain 🧠 currently learning to be a dad 🍼👶
Akari Asai @AkariAsai
11K Followers 650 Following Ph.D. student @uwcse & @uwnlp. NLP. IBM Ph.D. fellow (2022-2023). Meta student researcher (2023-) . ☕️ 🐕 🏃♀️🧗♀️🍳
Yoav Artzi @yoavartzi
13K Followers 163 Following Research/prof @cs_cornell + @cornell_tech🚡 / https://t.co/9YnWry7yHs / https://t.co/3VmRSyYm2d / asso. faculty director @arxiv / building https://t.co/f9QkzO5kaC
Danish Pruthi @danish037
6K Followers 627 Following Faculty at Indian Institute of Science, Bangalore. PhD from @LTIatCMU.
Kayo Yin @kayo_yin
8K Followers 554 Following PhD student @berkeley_ai @berkeleynlp working on interpretability and signed languages. Former @msftresearch @deepmind @carnegiemellon @polytechnique. 🇫🇷🇯🇵
Jacob Andreas @jacobandreas
13K Followers 955 Following Teaching computers to read. Assoc. prof @MITEECS / @MIT_CSAIL (he/him). https://t.co/5kCnXHjtlY https://t.co/2A3qF5vdJw
Tal Linzen @tallinzen
16K Followers 893 Following Professor @nyuling and @NYUDataScience, research scientist @GoogleAI
Zachary Lipton @zacharylipton
59K Followers 2K Following Professor: CMU/@acmi_lab, CTO / CSO: @AbridgeHQ, Creator: @d2l_ai & https://t.co/QQt98VNLUp, Relapsing 🎷
Thomas Wolf @Thom_Wolf
67K Followers 4K Following Co-founder and CSO @HuggingFace - open-source and open-science
Tim Dettmers @Tim_Dettmers
28K Followers 819 Following PhD Student at @UW. I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
Shruti Rijhwani @shrutirij
4K Followers 497 Following * Research Scientist @GoogleDeepMind * #NLProc research * PhD from @LTIatCMU * Amateur woodworker, scuba diver, foosball player
Shaily @shaily99
5K Followers 2K Following PhD @LTIatCMU Prev: @GoogleAI @MSFTResearch. Working on #NLProc evaluation, fairness & culture. Usually ranting, often about research & DEI. 📚 @readsndrants
Colin Raffel @colinraffel
30K Followers 655 Following nonbayesian parameterics, sweet lessons, and random birds. Friend of @srush_nlp
Robert Brennan @rbren_dev
104 Followers 178 Following #OpenSource and #Kubernetes at @FairwindsOps https://t.co/nDmQwR9IP4
Marco Mascorro @Mascobot
10K Followers 2K Following Partner @a16z | Cofounder @Fellow_AI | AI & Robotics Engineer | prev @BMW research | @MIT 35 under 35 | Opinions my own.
Aviral Kumar @aviral_kumar2
2K Followers 338 Following Research Scientist at Google DeepMind. Incoming Assistant Professor of CS & ML at CMU (Fall 2024). PhD from UC Berkeley.
Zhiqiu Lin @ZhiqiuLin
96 Followers 88 Following PhD Student at Carnegie Mellon University | Computer Vision and Language | Generative AI
Charles 🎉 Frye @charles_irl
9K Followers 2K Following ai engineer at @modal_labs. he/him. ex @full_stack_dl, @weights_biases, phd Berkeley @Redwood_Neuro.
Niklas Muennighoff @Muennighoff
5K Followers 319 Following @ContextualAI | Interests: AI/LLM Research & Health ❤️ | Past: @huggingface @PKU1898
Katherine Lee @katherine1ee
6K Followers 930 Following understanding ourselves and our models. senior research scientist @GoogleBrain, @genlawcenter and @CornellCIS, formerly @Princeton @[email protected]
Lintang Sutawika @lintangsutawika
381 Followers 562 Following Incoming Ph.D. student @LTIatCMU. Researcher at @AIEleuther. Maintainer of LM-Eval Harness. Here for machine learning papers and discussion.
John Yang @jyangballin
2K Followers 438 Following CS/NLP MS student @princeton_nlp Previously @Berkeley_EECS
Afreen Shaikh @afreen19979
1 Followers 5 Following
Karen Hao @_KarenHao
61K Followers 1K Following ai reporter. national magazine award winner. contributing writer @theatlantic. formerly @wsj @techreview @KSJatMIT @TAPP_Project. email: [email protected]
Binyuan Hui @huybery
5K Followers 310 Following 🤔 Core maintainer at Qwen Team and OpenDevin. || Code Generation, Text-to-SQL, Large Language Models.
Haofei Yu @haofeiyu44
155 Followers 722 Following MS student @LTIatCMU | previously CS undergrad @ZJU_China | ex-intern @Apple @TencentGlobal
Ruiyi Wang @RuiyiWang153
120 Followers 171 Following Incoming PhD @ucsd_cse | MS @LTIatCMU | BS @UMichCSE and @sjtu1896 | NLP & HCI research
Junyang Lin @JustinLin610
4K Followers 1K Following Chief Evangelist Officer of Qwen Team & OpenDevin, building LLM and LMM. Now @Alibaba_Qwen . Previously @PKU1898 LANCO group. ❤️ 🍵 ☕️ 🍷 🥃
rohan anil @_arohan_
12K Followers 2K Following Principal Engineer, @GoogleDeepMind Gemini. prev PaLM-2. Tinkering with optimization and distributed systems. opinions are my own.
sarah guo // convicti.. @saranormous
91K Followers 3K Following startup investor and builder, founder @w_conviction. accelerating AI adoption, interested in progress. tech podcast: @nopriorspod
Stella Biderman @BlancheMinerva
14K Followers 749 Following Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her
Karina Nguyen @karinanguyen_
12K Followers 646 Following AI research & eng @AnthropicAI, prev. intern @nytimes, @square, @dropbox
Hailey Schoelkopf @haileysch__
3K Followers 803 Following she/her | research scientist @aiEleuther | LLM training/infra, eval, data | LM Evaluation Harness maintainer
Katherine Tian @kattian_
709 Followers 479 Following cs/stat @harvard, working on calibration & factuality of LLMs, prev @GoogleAI tensorflow, golden state @warriors fan
Sireesh Gururaja @_sireesh
377 Followers 2K Following Trying to get to know my neighbors, both irl and online. PhD student @LTIatCMU, interested in NLP that lets people keep agency. Former: @kensho, @IBM, @Columbia
Seungone Kim @seungonekim
928 Followers 833 Following Incoming Ph.D. student @LTIatCMU, M.S. student @kaist_ai working on LLM Evaluation & Systems that Improve with (Human) Feedback | Prev: @yonsei_u @NAVER_AI_Lab
Conference on Languag.. @COLM_conf
1K Followers 6 Following https://t.co/GhGCMEoa4A Abstract submission: March 22, 2024
Xiang Yue @xiangyue96
2K Followers 421 Following Postdoc @LTIatCMU. PhD from Ohio State @osunlp. Training & evaluating foundation models. Pushing the boundaries of AI🤖. Previously @MSFTResearch.
Iryna Gurevych @IGurevych
805 Followers 42 Following #NLProc professor @CS_TUDarmstadt @TUDarmstadt @mbzuai | Co-Founder @hessian_AI | @ELLISforEurope | @ATHENECenter | @emergen_CITY | Member of the @bbaw_de
Yu Su @ysu_nlp
6K Followers 857 Following Dist. Assist. Prof.@OhioState, Director @osunlp, 20% Researcher@Microsoft. I like to think about intelligence, artificial or biological
Will Kurt @willkurt
7K Followers 781 Following Working on open generative AI round the clock! ☀️ making LLMs incredible with @dottxtai. 🌙 Writing "A Damn Fine Stable Diffusion Book" https://t.co/qYqCHZSuR1
Teknium (e/λ) @Teknium1
28K Followers 3K Following Cofounder @NousResearch, prev @StabilityAI Github: https://t.co/LZwHTUFwPq HuggingFace: https://t.co/sN2FFU8PVE Support me on Github Sponsors
Mistral AI @MistralAI
89K Followers 0 Following Fast, open-source and secure language models. Join us https://t.co/INALdNGvCP
Zichun Yu @yu_zichun52802
25 Followers 21 Following Ph.D. student at the Language Technologies Institute, Carnegie Mellon University
Syeda Nahida Akter @SNAT02792153
152 Followers 476 Following PhD student at @LTIatCMU @SCSatCMU. Working on Multimodal Question Answering #NLProc
Dan Hendrycks @DanHendrycks
17K Followers 79 Following • Director of the Center for AI Safety (https://t.co/ahs3LYCpqv) • GELU/ImageNet-C/MMLU/safety groundwork • PhD in AI from UC Berkeley https://t.co/rgXHAnYAsQ https://t.co/YtGtDh1aAV
Hao Zhang @haozhangml
3K Followers 253 Following Asst. Prof. @HDSIUCSD and @ucsd_cse running @haoailab. Cofounder and runs @lmsysorg.
Meredith Ringel Morri.. @merrierm
10K Followers 659 Following Director of Human-AI Interaction Research @GoogleDeepMind. @UW Affiliate Prof. #HCI & human-centered #AI; @sigchi Academy; ACM Fellow. Opinions my own.
Antje Barth @anbarth
4K Followers 776 Following Principal Developer Advocate GenAI 👩🏻💻 @awscloud ☁️ O’Reilly author. Conference speaker. Travel and beach addict. ✈️🌎☀️🏝 Tweets and opinions are my own.
Shunyu Yao @ShunyuYao12
7K Followers 834 Following Language agents (ReAct, Reflexion, Tree of Thoughts) for digital automation (WebShop, SWE-bench, SWE-agent)
Ari Holtzman @universeinanegg
3K Followers 2K Following PI @UChicagoCS & @DSI_UChicago, leader of Conceptualization Lab https://t.co/BVCT3zdaNV, Post-doc @Meta. We don’t really know much about language models...yet.
Incredible Result for the University of Buenos Aires!!! They definitely deserve to get awarded a bronze medal per the ICPC World Final Rules (just 15 minutes of penalty more than the team placed #12). It would also be the first medal for a Latin American team in over 20 years!
47 Contest - Latin America Champion #icpcwfluxor Universidad de Buenos Aires - FCEN (47096)
regardless of realness, nothing is more unreal and unruly than tokens and tokenizers. they have given me so much pain
🚀Introducing VisualWebBench: A Comprehensive Benchmark for Multimodal Web Page Understanding and Grounding. visualwebbench.github.io 🤔What's this all about? Why this benchmark? > Back in Nov 2023, when we released MMMU (mmmu-benchmark.github.io), a comprehensive multimodal…
@PyTorch Congrats PyTorch team on the launch but I think we could already finetune LLMs pretty easily with libraries like LitGPT, and Axolotl. Genuinely asking what makes TorchTune different from the existing libraries?
🔥 Do you want an open and versatile code assistant? Today, we are delighted to introduce CodeQwen1.5-7B and CodeQwen1.5-7B-Chat, specialized codeLLMs built upon the Qwen1.5 language model! 🔋 CodeQwen1.5 has been pretrained with 3T tokens of code-related data and exhibits…
The dataset remains an important yet poorly understood part of language model development. The fact that a simple change (dataset and tokenizer) results in substantial improvements means there are important needles in a humongous haystack of data that should be understood better.
🚀 Introducing Pile-T5! 🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer. ✨ Featuring intermediate checkpoints and a significant boost in benchmark performance. Work done by @lintangsutawika, me…
@gneubig +1. Nice benchmark. It will be interesting to investigate how come the numbers are still low then? Memorization issue?
@gneubig Ya I agree! One thing I really like about SWE-bench is the ability to (at least in principle) download new versions of the benchmark to avoid contamination. Am still unsure how difficult this is to do in practice, but would be cool if anyone uses this for future model comparisons
@gneubig Dear Professor Neubig, I just want to say a big thank you for your fantastic NLP course on YouTube! Your teaching style is clear and systematic, making complex concepts easy to grasp. Your course has been incredibly helpful for my research in NLP. Thanks a ton for sharing!
this is just proof that agi is achieved, we can now simulate a real software engineer perfectly
Interesting watch. In an official Devin demo, Devin spent six hours writing buggy code and fixing its buggy code when it could have just run the two commands in the repo's README.
SWE-bench is probably contaminated for frontier models (gpt-4/claude-3-opus). Given only the name of a pull request in the dataset, Claude-3-opus already knows the correct function to modify.
These folks @taoyds @TianbaoX etc. are serious when it comes to agent benchmarks. Excited to have an agent benchmark with an OS simulator to play with!
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments The first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating…
1/ 🥁Scaling Laws for Data Filtering 🥁 TLDR: Data Curation *cannot* be compute agnostic! In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data. w/@goyalsachin007 @zacharylipton @AdtRaghunathan @zicokolter 📝:arxiv.org/abs/2404.07177