Aman Madaan @aman_madaan
@xai, PhD Candidate @LTIatCMU madaan.github.io Pittsburgh Joined February 2010
Tweets 402
Followers 1K
Following 481
Likes 1K
Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress this year: Reflection, Tool use, Planning and Multi-agent collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts…
[p1] 🐕Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward🐕 Paper link: arxiv.org/pdf/2404.01258… page: github.com/RifleZhang/LLa… How to effectively train video large multimodal Model (LMM) alignment with preference modeling?
Our latest reasoning update. 24%->50% on MATH from Grok 1 to 1.5.
based and 🔓 wanna help accelerate the next Grok? looking for builders: — Rust/Jax/Kube infra engineers — front-end/full-stack engineers x.ai/careers
Alright people check it out
Really nice work! Part of it is also quite simple to implement in just a few (<100) lines with torch/hf. Here is a notebook that implements and runs algorithm 1 in the paper, and correctly guesses 4096 as one of the candidates for `h` for `mistralai/Mistral-7B-v0.1`. Works… https://t.co/5AnpnS5NnR
🚀Introducing Nemotron-4 15B by @nvidia! 🎉 With 15B parameters and trained on 8T tokens, it's impressive in multilingual AI. Outperforms all similarly-sized models and dominates in multilingual tasks, even surpassing models 4x larger! #NVIDIA #Nemotron4 arxiv.org/pdf/2402.16819…
[1/n] 🚀 Excited to share our latest work on OpenCodeInterpreter! With a blend of execution results and human feedback, we've achieved significant advancements in code generation. Here are the key points: ✨ Introducing OpenCodeInterpreter - a leap in iterative code refinement.…
Frontier models all have at least 100k context length, Gemini 1.5 has even 1m context. What about research and open source? Introducing Long Context Data Engineering, a data driven method achieving the first 128k context open source model matching GPT4-level Needle in a…
For many tasks, there is usually one correct answer, and *many* ways to be wrong. But mistakes can be informative, too! LEAP uses this idea to automatically draft a few "principles" for every task (e.g., two `not` operations cancel out in boolean algebra). These principles are…
In-Context Principle Learning can potentially transform instruction-tuning 🔥. Here's how: 🧠 Long-form instructions are back! Instructions in its original form were longer and represented valuable task-specific knowledge, that's how they were different from prompts. For… https://t.co/kmFQCycKo3
📢New paper: "In-Context Principle Learning from Mistakes" Instead of prompting using only *correct* few-shot examples, we intentionally make *mistakes*, and then learn "principles" or "lessons" from them. Led by @tianjun_zhang @aman_madaan @luyu_gao arxiv.org/pdf/2402.05403… https://t.co/w0nY0KGU6s
In-Context Principle Learning from Mistakes paper page: huggingface.co/papers/2402.05… In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all…
Today I have a huge announcement. The dataset used to create Open Hermes 2.5 and Nous-Hermes 2 is now PUBLIC! Available Here: huggingface.co/datasets/tekni… This dataset was the culmination of all my work on curating, filtering, and generating datasets, with over 1M Examples from…
I used to find writing CUDA code rather terrifying. But then I discovered a couple of tricks that actually make it quite accessible. In this video I introduce CUDA in a way that will be accessible to Python programmers, and I even show how to do it all in @GoogleColab!
Teaching a new course on Neural Code Generation with @dan_fried! cmu-codegen.github.io/s2024/ Here is the lecture on pretraining and scaling laws: cmu-codegen.github.io/s2024/static_f…
Our paper on ✨ Self-Aligning Language Models via RLAIF ✨ has been officially accepted at @iclr_conf 2024! We're thrilled to share our insights in Vienna. Stay tuned for self-aligning advancements in LLMs. #ICLR2024 See you there! 🌍🚀
Jim Fan @DrJimFan
229K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.
Shruti Rijhwani @shrutirij
4K Followers 499 Following * Research Scientist @GoogleDeepMind * #NLProc research * PhD from @LTIatCMU * Amateur woodworker, scuba diver, foosball player
Graham Neubig @gneubig
31K Followers 588 Following Associate professor at CMU, studying natural language processing and machine learning.
Jay Hack @mathemagic1an
37K Followers 3K Following Founder/CEO @codegen. Tweets about AI, computing, and their impacts on society. Previously did startups, @palantir, @stanford. Not a pseudonym.
Harrison Chase @hwchase17
54K Followers 410 Following @LangChainAI, previously @robusthq @kensho MLOps ∪ Generative AI ∪ sports analytics
Yao Fu @Francis_YAO_
14K Followers 2K Following PhD @EdinburghNLP on LLMs and Machine Reasoning. Ex. @Columbia @PKU1898 @MITIBMLab @allen_ai AGI has yet to come, so keep running
Sam Whitmore @sjwhitmore
12K Followers 2K Following building @newcomputer. not a cat (or a man) in real life. I like to run a lot! @kensho @harvard @StuyNY
Steven Feng @stevenyfeng
1K Followers 275 Following Stanford CS PhD student @stanfordnlp @StanfordAILab. Master's from Carnegie Mellon @LTIatCMU. NLP, Computer Vision, Machine Learning, and AI research.
Luyu Gao @luyu_gao
1K Followers 241 Following PhD candidate @CarnegieMellon @LTIatCMU On the job market for full-time industry position.
Sebastian Gehrmann @sebgehr
5K Followers 2K Following Head of NLP, CTO office, @Bloomberg. (he/him) Generating natural language, one word at a time. Also making sense of that language afterwards. views my own
Gabriel Ilharco @gabriel_ilharco
4K Followers 1K Following Building cool things @xAI. Prev. PhD at UW, Google AI
Alex Graveley @alexgraveley
31K Followers 933 Following I’m Alex Graveley, creator of GitHub Copilot, AI Tinkerers, Dropbox Paper, MobileCoin, and Hackpad. Building @ai_minion Hiring https://t.co/nsHar8OLPC
Aakanksha Chowdhery @achowdhery
7K Followers 3K Following LLMs @ Google DeepMind :: PaLM, Gemini // Previously @MSFTResearch, @Stanford, @Princeton // views my own and subject to change
Vivek Gupta @keviv9
2K Followers 5K Following PostDoc @cogcomp UPenn | Ph.D. CS @UUtah | @iitkanpur. @Bloomberg & @MSFTResearch Fellow | ex-@MetaAI, @IBM, @Verisk, @samsungresearch, @Synopsys #nlp #ml
Tyman Mayo @tymanmayo2
871 Followers 2K Following Doesn't matter. I'm fulfilled and relaxed in Colorado.😆
Shashank Sangar @ShashankTesla
16 Followers 204 Following Recruiting at Tesla AI for Core Autonomy (Autopilot & Optimus)
PollyWylde @tsj0NsqlRB91B
0 Followers 111 Following
Michi Yasunaga @michiyasunaga
3K Followers 869 Following CS PhD @Stanford working on language models and multimodal models. Previously @Meta @GoogleDeepMind @Yale
Cheng-Kuang Wu @brianckwu
4 Followers 37 Following
ElmaSenior @Ty1t20vYhl0ReD
0 Followers 181 Following
Sir Mo van da Weed .. @can420nabis
421 Followers 1K Following 🍁 Science is the latest state of proven errors! 🕴️Autodidact ⚕️Cannabis patient & sommelier ✨ 𝕏Ɖ 🧬 #teamscience 🔬Do Only Good Everyday 🐕
super intelligence @eacc72
12 Followers 688 Following GPT6 is a Level 2 AGI and will be released in 2025
Virgil Meridith @VirgMerid
71 Followers 5K Following
Andrew Thompson @AndrewT65390500
312 Followers 374 Following Christian Conservative 🍊#1a + #2a = God-given non-negotiable rights to reject totalitarianism and tyranny.
Abhinav Gupta @backpropper
793 Followers 5K Following phd student @Mila_Quebec | ms @CILVRatNYU @NYU_Courant | previously @GoogleDeepMind @AIatMeta @GoogleAI @labsdotgoogle @MSFTResearch @AdobeResearch
Sahil Antil @oxshitantil
16 Followers 804 Following Founder @kavachbuilders @foodkavach @arqaifashion
MAB氏 @MAB1791652
1 Followers 36 Following
Ads ads @Adsads252800
0 Followers 16 Following
Weloop @Weloop_official
17 Followers 72 Following Download “Weloop” to be a part of your friends circle
nik t. hatziefstathio.. @nikthehat
40K Followers 4K Following ⌗ Innovator-in-Chief ⇢ ❍ne World ✍︎ Investigative Journalist & Director of Open Records Strategy ⇢ AtNight Media ⌇ The New Way ® | One World 🌍
Omair Shahid @OmairShahid
382 Followers 959 Following Product of progressive public policy; raised by public libraries and public education that produced a passion for politics. and apparently alliteration
Ben A. Goldberg ™ .. @BenAnaven
953 Followers 1K Following YESHUA Ha'Mashiach (LORD Jesus Christ) is The Creator and The King of the Universe! - For Elon Musk: I have monetization idea for X. Game changer! -
Sahil Antil @oxshitantil1
43 Followers 642 Following
INGABO @lingaboh
53 Followers 109 Following
Thomas Lancer @LancerThomas
441 Followers 1K Following Building self-learning, multi-modal conversational AI w/ a lean team of A-players (exploring millions of hrs of call data + self-play + game theory principles)
paul @wanggnoy
34 Followers 1K Following
Dana Mahmood @deordered
24 Followers 731 Following Fine-tuning AI models oftentimes & practicing philosopher at other times.
Jannifer chigbu @riva_edgew11272
30 Followers 809 Following ELITE Business coach 1st female Fx trader & Educator 7 figure forex trader & mentor (mindset) peak parformance coach
Mars (parody) @marknadal
6K Followers 369 Following runway model, cybersecurity CEO, dad, Supreme Court paralegal, escort, CIA consultant, TV director, landlord, poet, HERETIC. engineered products used by 1% 🌎.
Charles @Charlie10tang
38 Followers 72 Following
AMSARAJ N @amsaraj_n
439 Followers 2K Following
Terry Yue Zhuo @terryyuezhuo
215 Followers 663 Following No HumanEval. We have a better answer @BigCodeProject @sgSMU @seaAIL @Data61news @Monashinfotech
coffee & AI @realcoffeeAI
52 Followers 740 Following Sitting on a park bench scattering random seeds for the LLMs. I never bet against Elon.
Claire Korea @theclairekorea
82 Followers 123 Following making friends @Character_AI | prev Data Engine @Tesla_AI | opinions are my own
Aditi @aditigaur_
106 Followers 421 Following
Spiderman 🇮🇳 @returnspiderman
1K Followers 6K Following Seek the truth | Everybody talks, very few listen | Watch out here comes the Spider-Man 😁 https://t.co/qwmEhH45SY
Will Mac @ca_dryclean
6 Followers 122 Following
Cryptocracyyy @cryptocracyyy
87 Followers 307 Following
Kodom John @kodomm__
8 Followers 70 Following
Pablo Ubilla @pablo_ubilla7
724 Followers 4K Following I will tell you enough to keep you intrigued... but you shall never truly know me
John Basham @JohnBasham
80K Followers 13K Following @FBI Target #TwitterFiles For Censorship, Meteorologist, AI, Data Scientist, @USArmy Ret, #IC, Fmr TX Elected Official. Seen @AmThoughtLeader Heard @SeanHannity
SOT @SoloOrTroll
10K Followers 2K Following 22 | smite pro | twitch streamer | i love movies, tesla, robots, and technology 🦾🤖
Keiran Paster @keirp1
1K Followers 638 Following Currently PhD at the University of Toronto. Fall 2023 student researcher at Google. Training sequence models. Recent: APE, STEVE-1, OpenWebMath, Llemma.
Andrej Karpathy @karpathy
980K Followers 905 Following 🧑🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥
Danish Pruthi @danish037
7K Followers 628 Following Faculty at Indian Institute of Science, Bangalore. PhD from @LTIatCMU.
Yann LeCun @ylecun
712K Followers 719 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Jim Fan @DrJimFan
229K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.
Shruti Rijhwani @shrutirij
4K Followers 499 Following * Research Scientist @GoogleDeepMind * #NLProc research * PhD from @LTIatCMU * Amateur woodworker, scuba diver, foosball player
Graham Neubig @gneubig
31K Followers 588 Following Associate professor at CMU, studying natural language processing and machine learning.
AK @_akhaliq
310K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
Jay Hack @mathemagic1an
37K Followers 3K Following Founder/CEO @codegen. Tweets about AI, computing, and their impacts on society. Previously did startups, @palantir, @stanford. Not a pseudonym.
Harrison Chase @hwchase17
54K Followers 410 Following @LangChainAI, previously @robusthq @kensho MLOps ∪ Generative AI ∪ sports analytics
Divyansh Kaushik @dkaushik96
4K Followers 3K Following Emerging tech and national security. DC/PGH. “An imported Indian immigrant,” @BreitbartNews.
Yao Fu @Francis_YAO_
14K Followers 2K Following PhD @EdinburghNLP on LLMs and Machine Reasoning. Ex. @Columbia @PKU1898 @MITIBMLab @allen_ai AGI has yet to come, so keep running
Yonatan Bisk @ybisk
3K Followers 883 Following Assistant Professor (he/him) @LTIatCMU/@SCSatCMU - embodied #NLProc Stealing ideas from @_Hao_Zhu @viddivj @_Yingshan @SoYeonTiffMin, @FernJared @abitha___
Sam Whitmore @sjwhitmore
12K Followers 2K Following building @newcomputer. not a cat (or a man) in real life. I like to run a lot! @kensho @harvard @StuyNY
Greg Durrett @gregd_nlp
6K Followers 752 Following CS professor at UT Austin. I do NLP most of the time. he/him
Hieu Pham @hyhieu226
2K Followers 41 Following Making GPUs go brrrr @augmentcode 🤖 Past: Research Scientist at Google Brain 🧠 IMO Silver Medalist 🥈 waiting for LLMs to beat me. Tweets are my own opinions.
Zaid Sheikh @zdshkh11
31 Followers 103 Following Senior Research Programmer at Carnegie Mellon University
Mars (parody) @marknadal
6K Followers 369 Following runway model, cybersecurity CEO, dad, Supreme Court paralegal, escort, CIA consultant, TV director, landlord, poet, HERETIC. engineered products used by 1% 🌎.
Keiran Paster @keirp1
1K Followers 638 Following Currently PhD at the University of Toronto. Fall 2023 student researcher at Google. Training sequence models. Recent: APE, STEVE-1, OpenWebMath, Llemma.
Shunyu Yao @ShunyuYao12
7K Followers 858 Following Language agents (ReAct, Reflexion, Tree of Thoughts) for digital automation (WebShop, SWE-bench, SWE-agent)
Michael Saxon @m2saxon
2K Followers 1K Following CS PhD cand @ucsbNLP 🌊🌴 @NSF GRFP 🧐analyzing semantics in generative lang/img AI models🤖 Big tech ex-intern. BS/MS @ASU 🌵🏜 🔜 @AMD opensrc GenAI RS intern
Nicolas Wörmann @NWormann
57 Followers 237 Following mathematics and computer science @lmu_muenchen ceo speedscale
Shehzaad Dhuliawala @shehzaadzd
343 Followers 909 Following PhD student at @ETH_en | Previously Research Engineer @MSFTResearch Montréal | Master's at @UMassCS. He/Him
Joe Fenton @JoeFenton
1K Followers 2K Following AI and investing 🤖📈 PM @MicrosoftAI Prev. Founding Product Manager @InflectionAI and PM @GoogleDeepMind and @GoogleAI
Ali Behrouz @behrouz_ali
914 Followers 846 Following Ph.D. Student @cornell, interested in machine learning.
Lunjun Zhang @ZhangLunjun
366 Followers 537 Following cs phd student @uoft, student researcher @GoogleDeepMind singularity requires singular focus
Katia Karpenko @KatiaEarth
811 Followers 546 Following A somewhat-intelligent three-dimensional being at @xAI. Writer: https://t.co/pisunzyEVv. AI Filmmaker. Musician. Upcoming book: https://t.co/rBk0AMk1mF
Chris Zheng @ChrisZheng001
12K Followers 609 Following Creative content creator I Team player I Love and kindness I CTO :)
Jiawei Liu @JiaweiLiu_
2K Followers 957 Following Simplifying the making of great software. PhD Student @plfmse @IllinoisCS.
Haotian Liu @imhaotian
6K Followers 397 Following building intelligence @xAI, creator of #LLaVA, cs @UWMadison, prev @MSFTResearch
Manuel Kroiss @makro_ai
14K Followers 60 Following
Dan Hendrycks @DanHendrycks
17K Followers 81 Following • Director of the Center for AI Safety (https://t.co/ahs3LYCpqv) • GELU/ImageNet-C/MMLU/safety groundwork • PhD in AI from UC Berkeley https://t.co/rgXHAnYAsQ https://t.co/YtGtDh1aAV
Fabio Aguilera-Conver.. @Faruletes
1K Followers 187 Following
Ting Chen @tingchenai
5K Followers 365 Following Bump up intelligence in all bit streams @xai. Previous @GoogleDeepmind, @GoogleBrain.
Xuechen Li @lxuechen
2K Followers 901 Following Building intelligence @xai. PhD @Stanford. Undergrad @UofT. Worked at @GoogleAI @MSFTResearch @Vectorinst. I go by Chen.
Rutvik Makwana @rutvikwrites
908 Followers 772 Following AI Tutor @xai • Grokking @grok • Pharmaceutical Science • Cricket, Movies, Voracious Reader
omar @therealomaralfy
3K Followers 2K Following Mostly just having conversations with myself 🤷🏽♂️ @X
Gabriel Ilharco @gabriel_ilharco
4K Followers 1K Following Building cool things @xAI. Prev. PhD at UW, Google AI
Saeed Maleki @MalekiSaeed
474 Followers 110 Following
Lianmin Zheng @lm_zheng
4K Followers 439 Following CS Ph.D. @ UC Berkeley. Creator of Alpa, Vicuna, and Chatbot Arena. @lmsysorg
Harshita Diddee @ihsrahedid
642 Followers 698 Following LTI PhD @SCSatCMU | Prev: RF at @MSFTResearch | Interested in Data Quality Estimation
xiao sun @xiaosun86
2K Followers 93 Following
Aditya Paliwal @VastoLorde95
527 Followers 85 Following I only read books that have pictures in them
Yaroslav Bulatov @yaroslavvb
6K Followers 703 Following ex-Google Brain, OpenAI, Meta Scholar: https://t.co/iVycFw5dSX New Blog: https://t.co/SLix8HqVeY Old Blog: https://t.co/Ur3GWKoOzy
Yongchao Zhou @Yongchao_Zhou_
537 Followers 301 Following Build Intelligence @xai | ML PhD @UofT @VectorInst | Prev. @GoogleAI @GoogleDeepMind | Working on LLMs
Roger Grosse @RogerGrosse
10K Followers 751 Following
Mustafa Suleyman @mustafasuleyman
131K Followers 536 Following CEO, Microsoft AI | Author: The Coming Wave | Past: Co-founder, @InflectionAI & @GoogleDeepMind
Connor Leahy @NPCollapse
23K Followers 554 Following Hacker - CEO @ConjectureAI - Ex-Head of @AiEleuther - I don't know how to save the world, but dammit I'm gonna try
Biao Zhang @BZhangGo
621 Followers 279 Following Research Scientist @ Google. Past: PostDoc at UoE. PhD in NLP/MT @edinburghnlp. All opinions are my own.
Stas Bekman @StasBekman
7K Followers 268 Following Toolmaker. Software creator, optimizer and harmonizer. Makes things work and fly at @ContextualAI Training LLM/RAG/Generative AI/Machine Learning/Scalability
Pratyush Maini @pratyushmaini
1K Followers 340 Following Trustworthy ML | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi
xAI @xai
997K Followers 36 Following
Have to disagree with this point. I tend to view the needle in haystack as an **entry barrier**: if you cannot pass it, you are not even in the game. To be able to perform complex reasoning over long context, you should first be able to retrieve the information at any…
There is no such thing as "long context performance". It just has no meaning. The needle in a haystack thing is almost a complete waste of time. End-to-end evaluation is always the answer.
In the age of large language models, I realized the only sentence I ever talked to Siri is "five minutes timer"
How does self-correction affect problem solving? In a toy transformer model that was trained to solve mazes, I found that performance reliably improved (!) by inserting mistakes and self-corrections into the training data.
One year ago, I left Google Brain (now DeepMind) to join a very early startup. We had fewer than 10 people at that time, and have grown many times since. Today, I am extremely proud to share our milestone. We are Augment. You can read about us here. techcrunch.com/2024/04/24/eri…
We're having a big event on agents at CMU on May 2-3 (one week from now), all are welcome! cmu-agent-workshop.github.io It will feature: * Invited talks from @alsuhr @ysu_nlp @xinyun_chen_ @MaartenSap and @chris_j_paxton * Posters of cutting edge research * Seminars and hackathons
On May 2-3, we're going to have a big event in Pittsburgh about LLM Agents. We have invited talks from great speakers inside and outside CMU, student research presentations and posters, tutorials and discussions! Come join us at CMU campus, and register at cmu-agent-workshop.github.io
PhDone!!!! 👨🎓 08/2019-04/2024 What a journey 🥳🚞 I especially feel lucky to share this once-in-a-life-time moment with people I love ❤️ . And seeing my passion-driven research efforts being acknowledged by researchers I deeply admire 🌞!! Special thanks to my awesome committee…
Turns out that even SOTA MLLMs achieve near random accuracy on these visual IQ questions 🧐
How good are MLLM at solving IQ (abstract visual reasoning) problems? Check our new benchmark paper! MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning Paper: arxiv.org/pdf/2404.13591… Website: marvel770.github.io
@WenhuChen @Teknium1 I think the motivation of LIMA is not to quantify the number of SFT examples that is needed but to highlight (1) how important high quality SFT data is and (2) the superficial alignment hypothesis where pretrained LLM stores all the knowledge and can be easily tuned into an…
The super exciting TED talk on the SixthSense technology by @pranavmistry 14 years back inspired me a lot in many ways over the years 🔥. Finally got a chance to meet him and discuss research 😍. The TED talk video which I have watched a thousand times: youtube.com/watch?v=YrtANP……
Can Language Models Solve Olympiad Programming? - Uses self-reflection and retrieval over episodic knowledge to boost the perf of GPT-4 on USACO from 8.7% pass@1 to 20.2% - Giving a small number of targeted hints solves most of the questions repo: github.com/princeton-nlp/… abs:…
Honored to receive the 2024 Jane Street Graduate Research Fellowship! Thank you @JaneStreetGroup for the award and for organizing an amazing workshop! The best part of this was getting to meet PhD students working on algebraic geometry, cosmology, quantum algorithms, and more!
🚨 New LLM Reasoning Paper 🚨 Q. How can LLMs self-improve their reasoning ability? ⇒ Introducing Self-Explore⛰️🧭, a training method specifically designed to help LLMs avoid reasoning pits by learning from their own outputs! [1/N]
We just released Mixtral-8x22B-v0.1 and Mixtral-8x22B-Instruct-v0.1: - Free to use under Apache 2.0 license - Outperforms all open models - Native function calling - Masters English, French, Italian, German and Spanish. - Seq_len = 64K mistral.ai/news/mixtral-8…
Benchmarks are useful while they still provide signal. Even though every SOTA model has seen the involved PRs, their performance on the task is still under 20%. We can worry about leakage when we start succeeding at extracting task-related knowledge out of the model.
SWE-bench is probably contaminated for frontier models (gpt-4/claude-3-opus). Given only the name of a pull request in the dataset, Claude-3-opus already knows the correct function to modify.
The folks at @OpenAI and @ericschmidt were kind enough to give @AdtRaghunathan and me a generous gift to better understand supervision with weak models. We are honored to be awarded, and are looking forward to the exciting work that will come out of this !
Looking for top engineers and designers passionate about harnessing our AI capabilities to create never-before-seen consumer products. 🛼 come roll w us! x.ai/careers
Some early results of our first vision model. It'll be integrated into the Grok chat in the medium term. A few other features will ship before that (likely very soon). Props to {@tingchenai, @gabriel_ilharco}. x.ai/blog/grok-1.5v
excited to share that ive joined @xai! its only been 2 weeks, but the team is insanely stacked and the rate of progress is astounding. 📈📈 im looking forward to learning a lot and sharing everything i know about data and post-training with the team 🥳