Xuezhe Ma (Max) @MaxMa1987
Research Lead @USC_ISI and Research Assistant Professor @CSatUSC. PhD at CMU ML/NLP @LTIatCMU @CarnegieMellon
xuezhemax.github.io · Pittsburgh, PA · Joined June 2015
200 Tweets · 1K Followers · 350 Following · 745 Likes
There is a really nice community of researchers developing transformer alternatives. Want to highlight these impressive folks. Simran Arora (@simran_s_arora), Chunting Zhou (@violet_zct), Dan Fu (@realDanFu), and Songlin Yang (@SonglinYang4)
@WenhuChen @Teknium1 I think the motivation of LIMA is not to quantify the number of SFT examples that are needed but to highlight (1) how important high-quality SFT data is and (2) the superficial alignment hypothesis, where a pretrained LLM stores all the knowledge and can be easily tuned into an…
@Teknium1 LIMA never claimed 10k is "best you can do"; specifically, "diminishing returns when scaling up data quantity 𝘸𝘪𝘵𝘩𝘰𝘶𝘵 𝘢𝘭𝘴𝘰 𝘴𝘤𝘢𝘭𝘪𝘯𝘨 𝘶𝘱 𝘱𝘳𝘰𝘮𝘱𝘵 𝘥𝘪𝘷𝘦𝘳𝘴𝘪𝘵𝘺, 𝘢𝘭𝘰𝘯𝘨𝘴𝘪𝘥𝘦 [..] 𝘨𝘢𝘪𝘯𝘴 𝘸𝘩𝘦𝘯 𝘰𝘱𝘵𝘪𝘮𝘪𝘻𝘪𝘯𝘨 𝘥𝘢𝘵𝘢 𝘲𝘶𝘢𝘭𝘪𝘵𝘺"
Cannot understand why u want to attack an artificial claim created by yourself🥴. To u, "10K poor data loses to 1k high-quality data" is equal to "10K is the best you can do"?
Check out Megalodon: a new alternative architecture of transformers:
- head-to-head comparison at the scale of 7B and 2T tokens showing lower ppl
- unlimited ctx len
- constant KV cache at inference
Exciting work by @MaxMa1987 @violet_zct @_xiaomengy_ Ckpts available soon!
This is the first time we see a new architecture make an 🍎-to-🍎 comparison at scale with Llama-7B trained on the same 2T tokens and win (unlimited context length, lower ppl, constant KV cache at inference, ...)! Very excited to be part of the team! Thanks for the lead @violet_zct…
How can we get the best of both worlds: efficient training (less communication and computation) and efficient inference (constant KV cache)? We introduce a new efficient architecture for long-context modeling, Megalodon, that supports unlimited context length. In a controlled head-to-head…
wow this must feel good when training (from Megalodon arxiv.org/abs/2404.08801 )
Meta announces Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length abs: arxiv.org/abs/2404.08801 repo: github.com/XuezheMax/mega…
Thanks for sharing our work!
ACL has removed the anonymity period. This means that ACL submissions can be posted and discussed online at any time, although extensive PR is discouraged. aclweb.org/adminwiki/imag…
Just learned despite everyone voting down *CL's 🤡-y arxiv embargo policy, it's still firmly in place for ACL 2024. If *CL were a company, the board & leadership wd be fired, the talent wd've left 5 years ago, the common stock wd be worth $0, & WSB wd be taking an interest.
🚩LLM360: the 1st fully _open_ LLMs
🔥open weight
🔥open training data
🔥open code: data processing, training, eval
🔥open training trajectory: 360 checkpoints, from token 0 to 1.4T
🔥open analysis
Excited to see how this'd fuel research to transparentize LLMs🔍🔍
Huge congrats!
🌟Thrilled to share that our paper "Look-back Decoding for Open-Ended Text Generation" won the Outstanding Paper Award at EMNLP2023! Immense gratitude to anonymous reviewers and to my incredible collaborators @violet_zct, @real_asli and @MaxMa1987. #EMNLP2023
Can LLMs generate exactly 5 words? No. How about 5 sentences? No. How about 5 paragraphs? No 🤷🏻‍♀️ In arxiv.org/abs/2310.14542, we evaluate the performance of LLMs on various controlled generation tasks, including numerical planning, story generation, paraphrase generation, etc. (1/n)
Strongly support open source for academic research. Research has to be examined and verified by the public. A project that only claims to be "novel", "best", or "state-of-the-art" without publicly verifiable material is called PR.
The heretofore silent majority of AI scientists and engineers who - do not believe in AI extinction scenarios or - believe we have agency in making AI powerful, reliable, and safe and - think the best way to do so is through open source AI platforms NEED TO SPEAK UP !
AK @_akhaliq
310K Followers · 3K Following · AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
William Wang @WilliamWangNLP
14K Followers · 718 Following · UCSB NLP Lab + ML Center. https://t.co/6TOnqbk6YT https://t.co/KJYhnav3Et Mellichamp Chair Prof. at UCSB CS. PhD @ CMU SCS. Areas: #NLProc, Machine Learning, AI.
Bill Yuchen Lin 🤖 @billyuchenlin
6K Followers · 2K Following · Research @allen_ai. I evaluate (multi-modal) LLMs, build agents, and study the science of LLMs. Previously: @GoogleAI & @MetaAI FAIR @nlp_usc
Graham Neubig @gneubig
31K Followers · 588 Following · Associate professor at CMU, studying natural language processing and machine learning.
Yao Fu @Francis_YAO_
14K Followers · 2K Following · PhD @EdinburghNLP on LLMs and Machine Reasoning. Ex. @Columbia @PKU1898 @MITIBMLab @allen_ai AGI has yet to come, so keep running
Wenhu Chen @WenhuChen
11K Followers · 520 Following · AI researcher @UWaterloo @GoogleAI @VectorInst. Interested in natural language processing, diffusion models. I direct TIGER-Lab at UWaterloo.
Pei Zhou @peizNLP
2K Followers · 887 Following · PhD @nlp_usc | Ex-@GoogleDeepMind, @GoogleAI, @allen_ai @AmazonScience @UCLA | Common Ground Reasoning for Communicative Agents | he/him
Kayo Yin @kayo_yin
8K Followers · 560 Following · PhD student @berkeley_ai @berkeleynlp working on interpretability and signed languages. Former @msftresearch @deepmind @carnegiemellon @polytechnique. 🇫🇷🇯🇵
Tao Yu @taoyds
3K Followers · 815 Following · @XLangNLP lab, asst. prof. @HKUniversity. prev. postdoc @uwnlp; phd @Yale; intern @MSFTResearch, @SFResearch. he/him 🌈
Weijia Shi @WeijiaShi2
5K Followers · 968 Following · PhD student @uwcse @uwnlp | Visiting Researcher @MetaAI | Undergrad @CS_UCLA | https://t.co/eLBQmgkvym
Jiao Sun @sunjiao123sun_
2K Followers · 365 Following · PhD Candidate 👩🏻‍🎓 Amazon ML Fellow in Natural Language Generation (NLG) @USC; | {Google Brain, Alexa AI} nlper | Prev. IIIS @Tsinghua_Uni
Xin Eric Wang @xwang_lk
7K Followers · 1K Following · Multimodal and Embodied AI Researcher / Professor @UCSC. Director of https://t.co/Y4swOBag21. AI for Humanity in the long run. he/him
Heng Ji @hengjinlp
4K Followers · 236 Following
Mengzhou Xia @xiamengzhou
3K Followers · 621 Following · PhD student @princeton_nlp, MS @CarnegieMellon, Undergrad at Fudan.
Sean (Xiang) Ren @xiangrenNLP
6K Followers · 561 Following · Building @SaharaLabsAI | @USCViterbi Early Career Chair, Professor @nlp_usc | @MIT TR 35 , @ForbesUnder30 | Prev: @allen_ai, @Snapchat, @Stanford, @UofIllinois
Qinyuan Ye @qinyuan_ye
2K Followers · 1K Following · 👩‍💻 Ph.D. student @nlp_usc @CSatUSC @USC_ISI | 🐾 Teaching machines to be more versatile and curious.
Shruti Rijhwani @shrutirij
4K Followers · 499 Following · * Research Scientist @GoogleDeepMind * #NLProc research * PhD from @LTIatCMU * Amateur woodworker, scuba diver, foosball player
Siddharth Dalmia @siddalmia05
1K Followers · 445 Following · Research Scientist @GoogleDeepmind | #SpeechProc and #NLProc | PhD from @LTIatCMU @SCSatCMU | Ex-intern @GoogleAI, @AWSCloud, @FacebookAI
Ofir Press @OfirPress
10K Followers · 3K Following · I build tough benchmarks for LMs and then I get the LMs to solve them. Postdoc @Princeton. PhD from @nlpnoah @UW. Ex-visiting researcher @MetaAI & @MosaicML.
EdithWylde @H4LFoFa5tNo1fU8
0 Followers · 153 Following
Ahmed Hisham @AhmedHi08078280
0 Followers · 50 Following
zrait @zrait
162 Followers · 2K Following
Haonan Wang @HaonanWang97
182 Followers · 289 Following · CS Ph.D. at National University of Singapore 🇺🇸UIUC-BS done 🇸🇬NUS-PhD doing
Kuan-Hao Huang @kuanhaoh_
595 Followers · 190 Following · Postdoc @IllinoisCS / CS PhD @UCLA / Natural Language Processing
Colton Dempsey @coltondempsey
204 Followers · 1K Following · partner @next47 investing in infrastructure software, developer & data science tools, cybersecurity, SaaS and robotics
Tursernue @tursernue96186
0 Followers · 167 Following
Fan Zhou @FaZhou_998
177 Followers · 395 Following · AI Research @sjtu1896 Ex @XLangNLP @HKUniversity @MSFTResearch
OpenNLPLab @opennlplab
260 Followers · 87 Following · OpenNLPLab Official Account Hugging Face: https://t.co/B9IzcQoCQP GitHub: https://t.co/PhoPmAkyf7 WeChat: OpenNLPLab
Michi Yasunaga @michiyasunaga
3K Followers · 867 Following · CS PhD @Stanford working on language models and multimodal models. Previously @Meta @GoogleDeepMind @Yale
Zhehao Zhang @Zhehao_Zhang123
98 Followers · 390 Following · Graduate student at @Dartmouthcs ; Visiting Research Intern @SALT_NLP; Prev. Research Intern @MSFTResearch; Formerly undergrad from @sjtu1896; NLP&ML #NLProc
emanon @JianSuji
67 Followers · 1K Following
Simon Yu @simon_ycl
87 Followers · 351 Following · MRes (Thesis-based) student at @EdinburghNLP , Member of @CohereForAI. Ex Research Intern at @Huawei Noah’s Ark Lab, Student at @EdinburghUni, @InfAtEd
Khan Bhebhura @Bhebhurakhan
77 Followers · 701 Following · Passionate nerd🔭 | AI consultant | founder and CEO of CloudyAI the tech startup democratizing access to quality healthcare in Africa using AI 💻
Adina Yakup @AdeenaY8
3K Followers · 466 Following · @huggingface 🤗 | Contributing to Chinese ML community.
今夜无眠 @Airwalker2020
6 Followers · 105 Following
Lanxiang Hu @Lanxiang_Hu
33 Followers · 160 Following · PhD Student @UCSDJacobs. AI & Systems. Prev. @UCBerkeley.
Migel Tissera @migtissera
3K Followers · 213 Following · Co-founder, @metaspectral_ and @WhiteRabbitNeos HuggingFace: https://t.co/sE0IQJLLsd PhD in Deep Learning
Abdallah Arioua @AbdallahAriooua
127 Followers · 901 Following · Chief Data and AI Officer, PhD in AI. Opinions are mine.
Nishant Subramani @nsubramani23
579 Followers · 2K Following · PhD student at @LTIatCMU // Prev: Predoctoral Researcher at @allen_ai in #NLProc // @BVB supporter // he/him
Kian Ahrabian @kahrabian
56 Followers · 251 Following · Computer Science Ph.D. Student @USC/@USC_ISI, ex-@mcgillu ex-@Mila_Quebec | Knowledge Graphs, Representation Learning, ML on Graphs
Nick Lashinsky @lashinskynick
338 Followers · 2K Following
liuyong @forrestbing
263 Followers · 5K Following · I am a researcher in AIGC, Multi-modality and VitrualHuman tech direction
Praveen Sridhar @psbots
896 Followers · 4K Following · Machine Learning Engineer whose life revolves around Music, Books and Technology. Co-founder of @tinkerhub, a non-profit educational initiative.
songyq @songyq4
25 Followers · 148 Following
Shitian Zhao @zst96687522
9 Followers · 313 Following · Senior Undergrad, ECNU @ECNUER Previous Intern @ CCVL @JohnsHopkins Intern @ Shanghai AI Lab
carbon.rethink @CarbonRethink
1 Followers · 16 Following
mann @punsbymann
105 Followers · 347 Following · ML @google | ms cs/ai @USC | geometric deep learning, language modelling, all things compute
Yi Gu @YiGu025
37 Followers · 45 Following
Eva Louise Marie Gabr.. @e681554349
9 Followers · 3K Following
Mojtaba Vàlipour @ValipourMojtaba
388 Followers · 3K Following · CS PhD at @UWaterloo, Founding Engineer at Coastal Carbon and part time Researcher at @huawei Noah’s Arc Lab, prev enjoyed my time at @oraclelabs, & @CVC_UAB
温仕程 @Shicheng_Wen
2 Followers · 45 Following
hastin @hhhastin259
3 Followers · 13 Following
Jeovane H. Alves @jeohalves
5 Followers · 462 Following · PhD in Computer Science Research Associate, SEDAN, SnT, University of Luxembourg
Hongyi Wang @HongyiWang10
1K Followers · 1K Following · Senior Project Scientist @mldcmu @CarnegieMellon; MLSys researcher; Member @llm360; Ph.D. @WisconsinCS; On the academic job market NOW!
宣化 @YuanHua77288
20 Followers · 173 Following
AK @_akhaliq
310K Followers · 3K Following · AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
William Wang @WilliamWangNLP
14K Followers · 718 Following · UCSB NLP Lab + ML Center. https://t.co/6TOnqbk6YT https://t.co/KJYhnav3Et Mellichamp Chair Prof. at UCSB CS. PhD @ CMU SCS. Areas: #NLProc, Machine Learning, AI.
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers · 2K Following
Percy Liang @percyliang
49K Followers · 408 Following · Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Yann LeCun @ylecun
712K Followers · 719 Following · Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Graham Neubig @gneubig
31K Followers · 588 Following · Associate professor at CMU, studying natural language processing and machine learning.
AI at Meta @AIatMeta
532K Followers · 255 Following · Together with the AI community, we are pushing the boundaries of what’s possible through open science to create a more connected world.
Yi Tay @YiTayML
29K Followers · 97 Following · chief scientist / cofounder @RekaAILabs 🫠 past: research scientist @google brain 🤯 currently learning to be a dad 🍼
Sasha Rush @srush_nlp
52K Followers · 464 Following · Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Jacob Andreas @jacobandreas
14K Followers · 958 Following · Teaching computers to read. Assoc. prof @MITEECS / @MIT_CSAIL (he/him). https://t.co/5kCnXHjtlY https://t.co/2A3qF5vdJw
Yoav Artzi @yoavartzi
13K Followers · 162 Following · Research/prof @cs_cornell + @cornell_tech🚡 / https://t.co/9YnWry7yHs / https://t.co/3VmRSyYm2d / asso. faculty director @arxiv / building https://t.co/f9QkzO5kaC
Wenhu Chen @WenhuChen
11K Followers · 520 Following · AI researcher @UWaterloo @GoogleAI @VectorInst. Interested in natural language processing, diffusion models. I direct TIGER-Lab at UWaterloo.
Kyunghyun Cho @kchonyc
61K Followers · 2K Following · a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
Richard Socher @RichardSocher
101K Followers · 971 Following · CEO @youSearchEngine Investing at @aixventuresHQ Before: Stanford Adj Prof in AI/NLP, Chief Scientist at Salesforce, MetaMind
Allen Institute for A.. @allen_ai
54K Followers · 361 Following · AI for the Common Good. › Join us: https://t.co/DqTs1G4bGO › Get our newsletter: https://t.co/tvb1VpySfL
Kayo Yin @kayo_yin
8K Followers · 560 Following · PhD student @berkeley_ai @berkeleynlp working on interpretability and signed languages. Former @msftresearch @deepmind @carnegiemellon @polytechnique. 🇫🇷🇯🇵
Tao Yu @taoyds
3K Followers · 815 Following · @XLangNLP lab, asst. prof. @HKUniversity. prev. postdoc @uwnlp; phd @Yale; intern @MSFTResearch, @SFResearch. he/him 🌈
OpenNLPLab @opennlplab
260 Followers · 87 Following · OpenNLPLab Official Account Hugging Face: https://t.co/B9IzcQoCQP GitHub: https://t.co/PhoPmAkyf7 WeChat: OpenNLPLab
Robert Scoble @Scobleizer
504K Followers · 68K Following · Follow me on my new podcast with AI startups, Unaligned. Tech industry color commentator since 1993. Author/Blogger. Former strategist @Microsoft.
Fei Liu @feiliu_nlp
690 Followers · 290 Following · Associate professor @EmoryUniversity. Working on large language models, automatic summarization, natural language generation, and various aspects of AI.
Saining Xie @sainingxie
14K Followers · 1K Following · researcher in #deeplearning #computervision | assistant professor at @NYU_Courant @nyuniversity | previous: research scientist @metaai (FAIR) @UCSanDiego
Sherry Tongshuang Wu @tongshuangwu
5K Followers · 1K Following · Assist. Prof @SCSatCMU , CS PhD @uwcse. HCI+AI, map general-purpose models to specific use cases! prev. intern @MSFTResearch @GoogleAI @Apple. She/her.
JohnSnowLabs @JohnSnowLabs
41K Followers · 30K Following · Helping healthcare and life science organizations put AI to work faster with state-of-the-art LLM & NLP.
Michael Zhang @mzhangio
1K Followers · 426 Following · CS PhD Student @hazyresearch, @StanfordAILab. Robustness. Foundations of foundation models. Want to make them less shaky.
Yueqi Song @yueqi_song
63 Followers · 47 Following · Undergraduate student at Carnegie Mellon University. Interested in Multimodal and Multilingual NLP, advised by Professor Graham Neubig. Applying to Fall24 PhD.
Hexiang (Frank) Hu @Hexiang_Hu
505 Followers · 401 Following · Research Scientist @GoogleDeepmind | Vision & Language | Gemini
Tri Dao @tri_dao
19K Followers · 365 Following · Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
Weijia Shi @WeijiaShi2
5K Followers · 968 Following · PhD student @uwcse @uwnlp | Visiting Researcher @MetaAI | Undergrad @CS_UCLA | https://t.co/eLBQmgkvym
Chenlin Meng @chenlin_meng
8K Followers · 833 Following · Co-founder & CTO @pika_labs | ex @StanfordAILab @Stanford
Pika @pika_labs
116K Followers · 53 Following · Video on command. Website: https://t.co/G5bjmrMQsx Discord: https://t.co/bX68ThPTQH About: https://t.co/atvdcgbe9S
Pengfei Liu @stefan_fee
2K Followers · 616 Following · Associate Prof. at SJTU, leading GAIR Lab (https://t.co/Nfd8KmZx3B) Co-founder of Inspired Cognition, Postdoc at @LTIatCMU, Previously FNLP, @MILAMontreal,
Huijuan Wang @lemonnoel52
2 Followers · 11 Following
Dacheng Li @DachengLi177
621 Followers · 476 Following · Intelligence. PhD @Berkeley_EECS @lmsysorg @ucbrise @berkeley_ai, Prev. @Google @SCSatCMU.
Yixin Wan @yixin_wan_
1K Followers · 847 Following · PhD student @UCLAComSci | Trustworthy Generative Models | Previously @AmazonScience, @MSFTResearch Asia
Zhiyu Zoey Chen @ZhiyuChen4
1K Followers · 303 Following · NLP researcher. Postdoc @S3DatCMU. Incoming Assistant Professor @UT_Dallas. PhD @UCSBCS. #NLProc.
Loubna Ben Allal @LoubnaBenAllal1
4K Followers · 622 Following · ML Engineer @huggingface 🤗 | @ENS_ParisSaclay - MVA
Chip Huyen @chipro
92K Followers · 444 Following · Data processing on GPUs @VoltronData Designing ML Systems: https://t.co/G81hL2dWmr @designmlsys #AI x #GPU
Rulin Shao @RulinShao
615 Followers · 396 Following · PhD @UWNLP | MS @SCSatCMU | ex-Applied Scientist @AWS
Hao Zhang @haozhangml
3K Followers · 263 Following · Asst. Prof. @HDSIUCSD and @ucsd_cse running @haoailab. Cofounder and runs @lmsysorg. 20% with @SnowflakeDB
Alexis Conneau @alex_conneau
24K Followers · 113 Following · Audio AGI Research Lead @OpenAI - GPT-Next - Past: XLM, Unsupervised ASR, Unsupervised MT, Wav2vec 2.0/XLSR, MUSE, Unsupervised cross-lingual transfer
Yu Yang @YUYANG_UCLA
1K Followers · 557 Following · PhD Candidate @UCLAComSci 🧸 | Amazon Fellow | Prev @AIatMeta @MSFTResearch @AmazonScience @uclamath | Improving data for efficiency, robustness and performance
Sebastian Raschka @rasbt
267K Followers · 906 Following · Machine learning & AI researcher writing at https://t.co/A0tXWzG1p5. LLM research engineer @LightningAI. Previously stats professor at UW-Madison.
Eric Alcaide @eric_alcaide
769 Followers · 425 Following · Physics, Medicine, Machine Learning || Universe, Life, Intelligence • From Bits to Molecules
Yue Zhao @yzhao062
3K Followers · 508 Following · Assistant Professor @CSatUSC @USCViterbi. Ph.D. @CarnegieMellon. I build open ML systems and tools for all: Anomaly Detection, ML Systems, AutoML, data mining
Jieyu Zhao @jieyuzhao11
2K Followers · 642 Following · Assistant Prof. @CSatUSC, @USC || Postdoc @ClipUMD || PhD from @UCLANLP, @UCLA. #NLP, #ML, #Fairness, #TrustworthyNLP
Junru Shao @junrushao
2K Followers · 426 Following · PMC @ApacheTVM | Past: @awscloud, @mldcmu, ACM’17 @sjtu1896 | Opinions are my own
Ziyi Liu @liuziyi93
326 Followers · 191 Following · First year PhD student in USC; Master Student in USC
Amazon Science @AmazonScience
93K Followers · 2K Following · The latest news and research from Amazon's science community. #AmazonScience
Kawin Ethayarajh @ethayarajh
3K Followers · 728 Following · PhD student @StanfordAILab @stanfordnlp Working on machine learning under human incentives.
Siyan Sylvia Li ✨ @Sylvia_Sparkle
1K Followers · 503 Following · 1st year PhD @columbianlp • Prev @stanfordnlp @GeorgiaTech • Weird Little Guy Academic • NLP, Dialogue Systems • Caffeine Gremlin
Sander Dieleman @sedielem
50K Followers · 2K Following · Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).
Junhong Shen @JunhongShen1
283 Followers · 195 Following · PhD Student @mldcmu | BS @UCLA | Interned @MSFTResearch @DeterminedAI
Selina Wang @selinawangtv
86K Followers · 3K Following · Senior White House Correspondent for @ABC News Former CNN Correspondent & Bloomberg TV corr/anchor in Beijing, Tokyo, NYC, SF. Instagram @selinawangtv 📸
Alon Albalak @AlbalakAlon
887 Followers · 465 Following · CS PhD candidate at @ucsbNLP. Research: Data-centric AI, Efficiency in ML, NLP.
Microsoft presents FILM: Make Your LLM Fully Utilize the Context. While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge.
PhDone!!!! 👨‍🎓 08/2019-04/2024 What a journey 🥳🚞 I especially feel lucky to share this once-in-a-lifetime moment with people I love ❤️. And seeing my passion-driven research efforts being acknowledged by researchers I deeply admire 🌞!! Special thanks to my awesome committee…
Couldn't have asked for a better birthday gift: just received news that I've been appointed as a Research Assistant Professor in the USC Dept. of Computer Science @CSatUSC, starting July 24! Immensely grateful to my mentors, family, and friends for their support in this journey❤️
Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to @UnslothAI:
1. Sliding window of 2047? Mistral v1 uses 4096. So does Phi mini have SWA? (And an odd number?) Max RoPE position is 4096?
2. Upcasted RoPE? Like Gemma?
3. Dynamic…
LLM Reasoners have been widely used (800 GitHub stars!🌟) in many projects since its initial release. Now v1.0 is out -- new _algorithms_, systematic _evaluation_, and _visualization_! 🤩 More SOTA reasoning algorithms are on the way, including LLM reasoning for scientific…
Releasing 🔥LLM Reasoners v1.0🔥
🥇Popular library for advanced LLM reasoning
- Reasoning-via-Planning (RAP)🎶
- Chain-of-Thoughts (CoT)⛓️
- Tree-of-Thoughts (ToT)🌴
- Grace decoding💄
- Beam search🔎
🥇Enhances #Llama3, GPT4, LLMs on @huggingface
llm-reasoners.net
Will be starting as a research intern @MSFTResearch next month. Excited to work on multimodal alignment and agents! The best part yet? I get to spend time with fam and our pup Doodle (yeah that's his name :-) in Bellevue!
@Teknium1 Fine-tuning is steering the model’s knowledge. You can do that with few or many samples; it’s just that if you use a large dataset, then you need to ensure the quality is there
@Teknium1 Fine-tuning on 10M annotated examples is no longer fine-tuning; it's approaching continual training IMO. Fine-tuning is no different from pretraining; the only difference is the size. So while small-scale fine-tuning doesn't add new knowledge (due to weak signals), large-scale SFT does.
@Teknium1 Actually, in talks and discussions about LIMA we usually advocate for scaling up post-training, but to keep an eye on quality and diversity as you do.
@Teknium1 But has there ever been a case where instruct fine-tuning truly, meaningfully changed the Elo rating? I mean, if you just look at the Chatbot Arena, capabilities are clearly bounded by the pretraining stage, even though superficially the models have a similar "chat vibe"
@Teknium1 neither of these invalidate what I'm saying re: figure -> you are missing the following from the text
@Teknium1 damn, thanks for shielding me from the retards king 🫡
@Teknium1 I don't know what a simplistic take-away is; I read the whole paper and never got the impression it is suggesting you hurt perf by scaling FT. If someone doesn't read the paper, then of course they can come up with any take-away possible.
@Teknium1 No, LIMA explicitly stated that the value is in data quality (10k poor training examples will lose to 1k high-quality examples) and that you can achieve good performance with a low number of samples. It never recommended against using as much data as you have available (as long as it's diverse data).
Deploy #Llama3 locally with native GPU acceleration on CUDA/ROCm/Vulkan/Metal with MLC LLM. Check out llm.mlc.ai/docs/ for quick start instructions.
Handling data at scale always presents edge cases... In preparing WildChat-1M, besides Moderation issues↓, we found a curse word repeated thousands of times w/o spaces, causing the Presidio analyzer in PII removal to hang. Stay tuned for the upcoming release of WildChat-1M!
Update on Moderation API issue: length errors seem to link to non-Latin characters. E.g., Moderation can handle 1M Latin characters but fails for a few K non-Latin characters on WildChat (Korean, Chinese, etc) Code for reproducing the err & a workaround: community.openai.com/t/moderation-r…