Xinyang (Young) Geng @younggeng
Research scientist at Google DeepMind young-geng.xyz Joined February 2014
Tweets 32
Followers 799
Following 461
Likes 142
Introduce LightSeq for long-context LLM training:
- Highly optimized for decoder models
- smarter checkpointing
- better support for fewer heads models
up to 2x faster, 2-8x longer sequences vs Megatron-LM. arxiv.org/abs/2310.03294
New paper w/ @matei_zaharia @pabbeel on transformers with large context size. We propose RingAttention, which allows training on sequences that are device-count times longer than those of the prior state of the art, without attention approximations or incurring additional overhead.
🎉Excited to share a fun little hardware project we’ve been working on. GELLO is an intuitive and low cost teleoperation device for robot arms that costs less than $300. We've seen the importance of data quality in imitation learning. Our goal is to make this more accessible 1/n
μP allows you to keep the same hyperparameters as you scale up your transformer model. No more hyperparameter tuning at large size! 🪄 It saves millions of $ for very large models. It’s easier to implement than it seems: You have to 1. Keep the initialization and learning rate… https://t.co/eFDiCtPnUx
Seeing people struggling with FSDP… That's exactly where JAX shines, I can use pretty much any parallelism strategy with these few lines 💪
Aviral is one of the best collaborators I've worked with. For prospective students interested in RL and decision making, I'd strongly recommend him as an advisor.
Check out our recent work! We find that recent models that imitate ChatGPT — like Alpaca, Vicuna, Koala — largely learn ChatGPT’s style and less so its capabilities/factuality. And that base model quality can be a highly effective lever for improving on factuality.
Evaluating LLMs is notoriously difficult, and academic benchmarks may fail. Inspired by chess and MOBA games, we are taking a new approach by calculating Elo ratings of models with crowdsourced battle data.
- Blog: lmsys.org/blog/2023-05-0…
- Leaderboard: leaderboard.lmsys.org
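A minimal sketch of how Elo ratings could be computed from pairwise battle records like those described in the tweet above. The K-factor of 32, the initial rating of 1000, the 400-point scale, the `compute_elo` helper, and the model names are illustrative assumptions, not values or code taken from the leaderboard.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def compute_elo(battles, k=32.0, init=1000.0):
    """Update ratings sequentially over (model_a, model_b, winner) records.

    winner is "a", "b", or "tie". k and init are assumed values.
    """
    ratings = defaultdict(lambda: init)
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in battles:
        rating_a, rating_b = ratings[model_a], ratings[model_b]
        expected_a = expected_score(rating_a, rating_b)
        score_a = outcome[winner]
        # Zero-sum update: whatever A gains, B loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = rating_a + delta
        ratings[model_b] = rating_b - delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical crowdsourced votes; model names are placeholders.
    votes = [
        ("model_x", "model_y", "a"),
        ("model_y", "model_z", "tie"),
        ("model_z", "model_x", "b"),
    ]
    for name, rating in sorted(compute_elo(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because the update is sequential, the result depends on the order of the votes; a production leaderboard would likely need to account for that, for example by averaging over shuffled orderings.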
As part of our effort to replicate LLaMA in an open-source manner, we are pleased to announce the release of a preview of the 7B OpenLLaMA model, which has been trained with 200 billion tokens on the RedPajama dataset. github.com/openlm-researc…
Berkeley just released Koala-13B! The open-source chatbot was trained by fine-tuning LLaMA on web dialogue! 50% of responses are similar to ChatGPT. The paper also suggests that training with high-quality datasets can compensate for smaller models, possibly matching larger ones
Check out Koala 🐨: a new chatbot from BAIR researchers fine-tuned on dialogue that approaches ChatGPT quality! Work led by @haoliuhl @Eric_Wallace_ @ArnavGudibande and Xinyang Geng Blog: bair.berkeley.edu/blog/2023/04/0… Demo: koala.lmsys.org
Prime Intellect @PrimeIntellect
9K Followers 424 Following Find compute. Train models. Co-own intelligence. https://t.co/3NC0duKF4a.
Amirmmi @Amirmmi1998
1 Followers 81 Following
Phil Greene @vtguy65
787 Followers 3K Following 🚀 Journeying towards financial freedom & sharing the map as I chart it. ✨ Free digital gems weekly. 🎨 Web & graphic designer, Print on Demand.
Stefan Juang @StefanJuang
169 Followers 2K Following I am an AI and Machine Learning Engineer specializing in Game Theory and Reinforcement Learning, holding an MPhil in Computer Science from HKUST.
Norbert Biedermann @ .. @NJBiedermann
602 Followers 3K Following Visionary - Expert (@LinkedIn) - Online Research Professional
MSS @sajwan_mellow
22 Followers 385 Following
Mikkel @Mikkel86881951
425 Followers 2K Following
Álvaro @hilvanado
6 Followers 84 Following
Aaditya ; @Aaditya26082004
547 Followers 7K Following CS'26 • Machine Learning • Open-Source • Web Dev. • Algorithms • Jai Shree Krishna 🦚🪈
KB @katiebowles_
642 Followers 5K Following Advancing AI for Healthcare at Scale at @AbridgeHQ | $150M Series C 🚀 | We're Hiring!
aVerity @AVerityjane
4 Followers 144 Following
Eva Louise Marie Gabr.. @e681554349
9 Followers 3K Following
HinePo @Hine__Po
193 Followers 440 Following Head of AI & Data. Data science tech lead. Chemical engineer. Kaggle Competitions Expert (top 1%).
Richard Gibbons @RichardGibbonsX
388 Followers 1K Following Founder @DigitalApplied. Digital Marketing & Transformation | AI | SEO | PPC | Social Media | Web Development | eCommerce | Automation | CRM & Analytics.
Peter Morales @PeterMoralesX
223 Followers 2K Following Founder of funded Stealth AI Startup. Interested in AI development at the edge? DM.
زِرِنگ @premature79
402 Followers 919 Following
Redie @rediejarvis
14 Followers 166 Following
Ethan (Yixing) Jiang @ethanjyx
121 Followers 302 Following Early engineer at https://t.co/n7q2e0bpY4, prev @CovariantAI @facebook
ma @ma52987379
0 Followers 120 Following
MIke @MIke71530700
1 Followers 100 Following
Joe Fredrick @fredric11642
5 Followers 54 Following
Syed Amaan @syedamaann
362 Followers 4K Following exited founder, cs undergrad. I oscillate between ai research and real-world ai
Ahad Jawaid @ahadj0
14 Followers 103 Following CS undergrad at UTD. Interested in Generative Models and Autonomous Decision Making. Interned at @alexa99
TommyTang @Tommy_Tang_930
23 Followers 223 Following
Ole Jonas @friendly_tweedy
54 Followers 500 Following
Jishuai MIAO @JishuaiM88686
23 Followers 583 Following
shubham maheshwari @here_for_papers
10 Followers 103 Following
Arif Ahmad @arif_ahmad_py
307 Followers 7K Following All things AI, Computer Science and Circuits! Prev. @GoogleAI
Jack Reacher @JackReach516
77 Followers 1K Following
Allan @dbsynergy
193 Followers 1K Following
AVINASH ANAND @avin_anand
20 Followers 420 Following
GrownBreeze @bray_R
97 Followers 542 Following Product Manager@Daimler || People Observer || Mind Traveler|World Citizen
Vinay Ahuja @vinayah
175 Followers 2K Following Passionate about Gen AI, Search, Advertising, Mobile, Creator economy
Elliot Luchansky @ElliotLuchansky
1K Followers 557 Following Elliot Luchansky, CFA: executive leader, expert in talent attraction, board advisory, and business optimization. Currently with CyberNova, MSP-focused PE
Nazarsky @nzrsky
85 Followers 528 Following 📱 15+ years iOS ninja | 🤖 AI & ML enthusiast | ✨ Crafting digital magic for 500M+ users
Chaos Song @song_chaos2243
201 Followers 4K Following
AICurrent.ai @AIcurrent_ai
24 Followers 279 Following
シェイン・グウ @shanegJP
53K Followers 351 Following https://t.co/yYd252xC4w Gemini 1.5 Pro @GoogleDeepMind Tokyo / SF. Formerly @GoogleAI Brain, formerly @OpenAI. English: @shaneguML. All opinions are my own.
Delip Rao e/σ @deliprao
46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Kefan XIAO @KevinKiao
194 Followers 234 Following Olympic weightlift AI - Pretraining&data of Palm2, Gemini and more.
Trevor Gale @Tgale96
1K Followers 250 Following Research Scientist @ Google DeepMind | PhD Candidate @ Stanford CS
yi 🦛 @agihippo
3K Followers 81 Following secondary account, hardcore fans only. friend of @agikoala the great researcher, main account: @yitayml warning: hot takes.
Sholto Douglas @_sholtodouglas
15K Followers 861 Following Scaling Gemini @Deepmind - working towards intelligence too cheap to meter
alewkowycz @alewkowycz
3K Followers 174 Following Member of Technical Staff at @inflectionAI. Former Research Scientist @Google. In a previous life, I did String Theory. Language models and Conversational AI.
Sharad Vikram @sharadvikram
1K Followers 510 Following Researcher @ Google Deepmind. I work on JAX + Pallas (https://t.co/lPMsq3yzgL) and Gemini. In the past I worked on Oryx and TFP. I like learning.
Hao AI Lab @haoailab
366 Followers 137 Following Hao AI Lab at UCSD. Our mission is to democratize large machine learning models, algorithms, and their underlying systems.
Brian Ichter @brian_ichter
2K Followers 178 Following Research Scientist at Google Brain, interested in robotics and AI
Quan Vuong @QuanVng
2K Followers 234 Following Robotics Research at @Physical_int, ex-@GoogleDeepMind Perpetually trying to find a quiet place to read.
Physical Intelligence @physical_int
4K Followers 8 Following Physical Intelligence (Pi), bringing AI into the physical world.
Rosanne Liu @savvyRL
33K Followers 969 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
Enrique Piqueras @epiqueras1
2K Followers 234 Following Organizing the world's information and making it universally accessible and useful using JAX @Google @Deepmind.
Jessy Lin @realJessyLin
2K Followers 728 Following PhD @Berkeley_AI | interactive language agents 🤖 💬
Archit Sharma @archit_sharma97
4K Followers 340 Following Final-year CS PhD student @Stanford. Previously, AI Resident @Google Brain, undergraduate @IITKanpur, research intern @MILAMontreal.
Yejin Choi @YejinChoinka
19K Followers 330 Following professor at UW, director at AI2, adventurer at heart
Thang Luong @lmthang
20K Followers 100 Following Senior staff scientist @GoogleDeepMind. PhD @StanfordNLP. PI #AlphaGeometry. Co-lead #Bard Multimodality, now #Gemini. Co-founder #MeenaBot (later LaMDA).
Stella Biderman @BlancheMinerva
15K Followers 749 Following Open source LLMs and interpretability research at @BoozAllen and @AiEleuther. My employers disown my tweets. She/her
Jiahui Yu @jhyuxm
2K Followers 777 Following Member of Technical Staff @OpenAI; previously Research Scientist at Google Brain/DeepMind.
Manan Tomar @manan_tomar
299 Followers 512 Following PhD student @rlai_lab UAlberta, and @MSFTResearch. Currently visitor @berkeley_ai. Previously @MetaAI, @iitmadras. Opinions, if you find any, are my dog’s.
Yao Fu @Francis_YAO_
14K Followers 2K Following PhD @EdinburghNLP on LLMs and Machine Reasoning. Ex. @Columbia @PKU1898 @MITIBMLab @allen_ai AGI has yet to come, so keep running
the tiny corp @__tinygrad__
33K Followers 61 Following We make tinygrad. Our mission is to commoditize the petaflop.
Jerry Tworek @MillionInt
7K Followers 284 Following I teach programs how to program @ OpenAI | putting the ball in the damn hoop - @jacobmenick
Jelani Nelson @minilek
22K Followers 184 Following Professor @Berkeley_EECS. Research Scientist (part-time) @GoogleAI. Founder @addiscoder. 🇻🇮🇺🇸🇪🇹
Hongyu Ren @ren_hongyu
3K Followers 595 Following Research Scientist @openai. CS PhD @stanford. Previously @apple, @googleai and @nvidiaai. I train language models.
Brydon Eastman @brhydon
878 Followers 729 Following Mathematician (Heavy on the ish) Research Scientist @OpenAI, Previously Ph.D. @WaterlooMath. ☕ //🚴//🧗♂️ // 🤔➡️💻
Pranav Shyam @recurseparadox
1K Followers 450 Following Research Scientist @DeepMind; ಕನ್ನಡಿಗ. Past: @OpenAI, @SchmidhuberAI
Tao Xu @txhf
6K Followers 890 Following Learning Machine at OpenAI, previously Airbnb, Quora, Facebook and Microsoft.
Mark Chen @markchen90
10K Followers 246 Following Head of Frontiers Research at OpenAI. Coach for the USA IOI Team.
Mohit Shridhar @mohito1905
1K Followers 1K Following Research Scientist at @Dyson. @uwcse PhD in Robotics.
Clémentine Fourrier .. @clefourrier
3K Followers 307 Following Leaderboards & evals research @HuggingFace 🐍✨ "The future is already here, it’s just not very evenly distributed" (Gibson)
Sasha Rush @srush_nlp
52K Followers 465 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Modular @Modular
18K Followers 2 Following The future of AI development starts here. Sign up to our 📪 Newsletter → https://t.co/gpuHGRyHTs. We are hiring → https://t.co/cPTAes0HMt 🚀
rohan anil @_arohan_
12K Followers 2K Following Principal Engineer, @GoogleDeepMind Gemini. prev PaLM-2. Tinkering with optimization and distributed systems. opinions are my own.
Youngwoon Lee @YoungwoonLee
387 Followers 82 Following Assistant Professor at Yonsei | Postdoc @UCBerkeley with @pabbeel | PhD @USC with @JosephLim_AI | Reinforcement Learning and Robot Learning
Emma Brunskill @EmmaBrunskill
7K Followers 91 Following Associate professor, Computer Science. Stanford. Stanford's Human Centered AI (HAI) Institute. Opinions expressed are my own.
Alex Spiridonov @alexknowsai
282 Followers 88 Following Group Product Manager, @Google Cloud TPU | Prev. GPM @Google Brain (now @GoogleDeepMind) | Building planet-scale AI/ML supercomputers | Investor @SeaChangeVC
> phi-3 claims: better than mixtral 8x7B on benchmarks > phi-3 reality: worse than mistral 7b on lmsys you cannot cheat the scaling gods. very exciting 49 place. 🥲
Sorry but this is actually Top 3 benchmarks to *not* use.
Agree. Here are the top three LLM benchmarks I would recommend: 1. Open LLM leaderboard 2. MT-Bench 3. AlpacaEval
Be skeptical, think from first principles, avoid the hype, keep building.
This take on the FineWeb release is one of the most interesting pieces of feedback and also a reason FineWeb is very different from even larger datasets like RedPajama-V2 (which is double its size!) Surprisingly, the size of the dataset of 15T tokens is not very important, what is much…
People seem to over-index on the 15T number after Llama 3. While the number matters, what is even more important is the quality and diversity of those tokens. If there was a good way to measure those, that would have been an impressive result to report.
I know you all are tired of me shilling open source and open weights, but read this thread from a computer scientist who has worked on antitrust. It's not just that closed model orgs are closed, but they are perniciously peddling misinformation.
I'm so tired of being in rooms where people whisper about the absolute ARMY of Big Tech-funded people (most, but not all, ex-Googlers) that have popped up in nearly every corridor in DC where people are working on literally anything to do with AI. So let's talk about it! 1/12
Data is all we need! 👑 Not only since Llama 3 have we known that data is all we need. Excited to share 🍷 FineWeb, a 15T token open-source dataset! FineWeb is a deduplicated English web dataset derived from CommonCrawl created at @huggingface! 🌐 TL;DR: 🌐 15T tokens of cleaned…
@YiTayML Incredible achievement for a team of 20! Congrats to the amazing team! 🚀
sitting in a taqueria listening to a group of guys excitedly talk about how good the new gpt-4 model is on lmsys while i'm re-reading my H-1B rejection email in one tab and paying US taxes in the other
Our improved model in the arena at lmsys and we’ve rolled out to ChatGPT users today — stay tuned for better versions to come
🔥Exciting news -- GPT-4-Turbo has just reclaimed the No. 1 spot on the Arena leaderboard again! Woah! We collect over 8K user votes from diverse domains and observe its strong coding & reasoning capability over others. Hats off to @OpenAI for this incredible launch! To offer…
Made a small contribution 🀄️
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source: github.com/openai/simple-…
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%https://t.co/OdtBUsbeV5%3A1337%2Fannounce
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of cPython? No? Well now you can! With llm.c: github.com/karpathy/llm.c To start, implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly…
(1/2) 📢 Introducing LL3M: Large Language, Multimodal, and MoE Model Open Research Plan 👉github.com/jiasenlu/LL3M
With the following goals:
- Build an open-sourced codebase in Jax / Flax that supports large-scale training in LLM, LMM, and MoE models.
- Record and share the…
One year ago was Vicuna's birthday🎂! We were so excited and built a demo for it at chat.lmsys.org. We never imagined it could get this far. Millions of people downloaded our models, visited our demo, and played with our fine-tuning recipe in the FastChat project. We then…
Introducing Vicuna, an open-source chatbot impressing GPT-4! 🚀 Vicuna reaches 90%* quality of ChatGPT/Bard while significantly outperforming other baselines, according to GPT-4's assessment. Blog: vicuna.lmsys.org Demo: chat.lmsys.org
What a year it was!
I’m not done with MegaBlocks 😁 @apaszke @epiqueras1 @sharadvikram and I just dropped something we’ve been working on for a bit yesterday. MegaBlocks + JAX + TPU = MegaBlox 🔥 github.com/google/jax/pul…