Ali Hatamizadeh @ahatamiz1
Senior Research Scientist @NVIDIA PhD @UCLA Views are my own. Joined June 2015
Tweets 402
Followers 877
Following 291
Likes 1K
Should high school students start getting involved in AI paper submissions instead of learning the fundamentals? And wouldn't this possibly lead to early burnout, considering how competitive things can be?
Very informative manuscript analyzing interpretability tools for RNN-based models. As expected, there's still a gap between RNNs and Transformers.
If data is all you need, then attention is not all you need? Not really! The fact that Mamba, Griffin and linear Transformers are starting to take over is not just due to data quality improvements, but rather the amazing contributions of the community to addressing the…
Just announced! NVIDIA GB200 NVL72 exascale computer enables real-time inference and training in a single rack for intense AI and HPC workloads. Read more about it in our tech blog. bit.ly/3Vscu6N
Unfortunately, the results from this so-called "needle-in-the-haystack" test do not mean that AI has become "conscious"! This could very well be due to the training dataset of Claude 3 or other things that allow the model to detect out-of-context examples. I wish there was a…
Claude 3 has overtaken the LLM benchmarks! In particular, its math and reasoning capabilities are unparalleled! But how does its context length compare to Gemini's 10M? Yet to be seen.
This paper is what I call "actual high-impact work". No catchy title, but amazing discoveries!
Nemotron-4 15B achieves SOTA performance on multiple benchmarks while having a smaller model size than the competition! Congrats @shrimai_ and the team!
Exciting times ahead!
Thank you @_akhaliq for featuring our work! Sora is powered by a Diffusion Transformer (DiT) model. We also propose DiffiT, a novel diffusion Transformer model for image generation in both latent and pixel space. The latent DiffiT model achieves a new SOTA FID score…
Now that Sora has taken the world by storm, and rightfully so, let's also look into V-JEPA by @ylecun and others. V-JEPA learns a world model based on all plausible representations, given only raw video. This is useful not just for video generation but also for autonomous…
OpenMathInstruct-1 is a game-changer!
This is indeed impressive. Diffusion transformers under the hood?
Highly recommend ConvSSM. Up to 400x faster than Transformers in some benchmarks!
The future of ML is bright. We need more and more capable models like Reka Flash. One interesting thing about Reka Flash is that it is "trained from scratch". This would allow for full control over data flow, and could potentially mitigate harmful biases that come from using a frozen LLM.
This is indeed promising. Smaug-72B uses only the older version of Qwen-72B, yet it is able to reach the top of the Hugging Face leaderboard with an average score of 80!
AK @_akhaliq
308K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
Bojan Tunguz @tunguz
186K Followers 7K Following Machine Learning ex Nvidia. Kaggle Quadruple Grandmaster. Data Scientist. Physicist. Catholic. Husband. Father. Stanford Alum. e/xgb. XGBoost.eth. AMDG.
Jim Fan @DrJimFan
228K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.
Arash Vahdat (hiring) @ArashVahdat
8K Followers 805 Following Principal scientist and research manager @nvidia research, leading forward-looking fundamental generative AI research efforts, views are my own.
Sebastian Raschka @rasbt
265K Followers 901 Following Machine learning & AI researcher writing at https://t.co/A0tXWzG1p5. LLM research engineer @LightningAI. Previously stats professor at UW-Madison.
Ahmad Beirami @abeirami
4K Followers 2K Following Building safe, helpful, and scalable generative AI @Google | ex-{@AIatMeta, @EA, @MIT, @Harvard, @DukeU} | @GeorgiaTech PhD | زن زندگی آزادی | opinions my own
Ross Wightman @wightmanr
18K Followers 1K Following Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.
Prof. Anima Anandkuma.. @AnimaAnandkumar
25K Followers 2K Following Bren Professor @caltech, Fmr Sr Director of #AI research @nvidia, Fmr Principal Scientist @awscloud, AI+Science, PDE, Neural operators. Views my own.
Kyunghyun Cho @kchonyc
60K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
kache (dingboard.com) @yacineMTB
51K Followers 3K Following go to https://t.co/pWRBfY8kn2 - AI image editing IN YOUR BROWSER! follow to watch a self funded founder beat VC backed AI startups with @dingboard_
Jeremy Howard @jeremyphoward
221K Followers 5K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Hon Professor: @UQSchoolITEE ; Digital Fellow: @Stanford
Chris Albon @chrisalbon
86K Followers 2K Following Director of Machine Learning at the Wikimedia Foundation. We host Wikipedia.
Xiaoke Huang @xiaoke_shawn_h
20 Followers 795 Following MEng. @Tsinghua_Uni. Prev intern @MSFTResearch I am looking for PhD programs in 2024
So_fia8 @SFia833895
0 Followers 420 Following Nice to meet you. My hobbies are reading, food and sports. I like cats😘 I like to meet new friends while traveling🎉🎉🎉
Zhibin Gou @zebgou
119 Followers 436 Following A second-year M.S. student at Tsinghua University; Research Intern @MSFTResearch. Recent works: CRITIC, ToRA, Rho-1
灶桀 @zaojie12339744
9 Followers 134 Following
Yichen Sheng @Coding_Black
174 Followers 1K Following Incoming research scientist in NVIDIA Research. Working in graphics and vision. Opinions are my own.
Vineeth Kada @vineeth_kada
13 Followers 388 Following ML @ CMU | CS @ IIT Madras | Interested in MLSys and Theory
🔻M🔻 @___mbine
1 Followers 251 Following
Models Matrices @MatricesLayers
181 Followers 2K Following
Mohammed (top %0.1 in.. @vivaitti
997 Followers 664 Following flossing in my voicemail https://t.co/MLY9nrBXNf
Azerbaïdjan @Lira0032
7 Followers 39 Following
Christian S. Perone @tarantulae
6K Followers 1K Following Machine Learning, Computer Science, Math. Computer Science (UPF Brazil) 🇧🇷🧉 / Machine Learning (@polymtl/@UMontreal) 🇨🇦. Working with Autonomous Vehicles.
Timothy L.J. Stewart @tljstewart
423 Followers 962 Following Engineer@Apple 👨🏼💻 | Seeker of Esoterics | Ai | Writer | Gourmet PB 🥪 Maker
Andrew white @Andreww95636515
129 Followers 3K Following 3d modeling. Gaussian splatting, NeRF, Diffusion models, GANs.
Shawn Charles🎤🔥 @ShawnBasquiat
32K Followers 4K Following 🧑🏾💻Ex-FAANG Software Engineer 🥑Senior ML Developer Advocate @ Coming Soon 🏗️Building Tech Communities
Anon Prem @koroesuu
2 Followers 61 Following
Hassan Hayat 🔥 @TheSeaMouse
5K Followers 4K Following Building the AI assistant for all @ https://t.co/D4gDyw97gu
Ethan @Ethan_smith_20
3K Followers 676 Following a boy and his gpu vs the world. directing research at @leonardoai_. learning as I go. uf psych. generative models and representation learning
ycao @ycao01
90 Followers 632 Following "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness." Charles Dickens, A Tale of Two Cities
Michael J.J. Tiffany @kubla
2K Followers 2K Following Hacker: https://t.co/KdcpTomFWR Cofounder: https://t.co/oL9kikzNB2 (infosec unicorn+centaur) Cofounder: https://t.co/b4xRTQUfKY (personal data sovereignty) I have magnificent friends
Isabella @Isabell95536423
1 Followers 285 Following
sigurd_dm @sigurd_dm
58 Followers 402 Following
Arunkumar Kannan @ArunkumarKann17
69 Followers 694 Following ECE PhD student @JohnsHopkins| MASc BME @UBC | Graph Neural Networks | Manifold learning | Neuroscience | He/Him/His
Erika 🍭 @ErikaH7866
2 Followers 194 Following Unaрologеtiсаllу pursuing рlеasure and seеking a willing рartnеr
Yash Rajput @Yash_Rajput_27
6 Followers 105 Following ML/AI Developer Passionate about transforming data into actionable insights. Always learning, forever curious. #AI #ML
MAYUR INGOLE @MAYURINGOLE16
27 Followers 307 Following Machine Learning Analyst @visualdub_ai #MachineLearning #DeepLearning #Ai #NLP
Eric Auld @AuldEric
288 Followers 633 Following AI, math, CS. Former @uclamath. I’ll let you be in my dream if I can be in yours
ASHIS J KALATHIL @kalathil_ashis
82 Followers 210 Following
Sudaraka Jayathilaka @Sudaraka94
131 Followers 538 Following Software Engineer at @Zendesk | ML Noob
raytronix.ai @raytronixAI
2K Followers 5K Following The official account of https://t.co/6gp7wuHTTE. We specialize in innovative AI solutions for the 21st century. #AI #ML #AIagent #AItools #FutureOfComputing
Sharwan Bagaria @sharwan_bagaria
46 Followers 142 Following
Nabeel Hussain Syed @syednabeel3133
45 Followers 192 Following Just a geek who’s into science, technology, book reading, artificial intelligence, fitness, callisthenics & music.
Ngozi H. Stanley-Obi @ngozistanleyobi
299 Followers 898 Following Author of “The Blessings Of Job Loss.” Business Advisory, Management Consultancy. https://t.co/U3Fwq0FNIc
J.Roelens @jlcroelens
21 Followers 117 Following
Q @qtnx_
4K Followers 326 Following working on machine learning; @icosameron if i know you https://t.co/s6P1sWxgYA
Shashank_sg @Shashanksg12
0 Followers 110 Following
Nick @NickThoughtRepo
461 Followers 440 Following
Ochuba @Mr_Ochuba
186 Followers 1K Following Python | Data Science👨💻 | Tech Enthusiast | Software Development👨💻 | Technical Writer✍🏼
Ashutosh Shukla @Ashutos04988831
4 Followers 134 Following
Shengyu Huang @ShengyHuang
855 Followers 925 Following PhD candidate @ETH_en. Student Researcher @GoogleAI. ex- Research Scientist intern @NVIDIAAI. https://t.co/7bZwSk75lF
AK @_akhaliq
308K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
Yann LeCun @ylecun
708K Followers 716 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Bojan Tunguz @tunguz
186K Followers 7K Following Machine Learning ex Nvidia. Kaggle Quadruple Grandmaster. Data Scientist. Physicist. Catholic. Husband. Father. Stanford Alum. e/xgb. XGBoost.eth. AMDG.
Andrej Karpathy @karpathy
974K Followers 904 Following 🧑🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥
Jim Fan @DrJimFan
228K Followers 3K Following @NVIDIA Sr. Research Manager & Lead of Embodied AI (GEAR Lab). Creating foundation models for Humanoid Robots & Gaming. @Stanford Ph.D. @OpenAI's first intern.
Kosta Derpanis @CSProfKGD
47K Followers 198 Following #CS Associate Prof @YorkUniversity, #ComputerVision Scientist Samsung #AI, @VectorInst Faculty Affiliate, TPAMI AE, #CVPR2024/#ECCV2024 Publicity Co-chair
Andrew Ng @AndrewYNg
1.0M Followers 908 Following Co-Founder of Coursera; Stanford CS adjunct faculty. Former head of Baidu AI Group/Google Brain. #ai #machinelearning, #deeplearning #MOOCs
Arash Vahdat (hiring) @ArashVahdat
8K Followers 805 Following Principal scientist and research manager @nvidia research, leading forward-looking fundamental generative AI research efforts, views are my own.
Jia-Bin Huang @jbhuang0604
51K Followers 285 Following Associate Professor @umdcs; Part-time Research Scientist @Meta. I like pixels.
Sebastian Raschka @rasbt
265K Followers 901 Following Machine learning & AI researcher writing at https://t.co/A0tXWzG1p5. LLM research engineer @LightningAI. Previously stats professor at UW-Madison.
NeurIPS Conference @NeurIPSConf
111K Followers 35 Following New Orleans, Dec 10-16, 23. https://t.co/ga8aOw615g Tweets to this account are not monitored. Please send feedback to [email protected].
Gary Marcus @GaryMarcus
144K Followers 7K Following “A beacon of clarity”. Spoke at US Senate AI Oversight committee. Founder/CEO Geometric Intelligence (acq. by Uber). Rebooting AI & Taming Silicon Valley.
Jürgen Schmidhuber @SchmidhuberAI
106K Followers 0 Following Invented principles of meta-learning (1987), GANs (1990), Transformers (1991), very deep learning (1991), etc. Our AI is used many billions of times every day.
Ahmad Beirami @abeirami
4K Followers 2K Following Building safe, helpful, and scalable generative AI @Google | ex-{@AIatMeta, @EA, @MIT, @Harvard, @DukeU} | @GeorgiaTech PhD | زن زندگی آزادی | opinions my own
Tanishq Mathew Abraha.. @iScienceLuvr
54K Followers 1K Following PhD at 19 | Founder and CEO at @MedARC_AI | Research Director at @StabilityAI | @kaggle Notebooks GM | Biomed. engineer @ 14 | TEDx talk➡https://t.co/xPxwKTq6Qb
Ross Wightman @wightmanr
18K Followers 1K Following Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.
Prof. Anima Anandkuma.. @AnimaAnandkumar
25K Followers 2K Following Bren Professor @caltech, Fmr Sr Director of #AI research @nvidia, Fmr Principal Scientist @awscloud, AI+Science, PDE, Neural operators. Views my own.
Prakash Sangam @MyTechMusings
49K Followers 7K Following Ind. Analyst, USA Today, Forbes, RCR, Fierce & EET contributor, 3GPP/ETSI member, #TantrasMantra host, Cover 5G/AI/IoT/Wi-Fi/Cloud Ex @Qualcomm @Ericsson @ATT
derek guy @dieworkwear
796K Followers 962 Following Menswear writer. Editor at @putthison. Creator of @RLGoesHard. Bylines at The New York Times, The Washington Post, The Financial Times, Esquire, and Mr. Porter
Johan Ferret @johanferret
801 Followers 869 Following Research Scientist at Google DeepMind. PhD from @InriaScool. All things Reinforcement Learning. Into generative art, roguelikes & music.
Zhibin Gou @zebgou
119 Followers 436 Following A second-year M.S. student at Tsinghua University; Research Intern @MSFTResearch. Recent works: CRITIC, ToRA, Rho-1
Yichen Sheng @Coding_Black
174 Followers 1K Following Incoming research scientist in NVIDIA Research. Working in graphics and vision. Opinions are my own.
Dhruv Batra @DhruvBatraDB
14K Followers 321 Following Senior Director (FAIR @MetaAI). Professor (@GeorgiaTech). Co-founded CaliperAI. Researcher in AI. @CarnegieMellon alum.
Hassan Hayat 🔥 @TheSeaMouse
5K Followers 4K Following Building the AI assistant for all @ https://t.co/D4gDyw97gu
Ethan @Ethan_smith_20
3K Followers 676 Following a boy and his gpu vs the world. directing research at @leonardoai_. learning as I go. uf psych. generative models and representation learning
SambaNova Systems @SambaNovaAI
3K Followers 703 Following We bring #AI innovations developed in advanced research to organizations around the world. Sign up for updates to stay ahead of AI: https://t.co/bGeeh5JSt0
Aaron Defazio @aaron_defazio
6K Followers 355 Following Research Scientist at Meta working on optimization. Fundamental AI Research (FAIR) team
AI Breakfast @AiBreakfast
167K Followers 209 Following The latest rumors and developments in the world of artificial intelligence. DM to include your AI project in the newsletter.
Q @qtnx_
4K Followers 326 Following working on machine learning; @icosameron if i know you https://t.co/s6P1sWxgYA
Soham De @sohamde_
2K Followers 971 Following Research Scientist at DeepMind. Previously PhD at the University of Maryland.
Joshua Elkington @elkingtonxy
25K Followers 988 Following Founder and General Partner at Axial @axialxyz
dingboard @dingboard_
7K Followers 6 Following https://t.co/rJlkkSWwzD an browser AI powered image editor that literally just works for support dm @yacineMTB MERCH https://t.co/XXTEbnPVrK
Daniel Han @danielhanchen
7K Followers 924 Following Building @UnslothAI. Finetune LLMs 30x faster https://t.co/aRyAAgKOR7. Prev ML at NVIDIA. Hyperlearn used by NASA. I like maths, making code go fast
Amy Lu @amyxlu
2K Followers 1K Following CS PhD student @berkeley_ai, AI for drug discovery @PrescientDesign. Prev: @GoogleAI @insitro @UofT 🇨🇦
Yang You @YangYou1991
7K Followers 380 Following Presidential Young Professor at @NUSingapore. @Forbes 30 under 30. Ph.D. from @UCBerkeley. Founder, President and Chairman of @HPCAITech and Colossal-AI.
Samuel L Smith @SamuelMLSmith
2K Followers 361 Following Research Scientist at DeepMind. Optimization and Initialization. Formerly Google Brain. Ex-Physicist.
Shrimai @shrimai_
2K Followers 502 Following Senior Research Scientist @nvidia | PhD from @SCSatCMU | Prev @SFResearch @facebookai & @MSFTResearch
Yuandong Tian @tydsh
16K Followers 795 Following Research Scientist and Senior Manager in Meta AI (FAIR). AI-guided Optimization and Representation Learning. Novelist in spare time. PhD in @CMU_Robotics.
Pierre Stock @PierreStock
2K Followers 270 Following Research Scientist @mistralai. Working on Large Language Models. Ex RS @metaAI | PhD @ENSdeLyon | MVA @ENS_ParisSaclay
Mosh Levy @mosh_levy
264 Followers 159 Following phd student @biunlp. studying ai robustness and behaviors.
Wing Lian (caseus) @winglian
8K Followers 2K Following @axolotl_ai dev. OpenAccess AI Collective founder. Alignment Labs. AI/ML tinkerer. Building tools for everyone.
Guillaume Lample @GuillaumeLample
37K Followers 648 Following Cofounder & Chief Scientist https://t.co/hLfvKLkFHd (@MistralAI). Working on LLMs. Ex @MetaAI | PhD @Sorbonne_Univ_ | MSc @CarnegieMellon | X11 @Polytechnique
Nayan Saxena @SaxenaNayan
2K Followers 2K Following Brought artificial intelligence to @RBC, @Glowforge, @Wombo, @Bell & beyond
Jupyter Meowbooks @untitled01ipynb
14K Followers 306 Following Managing Director, Memetics and Advanced Shitposting Institute (hyperstitonal) || may post manipulated imagery and bad memes. Husband. Father.
Yixin Wan @yixin_wan_
1K Followers 847 Following PhD student @UCLAComSci | Trustworthy Generative Models | Previously @AmazonScience, @MSFTResearch Asia
Allan Zhou @AllanZhou17
1K Followers 439 Following Final-year AI PhD student @Stanford. NN architecture design, learned optimizers, and hparam optimization.
Eric Hartford @erhartford
11K Followers 394 Following Principal Applied AI Researcher @TensorWaveCloud I make AI models Dolphin and Samantha https://t.co/3ri2GbXrQB BTC 3ENBV6zdwyqieAXzZP2i3EjeZtVwEmAuo4
Jonah Turner @drexalt
319 Followers 916 Following grinding ml, current master's student 🇫🇷 e/acc - gpu kernels - computer vision
Pieter Abbeel @pabbeel
78K Followers 434 Following Diffusion Models; Large World Model; UniSim; TRPO; SAC; Ring Attention; MAML; HER; Domain Randomization; Decision Transformer; LLM as Zero-Shot Planners; RFM-1
Naveen Rao @NaveenGRao
28K Followers 782 Following VP GenAI @Databricks. Former CEO/cofounder MosaicML & Nervana/IntelAI. Neuro + CS. I like to build stuff that will eventually learn how to build other stuff.
Reka @RekaAILabs
10K Followers 13 Following An AI research and product company 🌟. We are a team of scientists and engineers building state-of-the-art multimodal language models 🪄.
Mahmoud Mabrouk @mmabrouk_
423 Followers 2K Following Co-founder at https://t.co/RxOHpyrRm6 The open-source developer LLM developer platform. Manage prompts, evaluate, and deploy LLM apps with confidence. Prev: PhD @rbolab
Piotr Padlewski @PiotrPadlewski
1K Followers 319 Following Chief Meme Officer @ https://t.co/CtBrcKmliI, ex-Google Deepmind/Brain Zurich
Albert Jiang @AlbertQJiang
2K Followers 406 Following AI4Maths @Cambridge_CL Science @MistralAI I bake my own opinions at temperature=2.0
Starting early here in california. Though she calls neurons eyes and inputs bubbles.
It's Saturday. My 3yo is napping right now. Once he wakes up, I'll go fire up some H100's and help him code some of the easy ideas I have in the back of my mind. We might do it just in time for NeurIPS. Gotta start early and completely abuse my privilege, or so I heard🚀🚀🚀
@ahatamiz1 Yes, that makes sense. Another point is that an Olympiad doesn't require GPUs but a NeurIPS paper does...
I had a blast working on alignment for RecurrentGemma! It is exciting to go beyond the quadratic attention of Transformers for a while! Congrats to everyone involved ✨
Google presents RecurrentGemma Moving Past Transformers for Efficient Open Language Models We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent
NeurIPS introduces a track dedicated to advancing kids of rich parents even more than they already are
I really like HGRN2's concept of integrating forget gates with keys in (gated) linear attention. It's a neat and effective approach! We've incorporated HGRN2 into the Flash-Linear-Attention library. Check it out here: github.com/sustcsonglin/f….
Love these state expansion ablations in HGRN2, the outer product with a tied forget gate is super elegant and should work well on a TPU. Amazing job!
🔥 Not All Tokens Are What You Need! 🚀 Releasing the Rho-1 series, including the first 1B LLM to hit 40.6% on MATH. Rho-1 introduces Selective Language Modeling (#SLM) for token-level pretraining data selection. Thanks to @_akhaliq and @arankomatsuzaki for sharing our work!
This new 'explore' tab on twitter is actually pretty good. First change I've found genuinely useful since the new ownership. (I think you only get it if you have Grok.) Lots of errors still, but it's accurate and targeted enough that I'm definitely getting value from it.
I have been working on vision+language models (VLMs) for a decade. And every few years, this community re-discovers the same lesson -- that on difficult tasks, VLMs regress to being nearly blind! Visual content provides minor improvement to a VLM over an LLM, even when these…
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?” More details ➡️ go.fb.me/7vq6hm…
Oh no -- I signed up to X Premium since it's required for them to pay me... ...but now they're making the cringemark non-optional :( Not sure if it's worth it.
[#CVPR2024] I'm thrilled to share our @CVPR 24 paper "Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation" [Paper] arxiv.org/abs/2404.04231 [Code] github.com/072jiajia/imag… #CVPR #CVPR24 #MachineLearning #ArtificialIntelligence #AI #segmentation #Researchconf
RNN language models are making a comeback recently, with new architectures like Mamba and RWKV. But do interpretability tools designed for transformers transfer to the new RNNs? We tested 3 popular interp methods, and find the answer is mostly “yes”! arxiv.org/abs/2404.05971
@ahatamiz1 @giffmana @arimorcos Leaving this here: arxiv.org/html/2402.0103…
Architectural changes enable significantly higher throughput for a RecurrentGemma-variant of the Gemma models.
Announcing RecurrentGemma! github.com/google-deepmin… - A 2B model with open weights based on Griffin - Replaces transformer with mix of gated linear recurrences and local attention - Competitive with Gemma-2B on downstream evals - Higher throughput when sampling long sequences
@ahatamiz1 @giffmana @arimorcos Not that data doesn't. Very much does and is a key part of the difference in LLMs. But it is clear that different architectures have different biases and limitations. We should be clearly exploring these to improve. There's more than just throwing money at a problem.
This is probably the best announcement post of the day!
Happy to share - blah blah blah. Gemma + Griffin = RecurrentGemma Competitive quality with Gemma-2B and much better throughput, especially for long sequences. Cracked model from cracked team! Check it out below 👇
@ahatamiz1 @giffmana @arimorcos the amount of time/experimentation/etc spent scaling transformers makes it difficult to make an apples-to-apples comparison. if the community spent years obsessing over MLP-Mixer optimization (trying different aspect ratios, etc), the ICL/retrieval performance gap might narrow
@ahatamiz1 @giffmana @arimorcos Transformers are indeed marginally better for NLP tasks, it just isn't the many OOM of perf improvement that can be more readily gained by scaling high quality data + compute. MSFT's Phi-2 uses literally 1000x less data for equivalent perf, w the same architecture as other LLMs
@ahatamiz1 @giffmana @arimorcos this Meta paper showing GPT-3 level perf with an MLP-based architecture didn't get enough discussion imo, arxiv.org/abs/2203.06850