Mostafa Dehghani @m__dehghani
/google_deepmind/gemini/. San Francisco. Joined June 2014
Tweets 2K
Followers 7K
Following 566
Likes 11K
Apparently, I'm an influencer now... for reckless neural network training!😬 Just got my first tweet citation: publications.reka.ai/reka-core-tech… btw, this report is a must-read! x.com/m__dehghani/st…
@m__dehghani this has a section on UT you may be interested in
@TheSeaMouse @GoogleDeepMind I personally closely follow the dynamic computation literature. My preferred and simplest openreview.net/pdf?id=qW_GZYy…. @m__dehghani in the kitchen. He seems to like this field too
Excited to share our paper "Low-Rank Adaptation for Multilingual Summarization: An Empirical Study" from @GoogleDeepMind internship got accepted at #NAACL2024🎉 Huge thanks to my co-authors @fantinehuot @jasmijnbastings @m__dehghani @kitsing_l & Mirella! arxiv.org/abs/2311.08572
Very happy that our paper "The Impact of Group Membership Bias on the Quality and Fairness of Exposure in Ranking" w/ @mdr @841io @m__dehghani got accepted at #SIGIR2024 We analyze what happens when sensitive attributes, e.g. gender, impact a user’s judgment.
Training a simple classifier on top of "frozen features" from large vision models is now common and powerful. In our CVPR 2024 paper (arxiv.org/abs/2403.10519), we show that just applying simple augmentations on such frozen features can improve few-shot classification.
Impressive results from the combination of modular training and MoEs! Also nice to see Sparse Upcycling is a pretty decent baseline. @arankomatsuzaki @joapuipe, ....
Awesome tutorial by @phillip_lippe to master training neural networks at scale using JAX with `shard_map`. There is a lot to learn here! x.com/phillip_lippe/…
10/11 I learned a lot about parallelized training during my @GoogleDeepMind internship thanks to @m__dehghani @emiel_hoogeboom @JonathanHeek. Google's infrastructure makes scaling on TPUs so easy.
California regulators just approved a huge Waymo expansion: all the way down the SF peninsula and a big chunk of the LA metro area. cpuc.ca.gov/-/media/cpuc-w…
*Adaptive Computation with Elastic Input Sequence* by @XueFz @ValeriiLikhosh1 @anuragarnab @YangYou1991 @neilhoulsby @m__dehghani A method to adapt the inference time by appending a variable number of tokens (taken from an external memory) to the input. arxiv.org/abs/2301.13195
Google is back baby! Taking the first spot for open models on the @huggingface LLM leaderboard for its sizes (2B & 7B): huggingface.co/collections/go…
Introducing Gemma: a family of lightweight, state-of-the-art open models for developers and researchers to build with AI. 🌐 We’re also releasing tools to support innovation and collaboration - as well as to guide responsible use. Get started now. → dpmd.ai/3UJu1Y1
Introducing Gemma - a family of lightweight, state-of-the-art open models for their class, built from the same research & technology used to create the Gemini models. Blog post: blog.google/technology/dev… Tech report: goo.gle/GemmaReport This thread explores some of the…
An optimized release with TensorRT-LLM gives users the ability to develop with LLMs using only a desktop with an NVIDIA RTX GPU. Check out more info in our developer blog: developer.nvidia.com/blog/nvidia-te… 3/4
As one example of 1.5 Pro’s sophisticated multimodal understanding and reasoning capabilities with long context, when given a 44-minute silent film, the model can analyze various plot points and events, and even make sense of small details you might have missed.
To show what’s possible with the drastically huge context window in Gemini 1.5 Pro, we prompted it with the three.js examples code - over 100,000 lines of code/800k+ tokens! (That’s not even the max, it can handle millions of tokens 😀) Gemini was able to process all the code…
We are announcing the Gemini 1.5 series of models today!
* Support for 1M context lengths (tested up to 10M)
* Gemini 1.5 Pro nears Gemini 1.0 Ultra performance with greater efficiency
* Cloud users can sign up to waitlist for preview blog.google/technology/ai/…
Lucas Beyer (bl16) @giffmana
56K Followers 442 Following Researcher (Google DeepMind/Brain in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian. Mostly gave up trying mastodon as [email protected]
Yi Tay @YiTayML
29K Followers 97 Following chief scientist / cofounder @RekaAILabs 🫠 past: research scientist @google brain 🤯 currently learning to be a dad 🍼
Peyman Milanfar @docmilanfar
67K Followers 261 Following Distinguished Scientist at Google Research. Computational Imaging, Machine Learning, and Vision. Tweets = personal opinions. May change or disappear over time.
Behnam Neyshabur @bneyshabur
18K Followers 689 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Rosanne Liu @savvyRL
33K Followers 965 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
Delip Rao e/σ @deliprao
46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
rohan anil @_arohan_
12K Followers 2K Following Principal Engineer, @GoogleDeepMind Gemini. prev PaLM-2. Tinkering with optimization and distributed systems. opinions are my own.
Fernando Diaz @841io
5K Followers 1K Following Associate Professor, CMU. Researcher, Google. Evaluation and design of information retrieval and recommendation systems, including their societal impacts.
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Sara Hooker @sarahookr
39K Followers 7K Following I lead @CohereForAI. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, @trustworthy_ml. Changing spaces where breakthroughs happen.
Sander Dieleman @sedielem
50K Followers 2K Following Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).
Jeff Dean (@🏡) @JeffDean
296K Followers 6K Following Chief Scientist, Google DeepMind and Google Research. Co-designer/implementor of things like @TensorFlow, MapReduce, Bigtable, Spanner, Gemini .. (he/him)
Sindy Löwe @sindy_loewe
3K Followers 361 Following PhD Student with @WellingMax at the University of Amsterdam. Deep Learning with Structured Representations.
Thomas Wolf @Thom_Wolf
68K Followers 4K Following Co-founder and CSO @HuggingFace - open-source and open-science
Jo Kristian Bergum @jobergum
9K Followers 811 Following Distinguished Engineer @vespaengine. Tweets about Vespa, search, recommendation, ranking, and IR. CET. #StandWithUkraine 💙💛
Leo Boytsov @srchvrs
7K Followers 2K Following Sr. Research Scientist @AWS Labs (ph-D @LTIatCMU) working on unnatural language processing, speaking πtorch & C++. Opinions sampled from MY OWN 100T param LM.
Miles Brundage @Miles_Brundage
43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.
Paul Soulos @paulsoulos
290 Followers 332 Following Computational Cognitive Science @JhuCogsci researching neurosymbolic methods. Previously wearable engineering @fitbit and @Google. @[email protected]
Miguel_Pedraza @CabezaDespejada
56 Followers 2K Following
Sina Sheikholeslami @sinash93
368 Followers 586 Following PhD Student @dcatkth, working on MLSys and distributed deep learning. Stockholm Local Rep @digitalumni, host @TheSinashShow
アメド オシナ.. @HakeemDemi
945 Followers 2K Following Aspiring Data professional (Scientist/ML Engineer.)|| Learn to do a lot with very little, learn to make the complex simple. || Profane & unredeemed. || 27+
2wl @2wlearning
208 Followers 189 Following Documenting my progress learning ML every day. 2 more weeks
antonioruedatoicen @antonioruedato1
36 Followers 530 Following
Pyone MaungMaung @fugokidi
48 Followers 1K Following
Phillip Lindsay @EastLAPinche
59 Followers 385 Following
nvbkdw @nvbkdw
89 Followers 662 Following random tweets about tech and software engineering Building Next-gen AI inference infrastructure ex @AWS, @Apple, @Uber, etc.
zzm @ZhiminZhan29420
29 Followers 94 Following
Amirhossein Abaskohi @AmirAbaskohi
144 Followers 869 Following Master Student @UBC_CS | NLP Researcher @UBC_NLP | Content Creator @YouTube and @Medium #NLProc #MachineLearning
Himbo Mathématique @lhmccabe
463 Followers 1K Following cs phd student | data scientist | likes, follows, etc. not representative of personal views
Elachqar Oussama @Oussama_e
59 Followers 2K Following
hou.mon | هومان @ihouman
304 Followers 3K Following gen : 1🇺🇸 | tv editor | producer | newbish🐍 programmer & deep learning enthusiast | Mechanic to my car | I make things, sometimes they don’t work.
Yasser Benigmim @yasserbenigmim
69 Followers 772 Following PhD student at Télécom Paris, @DeepLearning, @ComputerVision
Aynaz @aynazjavani
0 Followers 180 Following
Shoaib Ahmed Dipu @shoaibdipu
174 Followers 3K Following Prospective PhD Student | Lecturer at @CSEbracU
Hanieh Hashemi @Haanie_h
29 Followers 62 Following ML engineer @Apple. Passionate about privacy preserving ML and responsible AI. PhD from @USC. Ex research intern @AIatMeta and @Samsung Semi
Lorenzo @ragazzogagaga
112 Followers 2K Following
Emmanuel Dahunsi @ImmaCloudSec
2K Followers 2K Following Tweets about 🤖 AI/ML Security | 🥊Boxing | ✈️Travelling, Tweets=Own
emanon @JianSuji
76 Followers 1K Following
Matthieu Futeral-Pete.. @FuteralMatthieu
79 Followers 321 Following PhD student @Inria in Willow and ALMAnaCH teams. Prev. intern @GoogleDeepMind. MVA & Ensae Paris Alumni
Xiuquan Lv @ustcwizard
72 Followers 733 Following
Devansh @devanshkv
167 Followers 324 Following Data Scientist | PhD from @WVUEberly | MS from @tvmiiser | He/Him
Jiayi_Cheng @Jiayi_Cheng1
39 Followers 923 Following Causal Machine Learning, Conformal Inference and more
knowhere @all_grad_reduce
41 Followers 916 Following
float trip @float__trip
18 Followers 328 Following
Cawreo @Cawreo
124 Followers 748 Following Founder & CEO of @NexusNets — e/αi https://t.co/Cso4XFbuKc
Nikhil Mehta @_nikhilmehta
63 Followers 323 Following
Khan Bhebhura @Bhebhurakhan
72 Followers 688 Following Passionate nerd🔭 | AI consultant | founder and CEO of CloudyAI the tech startup democratizing access to quality healthcare in Africa using AI 💻
am i a bot? @tempuser1234566
0 Followers 8 Following
MarchSakura @MarchSakura2
18 Followers 87 Following
Mahab @mahab_8
37 Followers 365 Following
Youngtaek Oh @ytaek_oh
0 Followers 54 Following
Vito Liang @MasterVito0601
8 Followers 100 Following Second Year Master Student at @Tsinghua_Uni Research Intern @MSFTResearch, prev: @Tencent AI lab NLP. LLM Reasoning and Reinforcement Learning
Zekka Rorschach @RorschachZekka
13 Followers 82 Following
Ivelina Petrova @ivelinapetrovaX
29 Followers 784 Following Industrial Management and development Master Degree and Architect in Architecture [email protected] [email protected] [email protected]
彡 Nᗩᖇᒪi 彡 @BGnarli
12K Followers 2K Following ┃Hello fellow! 💜┃AI ART CONDUCTOR┃Tech-Art┃Deep within latent space.┃#Art #digitalart #AIart ┃Air Thirsty
Yong Sen @yongsen_teo
2 Followers 47 Following Prompting is all you need. I'm either joking or serious 50% of the time. prev: @woof_hs
Amgad Hasan @AmgadGamalHasan
236 Followers 292 Following A machine learning engineer specializing in LLMs and ASR models
DanielDong @Nasimchs
114 Followers 4K Following
AR0575 @ar057562841
1 Followers 478 Following
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Lucas Beyer (bl16) @giffmana
56K Followers 442 Following Researcher (Google DeepMind/Brain in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian. Mostly gave up trying mastodon as [email protected]
Yi Tay @YiTayML
29K Followers 97 Following chief scientist / cofounder @RekaAILabs 🫠 past: research scientist @google brain 🤯 currently learning to be a dad 🍼
Peyman Milanfar @docmilanfar
67K Followers 261 Following Distinguished Scientist at Google Research. Computational Imaging, Machine Learning, and Vision. Tweets = personal opinions. May change or disappear over time.
Behnam Neyshabur @bneyshabur
18K Followers 689 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Rosanne Liu @savvyRL
33K Followers 965 Following Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS. Writing https://t.co/IbycyGfnDR
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
Kyunghyun Cho @kchonyc
61K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre PC at @nyuniversity (@CILVRatNYU) & @genentech (@PrescientDesign).
rohan anil @_arohan_
12K Followers 2K Following Principal Engineer, @GoogleDeepMind Gemini. prev PaLM-2. Tinkering with optimization and distributed systems. opinions are my own.
Fernando Diaz @841io
5K Followers 1K Following Associate Professor, CMU. Researcher, Google. Evaluation and design of information retrieval and recommendation systems, including their societal impacts.
Kevin Patrick Murphy @sirbayes
42K Followers 333 Following Research Scientist at Google Brain / Deepmind. Interested in Bayesian Machine Learning.
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Sara Hooker @sarahookr
39K Followers 7K Following I lead @CohereForAI. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, @trustworthy_ml. Changing spaces where breakthroughs happen.
Felix Hill @FelixHill84
9K Followers 777 Following Research Scientist, Deepmind I try to think hard about everything I tweet, esp on 90s football and 80s music None of my opinions are really someone else's
Jeff Dean (@🏡) @JeffDean
296K Followers 6K Following Chief Scientist, Google DeepMind and Google Research. Co-designer/implementor of things like @TensorFlow, MapReduce, Bigtable, Spanner, Gemini .. (he/him)
Sindy Löwe @sindy_loewe
3K Followers 361 Following PhD Student with @WellingMax at the University of Amsterdam. Deep Learning with Structured Representations.
Horace He @cHHillee
23K Followers 448 Following Working at the intersection of ML and Systems @ PyTorch "My learning style is Horace twitter threads" - @typedfemale
Tamay Besiroglu @tamaybes
3K Followers 719 Following Thinking about economics, computing and machine learning @EpochAIResearch @MIT_CSAIL
Reiner Pope @reinerpope
2K Followers 384 Following CEO and founder, @MatXComputing, developing high throughput chips tailored for LLMs
Marcus Lewis @mrcslws
697 Followers 507 Following AI, programming, neuroscience. Previously @UMichCSE, @Microsoft, @Numenta
Dwarkesh Patel @dwarkesh_sp
54K Followers 698 Following Being pretrained Host of Dwarkesh Podcast https://t.co/3SXlu7fy6N https://t.co/rEhnfYywXY https://t.co/hQfIWdM1Un
Kaushik Shivakumar @19kaushiks
155 Followers 193 Following @GoogleDeepMind prev. BS and MS from @berkeley_eecs
Daniel Han @danielhanchen
7K Followers 932 Following Building @UnslothAI. Finetune LLMs 30x faster https://t.co/aRyAAgKOR7. Prev ML at NVIDIA. Hyperlearn used by NASA. I like maths, making code go fast
trieu @thtrieu_
2K Followers 241 Following thinking about thinking. created alphageometry, darkflow. prev: nyu, google brain/deepmind
Guillaume Lample @GuillaumeLample
37K Followers 647 Following Cofounder & Chief Scientist https://t.co/hLfvKLkFHd (@MistralAI). Working on LLMs. Ex @MetaAI | PhD @Sorbonne_Univ_ | MSc @CarnegieMellon | X11 @Polytechnique
Rootclaim @Rootclaim
5K Followers 169 Following Settle controversies using real-world data and math. Calculating reality the way no human can.
Charlie Chen @CharlieXYChen
134 Followers 159 Following Making TPUs go BRRRRR- at Google @DeepMind. Dabbling in Maths, Code, AI, Robotics and Art. Views are my own. Spec Sampling & Scaling Gemini 1.5 Pro Inference
Keran R @KeranRong
55 Followers 122 Following Allseas | MIT | Google AI | Deepmind Gemini Multimodal
Alexandr Wang @alexandr_wang
142K Followers 694 Following ceo at @scale_ai. rational in the fullness of time
Jan Leike @janleike
44K Followers 321 Following ML Researcher, co-leading Superalignment @OpenAI. Optimizing for a post-AGI future where humanity flourishes.
Sharad Vikram @sharadvikram
1K Followers 510 Following Researcher @ Google Deepmind. I work on JAX + Pallas (https://t.co/lPMsq3yzgL) and Gemini. In the past I worked on Oryx and TFP. I like learning.
Jason Benn 🏡 · sc.. @jasoncbenn
4K Followers 3K Following Creating multigenerational scenius. Founded the Neighborhood. 10% of profits from home sales go to -REDACTED-. It takes a village! https://t.co/cF0WngvubS
lmsys.org @lmsysorg
36K Followers 172 Following Large Model Systems Organization. We created Vicuna and Chatbot Arena! Compare 30+ LLMs (GPT-4/Claude/Llamas) side-by-side at https://t.co/IDFeIDIOtm
Mark Collier @MarkPKCollier
165 Followers 195 Following Research Software Engineer - Google AI, Zurich
@chloe21e8张力♡ @chloe21e8
20K Followers 900 Following
Roman Ring @Inoryy
3K Followers 735 Following [email protected] Research Engineer @DeepMind Large Scale Deep Learning Is All You Need? AlphaStar, Gopher, Flamingo, ???
Enrique Piqueras @epiqueras1
2K Followers 233 Following Organizing the world's information and making it universally accessible and useful using JAX @Google @Deepmind.
Essential AI @essential_ai
2K Followers 5 Following Our mission is to deepen the partnership between humans and computers, unlocking collaborative capabilities that far exceed what could be achieved today.
Fangyu Liu @hardy_qr
1K Followers 1K Following Research Scientist @GoogleDeepMind building Gemini♊. PhD @CambridgeLTL . BMath @UWaterloo . From 成都🐼.
Amir Efrati @amir
36K Followers 998 Following Executive Editor @theinformation. We're hiring. amir @ https://t.co/XVwTW5cS62 DM for Signal
Pika @pika_labs
116K Followers 53 Following Video on command. Website: https://t.co/G5bjmrMQsx Discord: https://t.co/bX68ThPTQH About: https://t.co/atvdcgbe9S
Christine McLeavey @mcleavey
8K Followers 3K Following @openai audio team: voice mode, jukebox & musenet
Mo Bavarian @mobav0
11K Followers 915 Following Research Scientist, working on optimization and architecture of LLMs at OpenAI. Math ❤️. Prev SWE Rubrik, PhD MIT.
Dave @dece
3K Followers 681 Following
Noam Brown @polynoamial
34K Followers 610 Following Researching reasoning @OpenAI | Co-created Libratus/Pluribus, the first superhuman no-limit poker AIs | Co-created CICERO | PhD from @SCSatCMU
Alex Tomala @a__tomala
1K Followers 114 Following Research Engineer @GoogleDeepMind It’s time to ship🫡
Jong Wook Kim 💟 @_jongwook_kim
4K Followers 466 Following Member of Technical Staff @OpenAI, authored CLIP and Whisper; previously at @nyuMARL, @SpotifyResearch, @pandoramusic, @kakaocorpglobal, and @NCSOFT
Eric Chu @its_ericchu
2K Followers 793 Following Research scientist @ Google DeepMind. AI reasoning + alignment/safety to help humans. Gemini, Bard, PaLM 2. Prev PhD @ MIT.
Jonas Adler @JonasAAdler
466 Followers 104 Following Research Scientist, DeepMind. AlphaFold, Gemini.
シェイン・グウ @shanegJP
53K Followers 318 Following Gemini 1.5 @GoogleDeepMind Tokyo/SF. Formerly @GoogleAI Brain, formerly @OpenAI. English: @shaneguML. All opinions are my own.
Sholto Douglas @_sholtodouglas
15K Followers 855 Following Scaling Gemini @Deepmind - working towards intelligence too cheap to meter
Arushi @itsamks
1K Followers 866 Following clear eyes full heart can’t lose | technical staff @adeptailabs | @calhacks @ucbrise @cal
Vaishnavh Nagarajan @_vaishnavh
2K Followers 530 Following Research scientist at Google || CS PhD at Carnegie Mellon. Interested in the theory of AI & Machine Learning. he/him 🏳️‍🌈
some random paper some guy wrote about scaling laws vs model architectures: arxiv.org/abs/2207.10551
not true, especially for language. if you trained a large & deep MLP language model with no self-attention, no matter how much data you'll feed it you'll still be lagging behind a transformer (with much less data). will it get to the same point? i don't think so. your tokens…
The dataset is everything. Great read: nonint.com/2023/06/10/the…
I always strongly suggest that people read this work (arxiv.org/abs/2207.10551) by @YiTayML and @m__dehghani when discussing the model architecture. It takes up almost 50% of the pages of the literature survey chapter in my PhD thesis. It is so visionary to study this in 2022. I can…
drug advertisements on the radio in the us are wiiillldddd it’s so bizarre to us euros
I'm personally super excited to share the progress on the 400B+ training. So proud of the entire team that worked tirelessly to make this model a reality. There's lots more to come including a full research paper soon!🚀🦙🦙
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
@vivek7ue Cost != activated params is mainly the hill I am defending until 12 am tonight.
@_arohan_ @vivek7ue It’s a good hill arxiv.org/abs/2110.12894
@m__dehghani This is exactly how I came to this tweet 😂
I appreciate that the Reka technical report cites this tweet: x.com/m__dehghani/st… Is anyone aware of a paper that supports this claim (or better yet, offers a plausible explanation for why this occurs?)
@borisdayma Always gunning for the highest possible LR! Not a fan of lowering LR to get a buttery-smooth loss curve. My stance: models trained on the edge of stability come out stronger. Admittedly, it's like walking on eggshells for massive models, but the payoff is totally worth it!
@m__dehghani I started to urgently bump my learning rates everywhere
@SunnySanyal9 @m__dehghani Oh thanks for sharing. Looks interesting!
It's been a wild ride. Just 20 of us, burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we’ve had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a…
Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body…
Along with Core, we have published a technical report detailing the training, architecture, data, and evaluation for the Reka models. publications.reka.ai/reka-core-tech…