Josh Susskind @jsusskin
Apple ML research: foundations, perception, action, future technology, creativity, curiosity, compositionality, scientific jazz! apple.com Cupertino, CA Joined February 2009
Tweets 766 | Followers 2K | Following 538 | Likes 10K
Convolutions in MLX just got a whole lot faster thanks to @DiganiJagrit (a true Metal whisperer). Almost every size is faster, in some cases more than 10x. Benchmarks 🚀
If you can write it in RASP-L, a transformer can (probably maybe) learn it! Try it out!
🚨 Missed the ICML deadline today? Consider submitting a short (4 pages) or long (9 pages) paper to our GenAI+Decision Making workshop at ICLR 2024: sites.google.com/view/GenAI4DM-… ! Deadline is extended to February 9, AOE.
Happy to see the interest in our work advancing label-free estimates of self-supervised representations. This line relates to earlier efforts to develop measures to predict generalization, but makes nice use of the SSL setup as a discriminative "prior".
The ICML 2024 call for workshops is open: icml.cc/Conferences/20… @BeccaRoelofs, @natschluter, and @andrewgwils are co-chairing. Submit by February 15, 2024, AOE. Help us spread the word! #icml2024
Apple presents Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling. Paper page: huggingface.co/papers/2401.16… Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show…
If you can write extremely fast GPU kernels and want to work with us at Apple Machine Learning Research on MLX, DM / reach out, we are hiring! github.com/ml-explore/mlx
AIM🎯 models can now be directly downloaded from HuggingFace Hub 🤗 huggingface.co/collections/ap… Thanks to @NielsRogge @osanseviero @_akhaliq for their help setting this up!
Do you want to synthesize novel views from a monocular video of a dynamic scene without hundreds of GPU hours of optimization? Introducing PGDVS, a step toward this goal and a framework to understand what’s missing. Project page: xiaoming-zhao.github.io/projects/pgdvs/
Have a research idea at the intersection of generative models and RL, IL or planning? Consider submitting a short (4 pages) or long (9 pages) paper to our GenAI+Decision Making workshop at ICLR 2024: sites.google.com/view/GenAI4DM-… More information coming early January!
We (Apple AI/ML) are looking for strong engineers and researchers in NYC. If you enjoy pushing the frontier of deep learning *and* building products used by millions of users, consider joining us at: jobs.apple.com/en-us/details/…
The crew at Hugging Face 🤗 made a bunch of pre-converted MLX models! Llama, Phi-2, Mistral, Mixtral (and instruct and code variations where available)! Easier than ever to get started running them locally. Check out the MLX Community: huggingface.co/mlx-community
Excited for @YuyangW95 and @itsbautistam to present this fun project on molecular conformation generation with diffusion models operating in function space. Hit any of us up if you are interested in this work or in generative models for science!
I'll present Manifold Diffusion Fields as an oral at the Diffusion Models workshop @NeurIPSConf, today at 2:30 pm New Orleans time!! Work with @YuyangW95 @jsusskin and @itsbautistam. Don't forget to join and let us know your questions and feedback! openreview.net/pdf?id=0AMMdKc…
A great turnout at the LDM tutorial, and a hard act to follow. If you are hungry for more, please come to our workshop on diffusion models this Friday in Hall B1: diffusionworkshop.github.io Submit questions to our fantastic panel of experts here: docs.google.com/forms/d/e/1FAI…
At the latent diffusion tutorial panel yesterday, I briefly mentioned the difficulties of training autoencoders on language data. Today at the poster session, I found this paper. Looks like they've figured out a way to make this work! arxiv.org/abs/2306.02531 (§4.1) #NeurIPS2023
Check this out if you are interested in an Apple research internship opportunity in Paris. We’re looking for someone to work with @alaaelnouby and collaborate with our larger MLR team!
QLoRA fine-tuning 4-bit Gemma 2B on iPhone 15 Pro with MLX Swift. A nice size for fine-tuning on device, getting 70-100 toks/sec depending on the batch. Guide here: github.com/ml-explore/mlx…
Cool new work from some colleagues at Apple: more accurate LLMs with fewer parameters and fewer pre-training tokens. Also has MLX support out of the box! Code here: github.com/apple/corenet/…
Apple presents OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework. The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and
Apple presents OpenELM - An efficient LM family with open-source training and inference framework - Performs on par with OLMo while requiring 2x fewer pre-training tokens repo: github.com/apple/corenet hf: huggingface.co/apple/OpenELM abs: arxiv.org/abs/2404.14619
This was one of the last projects done by my team when I was working at Apple. A lot of credit to @sacmehtauw, whose dedication was the key to this project. The main point here is to show that, as a contributor to the AI community, we play our part by being fully open.
Next level: QLoRA fine-tuning 4-bit Llama 3 8B on iPhone 15 Pro. Incoming (Q)LoRA MLX Swift example by David Koski: github.com/ml-explore/mlx… works with lots of models (Mistral, Gemma, Phi-2, etc.)
Looking forward to seeing this example from my friend David!
PhysDreamer Physics-Based Interaction with 3D Objects via Video Generation Realistic object interactions are crucial for creating immersive virtual experiences, yet synthesizing realistic 3D object dynamics in response to novel interactions remains a significant
@emollick When things go viral on twitter they attract a mix of constructive criticism and straight up trolling. I really hate that aspect of social media. This paper seemed super promising—my postdoc already emailed me the paper today. Hopefully we continue to develop this type of work.
I have to say it because @awnihannun is quick to give credit to others but doesn’t take much for himself. This performance improvement largely comes from his relentless hunting down of every kind of overhead in MLX the past weeks. Kudos!!!
MLX 0.10 → 0.11, faster generation across model sizes and machines. tokens-per-second for 4-bit models:
Wrenching news: Dan Dennett has died. He's been a great friend and incredible inspiration for me throughout my career. I will miss him enormously. dailynous.com/2024/04/19/dan…
Llama 3 models are in the 🤗 MLX Community thanks to @Prince_Canuma Check them out: huggingface.co/collections/ml… The 4-bit 8B model runs at > 104 toks-per-sec on an M2 Ultra.
One of my favorite things about MLX is it helps put ML research back in the hands of a single bold hobbyist. Don’t need a supercomputer to invent - just a nice laptop, a vision, and some persistence, (and maybe pip install mlx 😉)
There is only scale and cosine schedule and adamw with batchsize that are big but not too big and a post..not wait pre..no wait postnorm with rsmnorm and gradient clipping and RoPe with sentencepiece with no dummy whitespace on heavily preprocessed data, duh?
To all the defeatists who think there is nothing else but scale: * 5 years between Self-Attention Is All You Need and FlashAttention * Transformers still require warmup. Researchers: get back to work! The future is bright :)
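For readers unfamiliar with the "warmup" and "cosine schedule" mentioned in this exchange, here is a minimal sketch of a learning-rate schedule with linear warmup followed by cosine decay. The function name `lr_at_step` and all default hyperparameters are hypothetical and chosen only for illustration; they are not taken from any tweet, paper, or library referenced here.

```python
import math


def lr_at_step(step, peak_lr=3e-4, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Learning rate at a given 0-indexed step (illustrative defaults only).

    Linear warmup from near zero up to peak_lr over warmup_steps,
    then cosine decay from peak_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly so early updates stay small.
        return peak_lr * (step + 1) / warmup_steps
    # Decay phase: fraction of the post-warmup budget already used, capped at 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 500, 999, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

The warmup branch keeps the first updates small while optimizer statistics are still noisy; the cosine branch then lowers the rate smoothly toward `min_lr`.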
🚀 How can meta-learning, self-attention & JAX power the next generation of Evolutionary Optimizers 🦎? Excited to share my @DeepMind internship project and our #ICLR2023 paper ‘Discovering Evolution Strategies via Meta-Black-Box Optimization’ 🎉 📜: openreview.net/forum?id=mFDU0…
@fhuszar Yeah, and the set as described also includes auto-regressive LMs (which could be trained with L2 loss, as a proper score, making it technically a regression)… so, we need to add some restrictions, hmm
Q: there are various ways to generalize diffusions, but is there some definition that captures the set of (informally) “generative models that reduce to regressions (/learning conditional expectations)”? Or, some reason this isn’t a good question?
It's 12 degrees out and patios are packed in Toronto. Glad to see some things stay the same.
🌟 Introducing MIS: the first-ever large-scale multi-image dataset comprising sets of images interconnected by general semantic relationships. MIS consists of a total of 12M synthetic multi-image set samples, each with 25 interconnected images. Designed for broad, domain-general…
@kchonyc @LightningAI Try MLX. Gets 100 toks/sec on my M1 Max.
1. pip install mlx-lm
2. python -m mlx_lm.generate --model mlx-community/quantized-gemma-2b-it --prompt "Write a story about Einstein" --temp 0.0 --max-tokens 256