@tamaybes But current models don't allocate any parameters to rotary embeddings!
This means the Chinchilla D = 20N rule is already skewed for the actual parameter counts of most models, even if it held across datasets! If we disregarded the position-encoding parameters, the coefficients would change.
@tamaybes A super-fun arcane historical detail:
Gopher (and by extension Chinchilla) uses Transformer-XL-style position encodings. This means they spend 20B params (Gopher) and 5B params (Chinchilla) on relative position encodings alone!
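Here is a back-of-envelope sketch of how the D = 20N heuristic shifts once position-encoding parameters are excluded. It assumes Chinchilla's published size of roughly 70B parameters and the tweet's figure of roughly 5B position-encoding parameters. It also assumes nothing else about the fit changes, so it illustrates the direction of the skew, not a corrected scaling law.

```python
# Chinchilla rule of thumb: D ≈ 20 * N (training tokens per parameter).
TOKENS_PER_PARAM = 20


def chinchilla_tokens(n_params: float) -> float:
    """Token count implied by the D = 20N heuristic."""
    return TOKENS_PER_PARAM * n_params


# Assumed figures: ~70B total params for Chinchilla, of which ~5B
# (per the tweet) are Transformer-XL-style relative position-encoding params.
total_params = 70e9
pos_enc_params = 5e9
non_pos_params = total_params - pos_enc_params

d_total = chinchilla_tokens(total_params)      # tokens implied when counting all params
d_non_pos = chinchilla_tokens(non_pos_params)  # tokens implied if pos-enc params are dropped

# Keeping the same token budget (d_total) but counting only the
# non-position-encoding params pushes the tokens-per-param ratio above 20.
effective_ratio = d_total / non_pos_params

print(f"D (all params):                    {d_total:.3e}")
print(f"D (excl. pos. enc.):               {d_non_pos:.3e}")
print(f"tokens/param excluding pos. enc.:  {effective_ratio:.2f}")
```

Under these assumptions, the ratio rises from 20 to about 21.5 tokens per parameter. A model that spends no parameters on position encoding (e.g. one using rotary embeddings) would therefore be measured against a slightly different coefficient.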
The best TTS available in 2020-2021 was built by a single unemployed guy who supported every single My Little Pony voice and literally nothing else.
The best TTS from 2022-2024 was also built by a single (different) unemployed guy with a custom-built 6x3090 rig in his basement (1/2)
This seems clearly correct to me and is something I've personally experienced.
Probably the easiest way to see this is to realize that people don't know the logical closure of their beliefs, but, given time and a pencil, can work out many things in that closure.
Really amazing work by the @huggingface team! Infrastructure work, including dataset work, evaluations work, and building libraries, is the single highest-leverage thing you can do in AI. This will provide dividends for the broader AI community for years to come.
An essential blocker to training LLMs on public domain books is not knowing which books are in the public domain. We're working on it, but it's slow and costly... if you're interested in providing support, reach out!
SSMs + long-sequence analysis + malware detection with LLMs: that's all the buzzwords you need to decide to check out our paper, right?
arxiv.org/abs/2403.17978
Training data transparency is an unambiguous win for society, but all the incentives are against companies doing it right now. We need to fix this as soon as possible.
We are excited to see torchtune, a newly announced PyTorch-native finetuning library, integrate with our LM Evaluation Harness library for standardized, reproducible evaluations!
Read more here:
Blog: pytorch.org/blog/torchtune…
Thread:
Zyphra is pleased to announce Zamba-7B:
- 7B Mamba/Attention hybrid
- Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens
- Outperforms Llama-2 7B and OLMo-7B
- All checkpoints across training to be released (Apache 2.0)
- Achieved by 7 people, on 128…
Calling all academic AI researchers! 🚨
We are conducting a survey on compute resources. We want to help the community better understand our capabilities and needs. We hope this will help us all advocate for the resources we need!
Please contribute at: forms.gle/3hEie4hj999fiS…
🚀 Introducing Pile-T5!
🔗 We (EleutherAI) are thrilled to open-source our latest T5 model trained on 2T tokens from the Pile using the Llama tokenizer.
✨ Featuring intermediate checkpoints and a significant boost in benchmark performance.
Work done by @lintangsutawika, me…
I've been brain-dumping what I know about how LLMs work for several months now into an accessible general audience book! Check out the pre-release at the link.