Excited to announce that SqueezeLLM and LLMCompiler have been accepted to ICML 2024! 🎉
SqueezeLLM addresses massive outliers in LLMs through a dense-and-sparse decomposition. The massive outliers are efficiently isolated in the sparse part, and the remainder is…
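A minimal NumPy sketch of the dense-and-sparse idea: the few largest-magnitude weights go into a sparse matrix kept in full precision, and the remaining dense part is quantized to low bit-width. The threshold, bit-width, and plain uniform quantizer here are illustrative only (SqueezeLLM itself uses sensitivity-based non-uniform quantization).

```python
import numpy as np

def dense_and_sparse_decompose(W, outlier_frac=0.005, bits=3):
    """Split W into a sparse full-precision outlier part and a quantized dense part.

    Illustrative sketch: SqueezeLLM additionally uses sensitivity-based
    non-uniform quantization; here we use plain uniform quantization.
    """
    # Treat the top fraction of weights by magnitude as outliers.
    thresh = np.quantile(np.abs(W), 1.0 - outlier_frac)
    outlier_mask = np.abs(W) > thresh

    sparse = np.where(outlier_mask, W, 0.0)   # kept in FP16/FP32, stored sparsely
    dense = np.where(outlier_mask, 0.0, W)    # outlier-free, quantized to low bits

    # Uniform quantization of the dense part; its range is now much tighter.
    levels = 2 ** bits - 1
    w_min, w_max = dense.min(), dense.max()
    scale = (w_max - w_min) / levels
    dense_deq = np.round((dense - w_min) / scale) * scale + w_min

    return dense_deq, sparse

# Reconstruction: W ~= dense_deq + sparse. At inference, the dense part uses a
# low-bit kernel and the handful of outliers use a sparse kernel.
W = np.random.randn(512, 512) + 5.0 * (np.random.rand(512, 512) > 0.999)
dense_deq, sparse = dense_and_sparse_decompose(W)
print("max reconstruction error:", np.abs(W - (dense_deq + sparse)).max())
```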
Apple's recent OpenELM is very interesting and showcases that a layer-wise scaling technique can be more effective, at least in the small language model regime. Some insights:
- Adopts Transformer modifications from the DeLighT paper: starts with narrower blocks, then widens them…
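A hedged sketch of the layer-wise scaling idea: instead of giving every block the same number of attention heads and FFN width, scale them linearly with depth so early layers are narrower and later layers are wider. The alpha/beta ranges below are made up for illustration and are not OpenELM's actual values.

```python
# Illustrative layer-wise scaling (the DeLighT-style idea OpenELM adopts):
# attention heads and FFN width grow linearly with depth instead of being uniform.
def layerwise_scaling(num_layers=16, d_model=1024, head_dim=64,
                      alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)             # 0 at first layer, 1 at last
        a = alpha[0] + t * (alpha[1] - alpha[0])   # scale for attention width
        b = beta[0] + t * (beta[1] - beta[0])      # scale for FFN multiplier
        n_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append({"layer": i, "n_heads": n_heads, "ffn_dim": ffn_dim})
    return configs

cfgs = layerwise_scaling()
for cfg in cfgs[:3] + cfgs[-3:]:
    print(cfg)
```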
What should you do if you don’t have enough data to fine-tune an LLM? Fine-tuning is a very promising method for specializing LLMs, but it often requires a non-trivial number of data points, which in many cases are hard to obtain.
LLM2LLM addresses this by utilizing a…
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
It significantly enhances the performance of LLMs in the low-data regime, outperforming various baselines (e.g., up to a 24.2% improvement on GSM8K)
arxiv.org/abs/2403.15042
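The loop described in the abstract, as a hedged sketch: fine-tune a student on the seed data, find the examples it still gets wrong, have a teacher LLM generate new examples similar to those failures, and repeat. All callables passed in below are placeholders, not an actual API.

```python
def llm2llm_loop(seed_data, student, finetune, is_correct, generate_similar,
                 n_rounds=3):
    """Iteratively augment training data with teacher-generated examples
    targeted at the student's failures. All callables are placeholders."""
    train_data = list(seed_data)
    for _ in range(n_rounds):
        student = finetune(student, train_data)

        # Evaluate the fine-tuned student on the original seed data.
        failures = [ex for ex in seed_data if not is_correct(student, ex)]
        if not failures:
            break

        # Teacher produces new training examples similar to each failure case.
        for ex in failures:
            train_data.extend(generate_similar(ex, k=2))
    return student
```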
Will LLMs disrupt modern e-commerce and web navigation? 🤖🛍️
We recently tested LLMCompiler on the WebShop dataset, where it outperformed ReAct with 20% higher accuracy. So we decided to test it on a real website. We asked LLMCompiler to buy an On running shoe, and gave it browser…
I think this will mark an important milestone for Gen AI. The spotlight has been on the capabilities of LLMs (scaling laws, leaderboards, etc). But it's now clear that LLM performance alone will be meaningless. You will need a Compound AI system to get the best performance out of…
What is blocking LLMs from allowing long context inputs?
🚨Introducing KVQuant which allows serving LLaMA-7B with 1M context length on a single A100! 🔥
The model with the largest context window today is Claude 2.1, which is limited to 200K tokens. What is the challenge in increasing this?
Two key…
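For context on why the KV cache is the limiting factor, here is back-of-the-envelope arithmetic for LLaMA-7B-style dimensions (32 layers, 32 heads, head dim 128): at 1M tokens the fp16 KV cache dwarfs the ~13 GB of weights, and quantizing it to a few bits brings it back toward a single A100's 80 GB. These are ballpark numbers only; KVQuant's actual savings depend on additional techniques not modeled here.

```python
# Back-of-the-envelope KV-cache memory for LLaMA-7B-style dims.
n_layers, n_heads, head_dim = 32, 32, 128
seq_len = 1_000_000

def kv_cache_gib(bits_per_value):
    # 2 tensors (K and V) per layer, each of shape [seq_len, n_heads * head_dim].
    num_values = 2 * n_layers * n_heads * head_dim * seq_len
    return num_values * bits_per_value / 8 / 2**30

print(f"fp16 KV cache : {kv_cache_gib(16):7.1f} GiB")   # ~488 GiB
print(f"4-bit KV cache: {kv_cache_gib(4):7.1f} GiB")    # ~122 GiB
print(f"2-bit KV cache: {kv_cache_gib(2):7.1f} GiB")    # ~61 GiB
```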
Here are 7 challenges that AI engineers must solve in order to build large-scale intelligent agents (“LLM OSes”):
1️⃣ Improving Accuracy: Make sure agents can solve hard tasks well
2️⃣ Moving beyond serial execution: identify parallelizable tasks and run them concurrently
3️⃣…
🔥Breaking News from Arena
Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement!
The race is heating up like never before! Super excited to see what's next for Bard + Gemini…
Our first webinar of 2024 explores how to efficiently, performantly build agentic software 🎉
We’re excited to host @sehoonkim418 and @amir__gholami to present LLMCompiler: an agent compiler for parallel multi-function planning/execution.
Previous frameworks for agentic…
How can we make LLM agents work together efficiently on complex tasks at a large scale?
🚨Introducing LLMCompiler🦙🛠️, a tool that compiles an effective plan for executing multiple tasks in parallel.
It helps create scalable LLM applications, identifies tasks for parallel…
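Conceptually, the planner emits a DAG of tool calls with dependencies, and an executor dispatches every task whose inputs are ready in parallel instead of one-by-one as in ReAct-style loops. A minimal asyncio sketch of that execution pattern; the task graph and the tool below are invented for illustration and this is not the library's actual API.

```python
import asyncio

async def search(query):
    # Stand-in for a real tool call (web search, API request, etc.).
    await asyncio.sleep(0.1)
    return f"results for {query!r}"

async def run_dag(tasks):
    """tasks: {name: (coro_fn, args, set_of_dependency_names)}"""
    done, pending = {}, dict(tasks)
    while pending:
        # Every task whose dependencies are satisfied can run in parallel.
        ready = [n for n, (_, _, deps) in pending.items() if deps <= set(done)]
        results = await asyncio.gather(
            *(pending[n][0](*pending[n][1]) for n in ready)
        )
        for name, res in zip(ready, results):
            done[name] = res
            del pending[name]
    return done

tasks = {
    "a": (search, ("price of shoe A",), set()),
    "b": (search, ("price of shoe B",), set()),
    # "c" depends on both searches, so it only runs after they finish.
    "c": (search, ("compare A vs B",), {"a", "b"}),
}
print(asyncio.run(run_dag(tasks)))
```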
2-Year Update on AI and Memory Wall: New hardware data shows that memory is increasingly becoming the main bottleneck and not compute! Some observations:
* The peak compute available in flagship hardware has been increasing at a rate of 3.0x every 2 years. In contrast, both the DRAM and…
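To make the gap concrete, here is the compounding arithmetic using the 3.0x/2yrs compute figure from the post and an assumed slower 1.6x/2yrs rate for memory bandwidth; the bandwidth figure is illustrative only, the exact number is in the paper.

```python
# Compounding the growth-rate gap: compute at 3.0x every 2 years (from the post)
# vs. an assumed 1.6x every 2 years for DRAM bandwidth (illustrative figure).
def growth(rate_per_2yrs, years):
    return rate_per_2yrs ** (years / 2)

for years in (2, 6, 10):
    compute = growth(3.0, years)
    bandwidth = growth(1.6, years)
    print(f"{years:>2} yrs: compute x{compute:6.1f}, bandwidth x{bandwidth:5.1f}, "
          f"arithmetic intensity needed to stay compute-bound grows "
          f"x{compute / bandwidth:.1f}")
```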
@karpathy Very interesting post. DeepMind released their paper concurrently with our work on "Big Little Transformer Decoder", which exploits this opportunity. But we also found that it is better to use a *dynamic fallback policy* instead of always falling back after generating a fixed number…
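A minimal sketch of the confidence-based dynamic fallback idea, assuming Hugging-Face-style causal LMs and batch size 1; the threshold is a placeholder, and BiLD's rollback policy is omitted.

```python
import torch

@torch.no_grad()
def generate_with_fallback(small, large, input_ids, max_new_tokens=64,
                           confidence_threshold=0.6):
    """Small model keeps decoding while it is confident; the large model is
    invoked only on hard tokens, instead of after a fixed number of drafts."""
    ids = input_ids  # assumes batch size 1
    for _ in range(max_new_tokens):
        small_logits = small(ids).logits[:, -1, :]
        probs = torch.softmax(small_logits, dim=-1)
        conf, token = probs.max(dim=-1)

        if conf.item() < confidence_threshold:
            # Dynamic fallback: consult the large model only when the small
            # model is unsure about the next token.
            large_logits = large(ids).logits[:, -1, :]
            token = large_logits.argmax(dim=-1)

        ids = torch.cat([ids, token.unsqueeze(-1)], dim=-1)
    return ids
```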
Very interesting post by @karpathy (also thanks for sharing our blogpost).
Based on this, we have also recently designed a more efficient compression scheme called dense-and-sparse quantization to reduce memory traffic. It will be added to llama.cpp soon
github.com/SqueezeAILab/S…
If you're seeking to understand how Transformers work and where to make them better, there is no better paper to read than "Full Stack Optimization of Transformer Inference: a Survey" by Kim and Hooper with @KurtKeutzer and @amir__gholami arxiv.org/abs/2302.14017
I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected.
Researchers isolated the bug last month – but they missed a simple solution…
Why LLM designers should stop using Softmax 👇
evanmiller.org/attention-is-o…
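For reference, the fix proposed in that post is a one-character change to the softmax denominator (often called "softmax1" or quiet attention): adding 1 lets an attention head assign near-zero total weight instead of being forced to distribute exactly 1.0 over the keys. A small PyTorch sketch contrasting the two; numerically stable handling of the +1 term is omitted for brevity.

```python
import torch

def softmax(x, dim=-1):
    x = x - x.max(dim=dim, keepdim=True).values   # standard stabilization
    e = x.exp()
    return e / e.sum(dim=dim, keepdim=True)

def softmax1(x, dim=-1):
    """'Off-by-one' softmax: +1 in the denominator allows (near-)zero total
    attention weight. Assumes inputs are not extreme (no max-subtraction)."""
    e = x.exp()
    return e / (1.0 + e.sum(dim=dim, keepdim=True))

scores = torch.tensor([[-4.0, -5.0, -6.0]])   # a head that "wants" to attend to nothing
print(softmax(scores))    # still forced to sum to 1.0
print(softmax1(scores))   # total weight ~0.03, close to a no-op
```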