Excited to announce that SqueezeLLM and LLMCompiler have been accepted to ICML 2024! 🎉
SqueezeLLM addresses massive outliers in LLMs through a dense-and-sparse decomposition. The massive outliers are efficiently isolated in the sparse part, and the remainder is…
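A minimal NumPy sketch of the dense-and-sparse idea: the few largest-magnitude weights go into a sparse matrix kept in full precision, and the remaining dense part is quantized to low bit-width. The threshold, bit-width, and plain uniform quantizer here are illustrative only (SqueezeLLM itself uses sensitivity-based non-uniform quantization).

```python
import numpy as np

def dense_and_sparse_decompose(W, outlier_frac=0.005, bits=3):
    """Split W into a sparse full-precision outlier part and a quantized dense part.

    Illustrative sketch: SqueezeLLM additionally uses sensitivity-based
    non-uniform quantization; here we use plain uniform quantization.
    """
    # Treat the top fraction of weights by magnitude as outliers.
    thresh = np.quantile(np.abs(W), 1.0 - outlier_frac)
    outlier_mask = np.abs(W) > thresh

    sparse = np.where(outlier_mask, W, 0.0)   # kept in FP16/FP32, stored sparsely
    dense = np.where(outlier_mask, 0.0, W)    # outlier-free, quantized to low bits

    # Uniform quantization of the dense part; its range is now much tighter.
    levels = 2 ** bits - 1
    w_min, w_max = dense.min(), dense.max()
    scale = (w_max - w_min) / levels
    dense_deq = np.round((dense - w_min) / scale) * scale + w_min

    return dense_deq, sparse

# Reconstruction: W ~= dense_deq + sparse. At inference, the dense part uses a
# low-bit kernel and the handful of outliers use a sparse kernel.
W = np.random.randn(512, 512) + 5.0 * (np.random.rand(512, 512) > 0.999)
dense_deq, sparse = dense_and_sparse_decompose(W)
print("max reconstruction error:", np.abs(W - (dense_deq + sparse)).max())
```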
Apple's recent OpenELM is very interesting and showcases that a layer-wise scaling technique can be more effective, at least in the small language model regime. Some insights:
- Adopts Transformer modifications from the DeLighT paper: starts with narrower blocks, then widens them…
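A hedged sketch of the layer-wise scaling idea: instead of giving every block the same number of attention heads and FFN width, scale them linearly with depth so early layers are narrower and later layers are wider. The alpha/beta ranges below are made up for illustration and are not OpenELM's actual values.

```python
# Illustrative layer-wise scaling (the DeLighT-style idea OpenELM adopts):
# attention heads and FFN width grow linearly with depth instead of being uniform.
def layerwise_scaling(num_layers=16, d_model=1024, head_dim=64,
                      alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)             # 0 at first layer, 1 at last
        a = alpha[0] + t * (alpha[1] - alpha[0])   # scale for attention width
        b = beta[0] + t * (beta[1] - beta[0])      # scale for FFN multiplier
        n_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append({"layer": i, "n_heads": n_heads, "ffn_dim": ffn_dim})
    return configs

cfgs = layerwise_scaling()
for cfg in cfgs[:3] + cfgs[-3:]:
    print(cfg)
```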
What should you do if you don’t have enough data to fine-tune an LLM? Fine-tuning is a very promising method for specializing LLMs, but it often requires a non-trivial number of data points, which in many cases are hard to obtain.
LLM2LLM addresses this by utilizing a…
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
It significantly enhances the performance of LLMs in the low-data regime, outperforming various baselines (e.g., up to a 24.2% improvement on GSM8K)
arxiv.org/abs/2403.15042
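The loop described in the abstract, as a hedged sketch: fine-tune a student on the seed data, find the examples it still gets wrong, have a teacher LLM generate new examples similar to those failures, and repeat. All callables passed in below are placeholders, not an actual API.

```python
def llm2llm_loop(seed_data, student, finetune, is_correct, generate_similar,
                 n_rounds=3):
    """Iteratively augment training data with teacher-generated examples
    targeted at the student's failures. All callables are placeholders."""
    train_data = list(seed_data)
    for _ in range(n_rounds):
        student = finetune(student, train_data)

        # Evaluate the fine-tuned student on the original seed data.
        failures = [ex for ex in seed_data if not is_correct(student, ex)]
        if not failures:
            break

        # Teacher produces new training examples similar to each failure case.
        for ex in failures:
            train_data.extend(generate_similar(ex, k=2))
    return student
```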
Will LLMs disrupt modern e-commerce and web navigation? 🤖🛍️
We recently tested LLMCompiler on the WebShop dataset, where it outperformed ReAct with 20% higher accuracy. So we decided to test it on a real website. We asked LLMCompiler to buy an On running shoe, and gave it browser…
I think this will mark an important milestone for Gen AI. The spotlight has been on the capabilities of LLMs (scaling laws, leaderboards, etc). But it's now clear that LLM performance alone will be meaningless. You will need a Compound AI system to get the best performance out of…
What is blocking LLMs from allowing long context inputs?
🚨Introducing KVQuant which allows serving LLaMA-7B with 1M context length on a single A100! 🔥
The model with the largest context window today is Claude 2.1, which is limited to 200K tokens. What is the challenge in increasing this?
Two key…
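For context on why the KV cache is the limiting factor, here is back-of-the-envelope arithmetic for LLaMA-7B-style dimensions (32 layers, 32 heads, head dim 128): at 1M tokens the fp16 KV cache dwarfs the ~13 GB of weights, and quantizing it to a few bits brings it back toward a single A100's 80 GB. These are ballpark numbers only; KVQuant's actual savings depend on additional techniques not modeled here.

```python
# Back-of-the-envelope KV-cache memory for LLaMA-7B-style dims.
n_layers, n_heads, head_dim = 32, 32, 128
seq_len = 1_000_000

def kv_cache_gib(bits_per_value):
    # 2 tensors (K and V) per layer, each of shape [seq_len, n_heads * head_dim].
    num_values = 2 * n_layers * n_heads * head_dim * seq_len
    return num_values * bits_per_value / 8 / 2**30

print(f"fp16 KV cache : {kv_cache_gib(16):7.1f} GiB")   # ~488 GiB
print(f"4-bit KV cache: {kv_cache_gib(4):7.1f} GiB")    # ~122 GiB
print(f"2-bit KV cache: {kv_cache_gib(2):7.1f} GiB")    # ~61 GiB
```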
Here are 7 challenges that AI engineers must solve in order to build large-scale intelligent agents (“LLM OSes”):
1️⃣ Improving Accuracy: Make sure agents can solve hard tasks well
2️⃣ Moving beyond serial execution: identify parallelizable tasks and run them concurrently
3️⃣…
🔥Breaking News from Arena
Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement!
The race is heating up like never before! Super excited to see what's next for Bard + Gemini…
Our first webinar of 2024 explores how to efficiently, performantly build agentic software 🎉
We’re excited to host @sehoonkim418 and @amir__gholami to present LLMCompiler: an agent compiler for parallel multi-function planning/execution.
Previous frameworks for agentic…
How can we make LLM agents work together efficiently on complex tasks at a large scale?
🚨Introducing LLMCompiler🦙🛠️, a tool that compiles an effective plan for executing multiple tasks in parallel.
It helps create scalable LLM applications, identifies tasks for parallel…
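Conceptually, the planner emits a DAG of tool calls with dependencies, and an executor dispatches every task whose inputs are ready in parallel instead of one-by-one as in ReAct-style loops. A minimal asyncio sketch of that execution pattern; the task graph and the tool below are invented for illustration and this is not the library's actual API.

```python
import asyncio

async def search(query):
    # Stand-in for a real tool call (web search, API request, etc.).
    await asyncio.sleep(0.1)
    return f"results for {query!r}"

async def run_dag(tasks):
    """tasks: {name: (coro_fn, args, set_of_dependency_names)}"""
    done, pending = {}, dict(tasks)
    while pending:
        # Every task whose dependencies are satisfied can run in parallel.
        ready = [n for n, (_, _, deps) in pending.items() if deps <= set(done)]
        results = await asyncio.gather(
            *(pending[n][0](*pending[n][1]) for n in ready)
        )
        for name, res in zip(ready, results):
            done[name] = res
            del pending[name]
    return done

tasks = {
    "a": (search, ("price of shoe A",), set()),
    "b": (search, ("price of shoe B",), set()),
    # "c" depends on both searches, so it only runs after they finish.
    "c": (search, ("compare A vs B",), {"a", "b"}),
}
print(asyncio.run(run_dag(tasks)))
```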
2-Year Update on AI and Memory Wall: New hardware data shows that memory is increasingly becoming the main bottleneck and not compute! Some observations:
* The peak compute available in flagship hardware has been increasing at a rate of 3.0x every 2 years. In contrast, both the DRAM and…
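To make the gap concrete, here is the compounding arithmetic using the 3.0x/2yrs compute figure from the post and an assumed slower 1.6x/2yrs rate for memory bandwidth; the bandwidth figure is illustrative only, the exact number is in the paper.

```python
# Compounding the growth-rate gap: compute at 3.0x every 2 years (from the post)
# vs. an assumed 1.6x every 2 years for DRAM bandwidth (illustrative figure).
def growth(rate_per_2yrs, years):
    return rate_per_2yrs ** (years / 2)

for years in (2, 6, 10):
    compute = growth(3.0, years)
    bandwidth = growth(1.6, years)
    print(f"{years:>2} yrs: compute x{compute:6.1f}, bandwidth x{bandwidth:5.1f}, "
          f"arithmetic intensity needed to stay compute-bound grows "
          f"x{compute / bandwidth:.1f}")
```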
@karpathy Very interesting post. DeepMind released their paper concurrently with our work on "Big Little Transformer Decoder", which exploits this opportunity. But we also found that it is better to use a *dynamic fallback policy* instead of always falling back after generating a fixed number…
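A minimal sketch of the confidence-based dynamic fallback idea, assuming Hugging-Face-style causal LMs and batch size 1; the threshold is a placeholder, and BiLD's rollback policy is omitted.

```python
import torch

@torch.no_grad()
def generate_with_fallback(small, large, input_ids, max_new_tokens=64,
                           confidence_threshold=0.6):
    """Small model keeps decoding while it is confident; the large model is
    invoked only on hard tokens, instead of after a fixed number of drafts."""
    ids = input_ids  # assumes batch size 1
    for _ in range(max_new_tokens):
        small_logits = small(ids).logits[:, -1, :]
        probs = torch.softmax(small_logits, dim=-1)
        conf, token = probs.max(dim=-1)

        if conf.item() < confidence_threshold:
            # Dynamic fallback: consult the large model only when the small
            # model is unsure about the next token.
            large_logits = large(ids).logits[:, -1, :]
            token = large_logits.argmax(dim=-1)

        ids = torch.cat([ids, token.unsqueeze(-1)], dim=-1)
    return ids
```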
Very interesting post by @karpathy (also thanks for sharing our blogpost).
Based on this, we have also recently designed a more efficient compression scheme called dense-and-sparse quantization to reduce memory traffic. It will be added to llama.cpp soon
github.com/SqueezeAILab/S…
If you're seeking to understand how Transformers work and where to make them better, there is no better paper to read than "Full Stack Optimization of Transformer Inference: a Survey" by Kim and Hooper with @KurtKeutzer and @amir__gholami arxiv.org/abs/2302.14017
I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected.
Researchers isolated the bug last month – but they missed a simple solution…
Why LLM designers should stop using Softmax 👇
evanmiller.org/attention-is-o…
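For reference, the fix proposed in that post is a one-character change to the softmax denominator (often called "softmax1" or quiet attention): adding 1 lets an attention head assign near-zero total weight instead of being forced to distribute exactly 1.0 over the keys. A small PyTorch sketch contrasting the two; numerically stable handling of the +1 term is omitted for brevity.

```python
import torch

def softmax(x, dim=-1):
    x = x - x.max(dim=dim, keepdim=True).values   # standard stabilization
    e = x.exp()
    return e / e.sum(dim=dim, keepdim=True)

def softmax1(x, dim=-1):
    """'Off-by-one' softmax: +1 in the denominator allows (near-)zero total
    attention weight. Assumes inputs are not extreme (no max-subtraction)."""
    e = x.exp()
    return e / (1.0 + e.sum(dim=dim, keepdim=True))

scores = torch.tensor([[-4.0, -5.0, -6.0]])   # a head that "wants" to attend to nothing
print(softmax(scores))    # still forced to sum to 1.0
print(softmax1(scores))   # total weight ~0.03, close to a no-op
```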