Mingjie Sun @_mingjiesun

PhD student @CSDatCMU eric-mingjie.github.io Pittsburgh Joined July 2017

Tweets

92
Followers

362
Following

417
Likes

2K

Pratyush Maini @pratyushmaini

3 days ago

1/What does it mean for an LLM to “memorize” a doc? Exactly regurgitating a NYT article? Of course. Just training on NYT? Harder to say. We take big strides in this discourse w/*Adversarial Compression* w/@A_v_i__S @zhilifeng @zacharylipton @zicokolter 🌐:locuslab.github.io/acr-memorizati…🧵

1 31 139 26K 86

Sachin Goyal @goyalsachin007

2 weeks ago

1/Let me tell you the dark secrets 🔮 behind developing *new* scaling laws that no one wants you to know. A tale of “Another day. Another (failed) Scaling Law”. Working through key design decisions, limited compute, and other difficulties🧵.

Pratyush Maini @pratyushmaini

2 weeks ago

8 72 294 61K 175

1 17 107 25K 87

Pratyush Maini @pratyushmaini

2 weeks ago

1/ 🥁Scaling Laws for Data Filtering 🥁 TLDR: Data Curation *cannot* be compute agnostic! In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data. w/@goyalsachin007 @zacharylipton @AdtRaghunathan @zicokolter 📝:arxiv.org/abs/2404.07177

8 72 294 61K 175

Yiding Jiang @yidingjiang

3 weeks ago

Models with different randomness make different predictions at test time even if they are trained on the same data. In our latest ICLR paper (oral), we investigate how models learn different features, and the effect this has on agreement and (potentially) calibration. 1/

3 35 135 23K 100

Zhengyang Geng @ZhengyangGeng

a month ago

🚀Our latest blog post unveils the power of Consistency Models and introduces Easy Consistency Tuning (ECT), a new way to fine-tune pretrained diffusion models to consistency models. SoTA fast generative models using 1/32 training cost! 🔽 Get ready to speed up your generative…

7 46 142 37K 98

Dan Roberts @danintheory

a month ago

Do LLMs really need to be so L? That's a rejected title for a new paper w/ @Andr3yGR, @kushal_tirumala, @Hasan_Shap, @PaoloGlorioso1 on pruning open-weight LLMs: we can remove up to *half* the layers of Llama-2 70B w/ essentially no impact on performance on QA benchmarks. 1/

18 57 355 86K 253

Zhuang Liu @liuzhuang1234

2 months ago

Joint work with Kaiming He. Check the paper for more! (non-)code: github.com/liuzhuang13/bi… arxiv: arxiv.org/abs/2403.08632 (Answer to the game: YFCC: 1, 4, 7, 10, 13; CC: 2, 5, 8, 11, 14; DataComp: 3, 6, 9, 12, 15)

0 4 33 2K 10

Zhuang Liu @liuzhuang1234

2 months ago

Very excited to share one of the most interesting projects I've ever worked on, but first, a small game: Here are 15 images from three of the largest and most diverse modern image datasets: YFCC100M, CC12M and DataComp-1B. Can you guess which images are from which datasets?

10 23 149 22K 38

Xinlei Chen @endernewton

2 months ago

Fascinating and insightful work from @_mingjiesun @liuzhuang1234, which takes a much deeper look at the "massive activations" inside LLMs, proposing hypotheses and verifying that they act as "biases" for attention, and that they can appear in ViTs too!

Zhuang Liu @liuzhuang1234

2 months ago

31 171 1K 178K 895

0 3 26 5K 5

Zhuang Liu @liuzhuang1234

2 months ago

LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in paper “Massive Activations in Large Language Models” LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)

31 171 1K 178K 895

Zhuang Liu @liuzhuang1234

2 months ago

Diffusion models have achieved remarkable results in visual generation. We demonstrate that they can also generate neural network parameters, in our new paper: "Neural Network Diffusion" (1/n) x.com/_akhaliq/statu…

AK @_akhaliq

2 months ago

23 254 1K 462K 799

21 87 583 123K 298

Runtian Zhai @RuntianZhai

3 months ago

Unlabeled data is crucial for modern ML. It provides info about data distribution P, but how to exploit such info? Given a kernel K, our #ICLR2024 spotlight gives a general & principled way: Spectrally Transformed Kernel Regression (STKR). Camera-ready 👇 arxiv.org/abs/2402.00645

1 12 59 6K 18

Pratyush Maini @pratyushmaini

3 months ago

1/7 Super excited about my Apple Internship work finally coming out: Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling TLDR: You can train 3x faster and with up to 10x less data with just synthetic rephrases of the web! 📝 arxiv.org/abs/2401.16380

13 80 382 80K 206

AK @_akhaliq

3 months ago

Apple presents Rephrasing the Web A Recipe for Compute and Data-Efficient Language Modeling paper page: huggingface.co/papers/2401.16… Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show…

9 110 581 78K 401

Zhuang Liu @liuzhuang1234

3 months ago

Diffusion models can do more than generation. Check out our new work on analyzing what's useful in diffusion models for visual representation learning! @endernewton @sainingxie

AK @_akhaliq

3 months ago

2 110 510 128K 251

0 5 72 16K 14

Pratyush Maini @pratyushmaini

3 months ago

1/4 The Right to be Forgotten is knocking on the door. Yet, unlearning in LLMs has no clear task definition, no evaluation metrics or baselines. Introducing TOFU: Task of Fictitious Unlearning for LLMs 🌐 locuslab.github.io/tofu w/@A_v_i__S @zhilifeng @zacharylipton @zicokolter🧵

3 22 87 13K 28

AK @_akhaliq

4 months ago

CMU presents TOFU A Task of Fictitious Unlearning for LLMs paper page: huggingface.co/papers/2401.06… Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or…

1 32 121 21K 57

Zhuang Liu @liuzhuang1234

4 months ago

How to choose a vision model for your specific needs? How do ConvNet / ViT, supervised / CLIP models compare with each other on metrics beyond ImageNet? Our work comprehensively compares common vision models on "non-standard" metrics. (1/n)

9 138 712 183K 594

Zico Kolter @zicokolter

5 months ago

Now that NeurIPS is upon us shortly ... it's time to start planning for ICML😀! Thrilled to serve with @kat_heller @adrian_weller @nuriaoliver as PCs, and @rsalakhu as general chair. Call for papers is here: icml.cc/Conferences/20… Intro blog post: medium.com/@icml2024pc/we…

3 24 243 61K 15

Simone Torrecillas @TorrecillaSimo

80 Followers 5K Following

Mairead Ostermann @MaireadOst47192

61 Followers 5K Following

Shesath @Shesath112966

0 Followers 123 Following Life itself is a journey, we are all worthy and should strive to travel to different lives.

Onie Swecker @sweck_oni

12 Followers 2K Following 25 / Lets Chat👇😋

Shaniqua Champeau @ShaniquChamp

75 Followers 5K Following

Melany Lopresto @LoprestoMe67565

55 Followers 5K Following

Flora Cardino @CardinoCardi

84 Followers 5K Following

Cecilia Donnan @CeciliaDon22588

52 Followers 5K Following

Van Sanipasi @VSanipasi87447

86 Followers 5K Following

Alys Rende @alys_ren

12 Followers 2K Following Alys ~ Join my free content👇

Marc Finzi @m_finzi

900 Followers 261 Following Postdoc at CMU in Locus Lab -- Machine Learning

Tianjian Li @tli104

137 Followers 267 Following phd student @jhuclsp, working on data engineering for language models.

Joe Rocca @rocca27

175 Followers 2K Following ml ∩ web ∩ vr, housing, social coordination, stable interfaces, alt proteins, wild animal suffering, ageing, etc

Zhuorui Ye @YeZhuorui40987

2 Followers 46 Following Undergrad student in IIIS, Tsinghua University

Daina Meisel @daina_mei

63 Followers 5K Following

Jeannette Brightman @JBrightm

26 Followers 5K Following

Livi Hamelinck @LHamelinck94261

80 Followers 5K Following

@UCSanDiego | @Hitachi_Japan | @samsungresearch | @iitbombay | AI (machine learning, robot learning & computer vision) | Retweets are not endorsements

Saqib Azim @_saqib1707

Jemma Buechner @BuechnerJe50639

19 Followers 3K Following

Renee George @7gbex9brhcwhe

7 Followers 301 Following Love investing in US stocks

Sage Knochel @SaKnoche

75 Followers 5K Following

Shona Mccausland @Mccausland87898

18 Followers 2K Following Shona / 19 / Lets Cam👇💕

Mir Arfeen Hussain @MiArfeen56

134 Followers 4K Following Just received some exciting news and I can't wait to share it with everyone! Stay tuned for updates. 🎉✨

Keshav Ramji @KeshavRamji

75 Followers 325 Following BS/MS Student @CIS_Penn and @Wharton | AI Intern @IBMResearch

Karush Suri @karush_

405 Followers 670 Following Meta Learner @Theteamatx in @GoogleAI swimmer & comic fan Past @borealisai @eceuoft

Amelia Pila @pil_ameli

10 Followers 2K Following 😻19 - Join my free content👇🍑

Hilda Lacrosse @lacros_hi

49 Followers 5K Following

Lula Mirick @LulaMiric

47 Followers 5K Following

Ariel Larocca @LaroccAri

81 Followers 5K Following

Keila Brandeland @KBrandelan30091

52 Followers 5K Following

GPT Maestro @GptMaestro

61 Followers 388 Following curator of the LLMpedia (Illustrated Large Language Model Encyclopedia)

Starr Vas @StarrVas52155

25 Followers 4K Following Starr 23 Lets Chat👇💖

Blessing Kriege @BlessiKrie

43 Followers 5K Following

WY @0324wy0324

27 Followers 440 Following Let's talk about MLSys~

Brandy Frisbie @BraFrisbi

71 Followers 5K Following

Suhas Kotha @kothasuhas

96 Followers 116 Following CS at CMU

Latisha Mcpheeters @LatishaMcp16214

67 Followers 5K Following

Nela Chalow @ne_chalow

25 Followers 4K Following Nela ~ Lets Have Fun👇

Christina Fogal @ChristinFog

29 Followers 5K Following

Mairi Beeman @BeemanMair84101

60 Followers 5K Following

Alayna Rattay @AlaynaRatt

29 Followers 4K Following 🔥23 , Lets CamChat👇🔞

Sloane Rodrigue @SloaRodrig

18 Followers 2K Following 🤤Sloane . Lets Cam👇

Zhili Feng @zhilifeng

70 Followers 304 Following ML PhD@CMU

Jeanine Schwarzenbach @jeanine_jeani

57 Followers 5K Following

Sahar Truby @tr_saha

53 Followers 5K Following

zhuai @guo0914

66 Followers 270 Following

David Yin @DavidYin0609

7 Followers 12 Following Undergraduate @UCBerkeley

Milin Bhade @MilinBhade

56 Followers 1K Following Post Grad Student at IISc, Bangalore Masters in Computer Science & Automation

Jiahao Wang @JiahaoWANG48297

0 Followers 24 Following

Yu Bai @yubai01

3K Followers 2K Following Sr Research Scientist @SFResearch. PhD @Stanford. Researcher on foundation models, RL/games, deep learning, uncertainty quantification, and their theory.

Ethan Mollick @emollick

211K Followers 551 Following Professor @Wharton studying AI, innovation & startups. Democratizing education using tech Book: https://t.co/CSmipbJ2jV Substack: https://t.co/UIBhxu4bgq

Philipp Schmid @_philschmid

16K Followers 651 Following Tech Lead and LLMs at @huggingface 👨🏻‍💻 🤗 AWS ML Hero 🦸🏻 | Cloud & ML enthusiast | 📍Nuremberg | 🇩🇪 https://t.co/l1ppq3q3hk

Laurens van der Maate.. @lvdmaaten

653 Followers 1K Following Distinguished Research Scientist at Meta AI. t-SNE. DenseNet. Web-scale weakly supervised vision. CrypTen. Currently herding Llamas.

/MachineLearning @slashML

121K Followers 1 Following

François Fleuret @francoisfleuret

31K Followers 456 Following Prof. @Unige_en, Adjunct Prof. @EPFL_en, Research Fellow @idiap_ch, co-founder @nc_shape. AI and machine learning since 1994. I like reality.

Simo Ryu @cloneofsimo

3K Followers 383 Following #KAIST RAI Lab (ML engineering #Naver) Interested in robotics, RL, math (but you might know me for t2i diffusion) cloneofsimo@gmail.com

Sam Whitmore @sjwhitmore

12K Followers 2K Following building @newcomputer. not a cat (or a man) in real life. I like to run a lot! @kensho @harvard @StuyNY

Karush Suri @karush_

405 Followers 670 Following Meta Learner @Theteamatx in @GoogleAI swimmer & comic fan Past @borealisai @eceuoft

Burny — Effective O.. @burny_tech

14K Followers 6K Following Transhuman engineer in singularity! Lover of AI & omnidisciplionary metamathemagics! Hypercuriousia! Omniperspectivity! Shapeshifting metafluid! Freedom 4 all!

Microsoft Research @MSFTResearch

553K Followers 2K Following We advance science and technology to benefit humanity. https://t.co/kz0nARXbwT Register for Microsoft Research Forum on June 4 ⬇️ Get our newsletter

xAI @xai

997K Followers 36 Following

Jonathan Frankle @jefrankle

16K Followers 685 Following Chief Scientist, Neural Networks @Databricks via MosaicML. PhD @MIT_CSAIL. BS/MS @PrincetonCS. DC area native. Making AI efficient for everyone at @DbrxMosaicAI

AI Breakfast @AiBreakfast

167K Followers 209 Following The latest rumors and developments in the world of artificial intelligence. DM to include your AI project in the newsletter.

Suhas Kotha @kothasuhas

96 Followers 116 Following CS at CMU

Nancy Pelosi Stock Tr.. @PelosiTracker_

560K Followers 223 Following Highlighting Politicians' trades so we can invest alongside Goal: get them banned from trading Powered by @joinautopilot_

VALORANT Leaks & News @VALORANTLeaksEN

333K Followers 46 Following Content Creator // #VALORANT Contact: valorantleaksbusiness@gmail.com Backup account: @valorantleaksv2

Beth Kindig @Beth_Kindig

130K Followers 5K Following Investor with higher returns than Wall Street's Old Boys Club. Audited portfolio, automated hedge and free weekly analysis.

Evan @StockMKTNewz

420K Followers 384 Following Free Stock Market News that is FAST, ACCURATE, CONSISTENT, and RELIABLE | Not Just Stock News | Check out my Linktree ⬇️

Andrei Bursuc @abursuc

7K Followers 1K Following Research scientist @valeoai | Teaching @Polytechnique @ENS_ULM | Alumni @upb1818 @Mines_Paris @Inria @ENS_ULM

Ece Kamar @ecekamar

4K Followers 201 Following Artificial intelligence researcher

Chuang Gan @gan_chuang

4K Followers 457 Following Faculty Member at UMass Amherst; Principal researcher at MIT-IBM Watson AI Lab; Homepage: https://t.co/oXP6pqXCpo

Zhili Feng @zhilifeng

70 Followers 304 Following ML PhD@CMU

Jim Cramer @jimcramer

2.1M Followers 698 Following Host of @madmoneyoncnbc and I run the CNBC Investing Club. Follow along and join my mailing list at https://t.co/MiPnDUwQ8r…

sarah guo // convicti.. @saranormous

91K Followers 3K Following startup investor and builder, founder @w_conviction. accelerating AI adoption, interested in progress. tech podcast: @nopriorspod

unusual_whales @unusual_whales

1.7M Followers 2K Following Stocks/Options/Crypto/Market News +Tools. Not advice 🐳 who changed 🏛️. Get $50-$5000 to trade: https://t.co/wGf2ZdlXpw Discord: https://t.co/0xJ9e0ZYYG More: https://t.co/nsxZlPV0pC

Yu Bai @yubai01

3K Followers 2K Following Sr Research Scientist @SFResearch. PhD @Stanford. Researcher on foundation models, RL/games, deep learning, uncertainty quantification, and their theory.

ChatGPT @ChatGPTapp

178K Followers 129 Following friendly robot

Tong Wu @TongWu_Pton

207 Followers 981 Following Princeton; Robust ML, Domain Adaptation; https://t.co/nZOwnTVXkM

Peter Zeng @peterznlp

86 Followers 167 Following CS PhD student @ Stony Brook University

Ekaterina Lobacheva @KateLobacheva

301 Followers 204 Following Independent Researcher, PhD in CS, Collaboration with https://t.co/tB7QL7Sw3a and https://t.co/JN95AWiNhB Like to explain unexpected behavior of neural nets 🤯

Gary Marcus @GaryMarcus

145K Followers 7K Following “A beacon of clarity”. Spoke at US Senate AI Oversight committee. Founder/CEO Geometric Intelligence (acq. by Uber). Rebooting AI & Taming Silicon Valley.

Anna Bair @annaebair

119 Followers 374 Following CMU PhD student in machine learning https://t.co/3vzZEGbXc4

Conference on Languag.. @COLM_conf

2K Followers 6 Following https://t.co/GhGCMEoa4A Abstract submission: March 22, 2024

Kirill Vishniakov @kirill_vish

80 Followers 367 Following AI Researcher @G42_Healthcare | prev MSc in Computer Vision at @mbzuai

trevordarrell @trevordarrell

2K Followers 127 Following EECS, BAIR, UC Berkeley. Director, BAIR Commons Program.

Micah Goldblum @micahgoldblum

5K Followers 690 Following 🤖Postdoc at NYU with @ylecun / @andrewgwils. All things machine learning🤖 🚨On the faculty job market this year!🚨

Jiawei Yang @JiaweiYang118

244 Followers 215 Following USC PhD student | Student Researcher@Google Researcher | ex-Intern @ Nvidia Research

Victor.Kai Wang @VictorKaiWang1

252 Followers 255 Following Ph.D. student at NUS, focus on data centric ai and its applications.

Jeremy Howard @jeremyphoward

222K Followers 5K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Hon Professor: @UQSchoolITEE ; Digital Fellow: @Stanford

Shenao Zhang @ShenaoZhang

274 Followers 964 Following PhD student @NorthwesternU | Student Researcher @MSFTResearch. Ex-intern @MSFTReserch, ByteDance, and Tencent AI | Previously @GeorgiaTech. LLM, RL, agent.

Kyla Scanlon @kylascan

168K Followers 918 Following wrote a book called "in this economy?" | (almost) daily economic videos | writing, podcast, and youtube 👻

Devendra Chaplot @dchaplot

8K Followers 365 Following Building next-gen AI at @MistralAI. Past: Research Scientist at Facebook AI Research. Ph.D. @SCSatCMU, BTech @iitbombay CS.

Alex Nichol @unixpickle

8K Followers 388 Following Code, AI, and 3D printing. Opinions are my own, not my computer's...for now. Husband of @thesamnichol. Co-creator of DALL-E 2. Researcher @openai.

Bill Peebles @billpeeb

32K Followers 286 Following sora and agi @openai

Yangsibo Huang @YangsiboHuang

1K Followers 726 Following PhD candidate @Princeton. Prev: @GoogleAI @AIatMeta.

Brooke LeBlanc @brookeleblanc

20K Followers 1K Following Founder @joinedgeapp. 3.5 years sober. 9x half 3x marathon. Training for CHI NY ‘24

Sarah Frier @sarahfrier

84K Followers 2K Following In charge of big tech coverage at Bloomberg. Author of NO FILTER: The Inside Story of Instagram, from @simonschuster.

Cian Eastwood @CianEastwood

596 Followers 610 Following Machine learning PhD student @InfAtEd and @MPI_IS

Xinlei Chen @endernewton

1K Followers 781 Following Research Scientist at FAIR

Maggie Appleton @Mappletons

37K Followers 1K Following Design @elicitorg. Makes visual essays about UX, programming, and anthropology. Adores digital gardening 🌱, end-user development, and embodied cognition

Azalia Mirhoseini @Azaliamirh

5 days ago

SoTA LLMs typically exhibit 99%+ non-zero activations, but it turns out that they are still intrinsically quite sparse! We introduce CATS, a simple post-training technique that achieves 50% activation sparsity for MLP layers with almost no drop in downstream evals, while…

8 57 450 62K 319

Maksym Andriushchenko 🇺🇦 @maksym_andr

5 days ago

Super excited to share that I successfully defended my PhD thesis "Understanding Generalization and Robustness in Modern Deep Learning" today 👨‍🎓 A huge thanks to the thesis examiners @SebastienBubeck, @zicokolter, and @KrzakalaF, jury president Rachid Guerraoui, and, of course,…

61 12 428 26K 104

Coleman Hooper @coleman_hooper1

3 months ago

What is blocking LLMs from allowing long context inputs? 🚨Introducing KVQuant which allows serving LLaMA-7B with 1M context length on a single A100! 🔥 Current largest model is Claude-2.1 which is limited to 200K tokens. What is the challenge for increasing this? Two key…

3 7 20 1K 18

Maxime Labonne @maximelabonne

a week ago

Llama 3 is about to be released with 8B and 70B models. Just saw this on Replicate: replicate.com/pricing

9 38 232 34K 40

Shunyu Yao @ShunyuYao12

a week ago

Coding is the frontier of AI. Excited to push the two frontiers of AI coding: 1. SWE(-bench/agent) 2. Olympiad programming (this tweet) Introduce USACO benchmark: * inference methods (RAG/reflect) help a bit: 9->20% * human feedback helps a lot: 0->86%! princeton-nlp.github.io/USACOBench/

8 46 287 40K 165

Aran Komatsuzaki @arankomatsuzaki

2 weeks ago

Compression Represents Intelligence Linearly LLMs' intelligence – reflected by average benchmark scores – almost linearly correlates with their ability to compress external text corpora repo: github.com/hkust-nlp/llm-… abs: arxiv.org/abs/2404.09937

9 75 462 75K 300

Shunyu Yao @ShunyuYao12

2 weeks ago

Attention is all we need. Memory is all we have.

0 2 31 3K 4

Andrej Karpathy @karpathy

2 weeks ago

A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass) compared to 4 days ago, when it was at 4.2X slower 📈 The biggest improvements were: - turn on TF32 (NVIDIA TensorFloat-32) instead of FP32 for matmuls. This is a…

112 368 4K 1.4M 1K

Jeremy Howard @jeremyphoward

2 weeks ago

@lateinteraction Yeah I've been wondering about this too. So many people totally misunderstood what LLMs can do and how they do it, which is resulting in people trying to use them for the wrong thing -- so they then end up disappointed.

4 7 125 9K 17

Omar Khattab @lateinteraction

2 weeks ago

I worry about a bubble burst once people realize that no AGI is near—no reliably generalist LLMs or “agents”. Might seem less ambitious but it's far wiser to recognize: LLMs mainly create opportunities for making *general* progress for building AIs that solve *specific* tasks.

33 56 578 73K 169

AK @_akhaliq

2 weeks ago

JetMoE Reaching Llama2 Performance with 0.1M Dollars Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces

2 15 112 12K 60

Aran Komatsuzaki @arankomatsuzaki

2 weeks ago

Microsoft presents Rho-1: Not All Tokens Are What You Need RHO-1-1B and 7B achieves SotA results of 40.6% and 51.8% on MATH dataset, respectively — matching DeepSeekMath with only 3% of the pretraining tokens. repo: github.com/microsoft/rho abs: arxiv.org/abs/2404.07965

5 43 312 30K 169

Zico Kolter @zicokolter

2 weeks ago

How do you balance repeat training on high quality data versus adding more low quality data to the mix? And how much do you train on each type? @pratyushmaini and @goyalsachin007 provide scaling laws for such settings. Really excited about the work!

Pratyush Maini @pratyushmaini

2 weeks ago

8 72 294 61K 175

0 7 65 11K 31

Aran Komatsuzaki @arankomatsuzaki

3 weeks ago

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic Argues that data curation cannot be agnostic of the total compute that a model will be trained for repo: github.com/locuslab/scali… abs: arxiv.org/abs/2404.07177

6 90 460 49K 312

Aran Komatsuzaki @arankomatsuzaki

3 weeks ago

Finding Visual Task Vectors Find task vectors, activations that encode task-specific information, which guide the model towards performing a task better than the original model w/o the need for input-output examples arxiv.org/abs/2404.05729

5 62 316 44K 206

Pratyush Maini @pratyushmaini

3 weeks ago

🤯The TOFU dataset (locuslab.github.io/tofu) had 300k+ downloads last month, and is in Top 20 most downloaded datasets on @huggingface📈. This is crazy given how small the LLM unlearning community is compared to, say, LLM evals (for GSM8k). Excited to see what y'all are building!

1 16 74 14K 21

Shunyu Yao @ShunyuYao12

3 weeks ago

Will visit @agihouse_org for the first time this Saturday and talk about SWE-agent, Agent-Computer Interface (ACI), and answer questions😃

John Yang @jyangballin

4 weeks ago

SWE-agent is our new system for autonomously solving issues in GitHub repos. It gets similar accuracy to Devin on SWE-bench, takes 93 seconds on avg + it's open source! We designed a new agent-computer interface to make it easy for GPT-4 to edit+run code github.com/princeton-nlp/…