Malte Ostendorff @XYOU
co-founder @openlegaldata, senior researcher @dfki, postdoc @uniGoettingen ostendorff.org Berlin, Germany Joined June 2009
Tweets: 752
Followers: 781
Following: 2K
Likes: 2K
Seeing such a post by a @huggingface employee makes me wonder whether a "delete" on the hub actually removes all files or just hides them. 🤔
😊 Happy to be part of this! Next models are already cooking 🌋🌋🌋
LEdits++ is now implemented in 🧨diffusers. 🎉 Just download diffusers from source: pip install git+github.com/huggingface/di… And you'll be able to use LEditsPPPipelineStableDiffusion and LEditsPPPipelineStableDiffusionXL. Documentation lives here: moon-ci-docs.huggingface.co/docs/diffusers… https://t.co/7Ayk6KkSk2
You're calling this wild life. For most of us, this is just the norm. 🐒
😱 If you are still looking for a reason for truly open AI, here is yet another one...
🔥🔥🔥
The Gemini report says very little about the technical details but there are a few hints that it is not one but multiple specialized models (similar to GPT4 if the rumors are correct).
#ossym23 #flashback #researchtracks Malte Ostendorff of @DFKI presented “Europe’s Technical Debt: Why We Need Web Search in the Age of Generative AI” (+ Pedro Ortiz Suarez @pjox13 Julian Moreno-Schneider @GeorgRehm of @DFKI) #OpenSearchSymposium #OpenWebSearch @OpenWebSearchEU
We need your help, openJur is being sued! No future for freely documented case law? openjur.de/i/openjur_wird…
This is another one of those ill-thought, fear-mongering scientific disinformation about LLMs, and I will explain why in this long thread. 🧶
I will present our work about LM transfer learning at the PML4DC workshop co-located with #ICLR2023 - Friday at 14:45h in MH1 pml4dc.github.io/iclr2023/progr…
What's the best tool to extract information from scientific publications? @MeuschkeN did the test 👇👇👇
Jan Philip Wahle @jpwahle
629 Followers 223 Following I love data and language. Follow me for updates on my research.
NLLP Workshop @NllpWorkshop
1K Followers 556 Following Workshop on Natural Legal Language Processing (NLLP) at EMNLP 2023 v1: NAACL 2019, v2: KDD 2020, v3: EMNLP 2021 v4: EMNLP 2022 #nlproc #law #legaltech #nllp
Gavindya @Gavindya2
283 Followers 294 Following Ph.D. Student, Department of @ODUCS @ODU, @WebSciDL @NirdsLab | Summer Research Intern @LosAlamosNatLab
Michael L. Nelson @phonedude_mln
2K Followers 976 Following Professor: @WebSciDL, @ODUcs, @ODUVMASC (2002-now); Engineer: @NASA_Langley (1991-2002); Postdoc: @UNCSILS (2000-2001)
Isabelle Augenstein @IAugenstein
11K Followers 1K Following Full Professor @CopeNLU @uni_copenhagen Formerly @ucl_nlp, @SheffieldNLP. Explainable AI, Natural Language Processing, ML.
Gabriele Sarti @gsarti_
2K Followers 2K Following PhD Student @GroNLP 🐮, core dev of @InseqLib (https://t.co/tTjrg26ygQ). Interpretability ∩ HCI ∩ #NLProc. Prev: @AmazonScience, @Aindo_AI, @ItaliaNLP_Lab.
Travis Reid @TReid803
215 Followers 398 Following PhD Student at @oducs; Member of @WebSciDL; M.S. & B.S. in Computer Science from @ODU; A.S. in Computer Science from @TCCva
Pei Zhou @peizNLP
2K Followers 888 Following PhD @nlp_usc | Ex-@GoogleDeepMind, @GoogleAI, @allen_ai @AmazonScience @UCLA | Common Ground Reasoning for Communicative Agents | he/him
Yasith Jayawardana @yasithdev
362 Followers 434 Following ⏳PhD (CS) @ODU @NirdsLab @WebSciDL Past: Research Fellow @Harvard Center for Advanced Imaging, Research Intern @LosAlamosNatLab, BS (CSE) @MoratuwaUni
Himarsha R. Jayanetti @HimarshaJ
480 Followers 863 Following PhD Candidate (Computer Science @oducs) Web Science & Digital Libraries Research Group @WebSciDL, ODU @odu.
Kritika garg @kritika_garg
351 Followers 819 Following Ph.D. Student, Department of Computer Science @ODUCS @ODU, Member of Web Science & Digital Libraries Research Group @WebSciDL.
Toreslore @ToreslorekhG
0 Followers 49 Following
Mia Aguilar @mia_aguila94725
86 Followers 3K Following
Melinda Ramsay @MelindaRam49511
71 Followers 5K Following
Viva Angelovich @angelovich46453
82 Followers 5K Following
Daisy-mae Armenteros @mae_arment88918
70 Followers 5K Following
Stephaine Sherrer @SherrStephai
11 Followers 3K Following
Briella Bragdon @briel_bragd
80 Followers 5K Following
Halina Mittchell @HalMittch
54 Followers 5K Following
Qian Ruan @QianRuan_
8 Followers 17 Following Hi, my name is Qian Ruan. I am a doctoral researcher at the Ubiquitous Knowledge Processing Lab (UKP) at Technical University of Darmstadt.
Gilberte Jenkins @GilbeJenki
84 Followers 5K Following
Abby Malecha @AbbyMalech39213
64 Followers 5K Following
Ugui Monge @UguiMonge
36 Followers 219 Following
Prudence @sunumam09977569
4 Followers 789 Following
Aubrie Woolridge @AubrieW57576
74 Followers 5K Following
Simon Batzner @simonbatzner
4K Followers 708 Following RS at Google DeepMind. Prev: PhD at Harvard, MIT, NASA, Google Brain.
QS @SunQumeng
11 Followers 54 Following machine learning enjoyer. fps gamer. s11-12 s14-15 s17, s19 master in @PlayApex
Abhay Gupta @gupta__abhay
163 Followers 1K Following ML @CerebrasSystems | previously @CMU_Robotics | music, soccer & travel.
Balthas (@balthas@fos.. @b_seibold
470 Followers 743 Following On #knowledgecommons, #opensource, #ArtificialIntelligence for all. Own thoughts, not of @giz_gmbh /BMZ. Co-lead @fair_forward w/ @gimpelle / RT≠endorsement
Florian Buettner @BuettnerFlo
439 Followers 977 Following Scientist. Professor at DKTK/DKFZ and Frankfurt university, machine learning in cancer research.
Lylah Jelome @LylahJ82701
26 Followers 5K Following
Charley Baranick @CharlBaran
65 Followers 5K Following
Kaya Swatman @ka_swatm
62 Followers 5K Following
Shirly Sayler @shir_sayl
74 Followers 5K Following
Dennis @dennizor
818 Followers 1K Following Tool maker. Interface conjurer. Attractor jumper. Boson cutter. Prev. co-founded @joinknack
Bram @BramVanroy
1K Followers 713 Following @ku_leuven @ccl_kuleuven: Creative #NLG 🖋️ @ivdnt: Dutch #NLProc and #LLMs 🤖 Organizing @ctt2024 🖋️ Fellow at @huggingface 🤗 Prev. @lt3ugent, @SignON
Taylor Mcindoe @TaylMcindo
60 Followers 5K Following
Manuel Brack @MBrack_AIML
450 Followers 188 Following Research Scientist @dfki | PhD Candidate @TUDarmstadt | Co-Founder @occiglot
Fabio Barth @FabioBarch
31 Followers 373 Following
Tuesday @Madeau22613
17 Followers 346 Following A fashion enthusiast and investor in the US stock market. I alternate my time between fashion design and analyzing financial market performance.
Indira Sen @indiiigosky
712 Followers 882 Following Computational Social Science PhD Student at RWTH - She/her - Saving up to fund my own biopic and migrating to https://t.co/tQ6ujjzKH8 + https://t.co/jflkRWngV3
Poaja @Poaja234425
101 Followers 4K Following
Alexander Seifert @therealaseifert
182 Followers 873 Following I’m a product-minded NLP engineer with a background in computer science, linguistics & philosophy. 🎯 https://t.co/83m4fKcTOm · 📝 https://t.co/xf7B97oTlS · 📚 https://t.co/ODf0KrRkJ8
Khalid خالد @eKhalid_
35 Followers 242 Following
Raya (Raia) @raya_aa1
114 Followers 313 Following tweets about linguistics, NLP, AI, Palestine, and cats (not necessarily in that order)
Fetheighh @fetheighh53389
52 Followers 1K Following I live alone now and enjoy business, traveling, shopping, food and music. I have a calm personality and I hope we can be friends.
DoreenJackson @acj4pwa2ektR50
54 Followers 204 Following
OpheliaViolet @Y4pbh3HSGn52L
4 Followers 192 Following I come from a game development company we have the latest RPG and SLG mobile games.🕹️ Play Add👇👇👇 🎮customer service🎮 whatsApp: +86 15268152379
Lawrence Bird @la_frenze
11 Followers 88 Following
McTayroo @MTayroo63653
94 Followers 2K Following
Fabio Massimo Zanzott.. @znz8
533 Followers 250 Following Associate Professor @unitorvergata, working on #MachineLearning for #NLProc and #PrecisionMedicine. Passionated for #AIEthics. Opinions are my own.
Maria Carmen Staiano @mc_staiano
312 Followers 529 Following PhD student at @UniMC | @NlpUnior research group | Ex-Apple #machinetranslation #computationallinguistics #LLMs
Yann LeCun @ylecun
712K Followers 719 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Andrej Karpathy @karpathy
980K Followers 905 Following 🧑🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥
(((ل()(ل() 'yoav))).. @yoavgo
46K Followers 2K Following
Nils Reimers @Nils_Reimers
10K Followers 434 Following Director of Machine Learning @Cohere | ex-huggingface | Creator of SBERT (https://t.co/MKKOMfuQ4C)
AK @_akhaliq
310K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
EMNLP 2024 @emnlpmeeting
12K Followers 41 Following EMNLP 2024 - The 2024 Conference on Empirical Methods in Natural Language Processing, November 12 –16, 2024 Hashtag: #EMNLP2024
Sebastian Ruder @seb_ruder
80K Followers 1K Following Multilingual LLMs @cohere • Prev: @GoogleDeepMind • Newsletter: https://t.co/7JGh2qpG98
Thomas Wolf @Thom_Wolf
68K Followers 4K Following Co-founder and CSO @HuggingFace - open-source and open-science
Sasha Rush @srush_nlp
52K Followers 464 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
Delip Rao e/σ @deliprao
46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Sam Bowman @sleepinyourhat
35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
Jan Philip Wahle @jpwahle
629 Followers 223 Following I love data and language. Follow me for updates on my research.
William Wang @WilliamWangNLP
14K Followers 719 Following UCSB NLP Lab + ML Center. https://t.co/6TOnqbk6YT https://t.co/KJYhnav3Et Mellichamp Chair Prof. at UCSB CS. PhD @ CMU SCS. Areas: #NLProc, Machine Learning, AI.
NLLP Workshop @NllpWorkshop
1K Followers 556 Following Workshop on Natural Legal Language Processing (NLLP) at EMNLP 2023 v1: NAACL 2019, v2: KDD 2020, v3: EMNLP 2021 v4: EMNLP 2022 #nlproc #law #legaltech #nllp
Gavindya @Gavindya2
283 Followers 294 Following Ph.D. Student, Department of @ODUCS @ODU, @WebSciDL @NirdsLab | Summer Research Intern @LosAlamosNatLab
Sebastian Gehrmann @sebgehr
5K Followers 2K Following Head of NLP, CTO office, @Bloomberg. (he/him) Generating natural language, one word at a time. Also making sense of that language afterwards. views my own
Luca Soldaini 🎀 @soldni
6K Followers 1K Following I like tokens! Lead for OLMo data team at @allen_ai (Dolma 🍇), OSS is fun, @QueerInAI organizer 🤖☕️🍕they/them (views mine, not my employer’s)
Michael L. Nelson @phonedude_mln
2K Followers 976 Following Professor: @WebSciDL, @ODUcs, @ODUVMASC (2002-now); Engineer: @NASA_Langley (1991-2002); Postdoc: @UNCSILS (2000-2001)
Tim Dettmers @Tim_Dettmers
29K Followers 823 Following PhD Student at @UW. I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
elvis @omarsar0
189K Followers 486 Following Building with LLMs @dair_ai • Prev: Meta AI, Galactica LLM, PapersWithCode, Elastic, PhD • Creator of the Prompting Guide (~4M learners)
Kriegsforscher @OSINTua
97K Followers 214 Following Private entrepreneur before the full-scale invasion. Right now serve in the Ukrainian Marine Corps (air recon unit). Support our unit via PP: [email protected]
Daniel Huynh @dhuynh95
898 Followers 224 Following CEO at @MithrilSecurity, AI & privacy startup. Lead of 🌊LaVague, open-source Large Action Model project https://t.co/D4n9bzUjnc
Dan Saattrup Nielsen @saattrupdan
16 Followers 78 Following Senior AI Specialist at the Alexandra Institute
Aaron Defazio @aaron_defazio
6K Followers 366 Following Research Scientist at Meta working on optimization. Fundamental AI Research (FAIR) team
Deutsche Telekom @deutschetelekom
84K Followers 643 Following The #CorporateCommunications team posts here 👉 https://t.co/nc60KWegQm | For customer inquiries and service cases: @Telekom_hilft 💡
Claudia Nemat @claudianemat
4K Followers 154 Following Board Member #Technology and #Innovation @deutschetelekom / @telekom_group.
Zengyi Qin @qinzytech
1K Followers 178 Following MIT PhD @MIT | Co-founded @myshell_ai | ex. @Stanford @MSFTResearch | Let's do AGI!
polars data @DataPolars
5K Followers 7 Following Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Manuel Brack @MBrack_AIML
450 Followers 188 Following Research Scientist @dfki | PhD Candidate @TUDarmstadt | Co-Founder @occiglot
Stephen Bach @stevebach
2K Followers 422 Following Asst. prof. @BrownCSDept. Working on improving how humans teach computers. Weak supervision, zero-shot learning, few-shot learning, and high-level knowledge.
Indira Sen @indiiigosky
712 Followers 882 Following Computational Social Science PhD Student at RWTH - She/her - Saving up to fund my own biopic and migrating to https://t.co/tQ6ujjzKH8 + https://t.co/jflkRWngV3
Raya (Raia) @raya_aa1
114 Followers 313 Following tweets about linguistics, NLP, AI, Palestine, and cats (not necessarily in that order)
Dan Fu @realDanFu
4K Followers 176 Following CS PhD Candidate at Stanford, systems for machine learning. Sometimes YouTuber/podcaster. Academic Partner, @togethercompute.
main @main_horse
8K Followers 478 Following AGI Believer. Haven't applied @OpenAI. Likes are not always endorsement.
Wing Lian (caseus) @winglian
9K Followers 2K Following @axolotl_ai OSS maintainer. Axolotl AI founder. AI/ML tinkerer. Building tools for everyone.
Philipp Singer @ph_singer
12K Followers 464 Following Senior Principal Data Scientist @h2oai | PhD in CS Top ranked Kaggle Grandmaster (Highest #1) All views are my own. https://t.co/NHdaca2ld0
Nous Research @NousResearch
18K Followers 29 Following The AI Accelerator Company. https://t.co/vrD0aDJeto
Wolfram Ravenwolf @WolframRvnwlf
1K Followers 201 Following 🐺🐦⬛ AI aficionado, local LLM enthusiast, llama liberator, Amy AGI creator (WIP 😎) | Full-time Professional AI Engineer + Part-time Freelance AI Consultant
Hyperspace @HyperspaceAI
14K Followers 24 Following Organizing the world’s AI agents. Think Mixture of 1000s of experts, not just a few.
Simon Ramstedt @simonramstedt
514 Followers 696 Following AI Research. Previously @mila_quebec, @mcgillu, ElementAI, @msftresearch and @ias_tudarmstadt.
SebastianBoo @SebastianB929
132 Followers 138 Following
Devendra Chaplot @dchaplot
8K Followers 365 Following Building next-gen AI at @MistralAI. Past: Research Scientist at Facebook AI Research. Ph.D. @SCSatCMU, BTech @iitbombay CS.
Arthur Mensch @arthurmensch
40K Followers 874 Following Co-founder and CEO @MistralAI. Apply https://t.co/yHGRZAtjcx
Weyaxi @Weyaxi
2K Followers 2K Following
hessian.AI @Hessian_AI
2K Followers 309 Following Driving research excellence, education, practice and leadership in AI to foster economic growth and improve the human condition.
Orochimaru's Demeanou.. @theYorubayesian
374 Followers 554 Following I have been blessed with a wilder mind. Guided. Gifted.
Jan P. Harries @jphme
855 Followers 283 Following Co-Founder & CEO @ ellamind / #DiscoResearch / Retweets&favs are stuff i find interesting, not endorsements
Björn Plüster @bjoern_pl
386 Followers 56 Following Founder and CTO of ellamind. LLM and open-source enthusiast. @ellamindAI, @DiscoResearchAI
Kristian Kersting @kerstingAIML
5K Followers 2K Following #AI prof @TUDarmstadt, co-director @Hessian_AI, @DFKI, @RealAAAI Councilor, @vision_claire, @ELLISforEurope, AI Columnist @WELTAMSONNTAG
Teknium (e/λ) @Teknium1
29K Followers 3K Following Cofounder @NousResearch, prev @StabilityAI Github: https://t.co/LZwHTUFwPq HuggingFace: https://t.co/sN2FFU8PVE Support me on Github Sponsors
Sebastian Nagel @sebnagel
123 Followers 144 Following crawl engineer @CommonCrawl, committer @ApacheNutch, member @TheASF, computational linguist, programmer @EXCInequality @unikonstanz
Eneko Agirre @eagirre
1K Followers 281 Following Prof. @upvehu Head of HiTZ research center @hitz_zentroa, member of @ixaGroup, Spanish Informatics Research Prize 2021. ACL fellow @aclmeeting
Hamish Ivison @hamishivi
476 Followers 598 Following Antipodean Abroad. he/him. I (try to) do NLP research. PhD student @uwcse, prev @Sydney_Uni @allen_ai 🇦🇺🇨🇦🇬🇧
Tri Dao @tri_dao
19K Followers 365 Following Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
Juriskop @juriskop
376 Followers 44 Following Legal tech research project of the law firm Jun Rechtsanwälte in cooperation with the University of Würzburg.
Matthias Grabmair @matgrabmair
348 Followers 265 Following Assistant Professor, Informatics @TU_Muenchen; Former @LTIatCMU and SINC. AI&Law, Legal Tech, Legal NLP; opinions my own; he/him
Sang Michael Xie @sangmichaelxie
3K Followers 709 Following PhD student @StanfordAILab @StanfordNLP @Stanford advised by Percy Liang and Tengyu Ma. Prev: visiting @GoogleAI Brain, BS, MS Stanford ‘17
Desmond Elliott @delliott
3K Followers 440 Following Assistant Professor at the University of Copenhagen working on multimodal machine learning.
Qdrant @qdrant_engine
7K Followers 71 Following Open-Source Vector Search Engine and Vector Database written in Rust https://t.co/nkLUmsgxhV 🦀 Also available in the cloud https://t.co/wBitjnGWgi ⛅
Sepp Hochreiter @HochreiterSepp
10K Followers 395 Following Pioneer of Deep Learning and known for vanishing gradient and the LSTM. I mostly tweet about random ArXiv papers which sparked my interest.
Aleksa Gordić 🍿.. @gordic_aleksa
19K Followers 217 Following https://t.co/mcuQvV8wEa proud father of 16 A100s & 16 H100s flirting with LLMs, tensor core maximalist x @GoogleDeepMind @Microsoft
Time for memes with Schmidhuber?
MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs.🧵
Quantization is more harmful for LLaMA 3 than for LLaMA 2. This PR in the llama.cpp repo investigates it well. (Perplexity measures how well the model can predict the next token, with lower values being better.) Most probable reason - Llama 3 was trained for 15T tokens (biggest of…
Llama 3 degrades more than Llama 2 when quantized. Probably because Llama 3, trained on a record 15T tokens, captures extremely nuanced data relationships, utilizing even the minutest decimals in BF16 precision fully. Making it more sensitive to quantization degradation.…
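As a rough illustration of the perplexity metric mentioned in the quantization posts above, here is a minimal Python sketch. It assumes per-token natural-log probabilities are already available; the function name and the sample values are hypothetical and are not taken from the llama.cpp PR.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    actual next token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up values: a model that gives every correct token probability 0.5
# versus one that gives every correct token probability 0.25.
confident = [math.log(0.5)] * 4
uncertain = [math.log(0.25)] * 4

print(perplexity(confident))
print(perplexity(uncertain))
```

A lower value means the model spreads less probability mass over wrong tokens, which is why a perplexity increase after quantization is read as degradation.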
Just stopped @AnthropicAI's Claudebot from completely ddos'ing a site with a large phpbb forum. Seriously uncool to be that aggressive with data-collection for your LLM.
The LLM Engine, an open-source platform from @scale_AI for LLM serving in production looks pretty interesting. Efficient auto-scaling, Squeezing as many queries per second (QPS) as possible out of your GPU, host OSS models on our own infrastructure to eliminate any privacy…
@winglian Here is @winglian's adapter: huggingface.co/winglian/llama… You can apply this adapter to any llama-3-8b finetune, to give it 256k context.
Another class I'm teaching this semester is "Programming w/ LLMs". This sidesteps the whole chatbot / assistant / "an AI" theme and looks at LLMs as function approximators -- where, weirdly, the function needs to be "found" first. (Yes, DSPy will feature heavily.)
Apple presents OpenELM An Efficient Language Model Family with Open-source Training and Inference Framework The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and
Got some Llama 3 8B fine-tuning experiments rolling in our eval harness. Let's see how this baby does after we've cooked it a bit. 😎
@econ_lessmann Hm, apart from the quality of the food, what would be the argument against a professor "cross-subsidizing" two students?
mEdIT: Multilingual Text Editing via Instruction Tuning arxiv.org/abs/2402.16472… Excited to share the multilingual extension to CoEdIT, to appear at @naaclmeeting 2024! - Multilingual and cross-lingual revision - Works across seven languages - Generalizability to newer languages
CoEdIT: Text Editing by Task-Specific Instruction Tuning - Instructions + Task-Specific fine-tuning - Outperforms GPT3/InstructGPT/PEER on Text Editing benchmarks - Generalizability to newer & composite instructions Paper: arxiv.org/abs/2305.09857 Models: huggingface.co/grammarly
@gneubig yes that's the goal, we're in the process of organizing all the artifacts and it should come out in the coming days with a more detailed tech report/blog
We're hiring a data engineer to help with Dolma! Come work with me on: - data acquisition 🕷️ - high–performance data pipelines ⚡️ - open source science 🔄 Please apply if you have experience in any of those! DM for questions, job link in thread 🧵
When @ericries and I created Answer.AI, we started our research thesis from 2 trends: - A dramatic increase in model size - Much larger opportunities for finetuning using “continued pre-training” I discussed this on @latentspacepod: latent.space/p/fastai
We've written up our methods, results, and provided background on the critical underlying research ideas such as DoRA, in a blog post, along with open source code to allow you to replicate and build on our results: answer.ai/posts/2024-04-…
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
We trained 200+ ablation models to validate our processing decisions, and we share all the code you need to reproduce our setup, along with our dataset comparison ablation models checkpoints! Find out all about 🍷 FineWeb on the 🤗 model page: huggingface.co/datasets/Huggi…
The best few-shot classifier is... 🥁 Llama? Mistral? Flan? 🌟How about Roberta!🌟 In our new @naaclmeeting paper, we claim we were just missing the right objective!
Forget the US Executive Order — 9e24 FLOPs means Llama 3-70B is on the verge of crossing the EU threshold for a systemic risk model (1e25). That would mean mandatory pre-release & post-release requirements that may be very difficult to reconcile with open release. Not only that,…
The model card has some more interesting info too: github.com/meta-llama/lla… Note that Llama 3 8B is actually somewhere in the territory of Llama 2 70B, depending on where you look. This might seem confusing at first but note that the former was trained for 15T tokens, while the…
@YiTayML @yuzhaouoe Yeah! However, other (open-source) pre-training code bases we analysed were doing plain causal masking, where the likelihood of each token was conditioned on all previous tokens in the pre-training chunk. This (sub-optimal, as we found in arxiv.org/abs/2402.13991) choice probably…
Llama 3 was trained using intra-document causal masking, as suggested by @yuzhaouoe's paper "Analysing The Impact of Sequence Composition on Language Model Pre-Training"! 🚀🚀🚀 arxiv.org/abs/2402.13991
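The difference between plain causal masking and intra-document causal masking described in the two posts above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the `doc_ids` representation (one document index per token in a packed pre-training chunk) are hypothetical and are not taken from the paper or from the Llama 3 code.

```python
def causal_mask(n):
    """Plain causal mask: token i may attend to every token j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def intra_document_causal_mask(doc_ids):
    """Causal mask restricted to tokens of the same document.

    doc_ids[i] is the (hypothetical) index of the document that token i
    belongs to within one packed chunk.
    """
    n = len(doc_ids)
    return [
        [j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
        for i in range(n)
    ]


# Two documents packed into one chunk: three tokens, then two tokens.
doc_ids = [0, 0, 0, 1, 1]
plain = causal_mask(len(doc_ids))
mask = intra_document_causal_mask(doc_ids)

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With the plain mask, the first token of the second document can still attend to the three tokens of the unrelated first document; the intra-document variant blocks exactly those positions.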