👶 BabyLM Challenge is back! Can you improve pretraining with a small data budget?
BabyLMs for better LLMs & for understanding how humans learn from 100M words.
New: how vision affects learning, bring your own data, paper track.
babylm.github.io 🧵
The challenge calls for diverse submissions to study pretraining with a limited number of words. We supply data, but you can now also bring or create your own. Why limited data, and why at all? CfP: arxiv.org/abs/2404.06214
@babyLMchallenge Great! Looking forward to participating this year. It's cool that construction of our own datasets is allowed now.
@babyLMchallenge poke @wissam_antoun @bensagot : really tempted to participate this year