While instruction tuning is clearly necessary for producing usable interfaces like ChatGPT, the "magic" of language models comes from self-supervised learning on broad data, which enables emergent behavior like in-context learning and chain-of-thought.
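(For concreteness, a minimal sketch of what "in-context learning" and "chain-of-thought" prompts look like. The `complete()` stub below is hypothetical and stands in for any base-model completion endpoint; only the prompt formats come from the thread's subject matter.)

```python
# Minimal sketch of in-context learning and chain-of-thought prompting.
# `complete` is a hypothetical stand-in for a base-model completion call;
# here it just returns a placeholder so the script runs on its own.

def complete(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to a language model.
    return "<model completion would appear here>"

# In-context learning: a few input/output demonstrations, then a new input.
# No gradient updates happen; the pattern is inferred from the prompt alone.
icl_prompt = (
    "English: cheese -> French: fromage\n"
    "English: apple -> French: pomme\n"
    "English: house -> French:"
)

# Chain-of-thought: prompting the model to reason step by step before answering.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Let's think step by step."
)

print(complete(icl_prompt))
print(complete(cot_prompt))
```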
@percyliang Bingo. You still need a great base model to start with.
@percyliang 💯 this behavior comes built in with gpt-3, pretty amazing imo
@percyliang Tishby (2017) unravels deep learning. Recognizing the essentials is an information bottleneck, and the most important part of learning is forgetting: compressing away details that carry no recurring meaning.
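(Context for the reply above: Tishby's information bottleneck casts learning as compressing the input $X$ into a representation $T$ that keeps only what predicts the target $Y$, with $\beta$ trading compression against prediction. A standard statement of the objective:)

$\min_{p(t|x)} \; I(X;T) - \beta\, I(T;Y)$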
@percyliang Exactly. I think training Codex was a breakthrough moment. It forced the model to build internal representations that help it handle causality, logic, and chained reasoning. This is the way.
@percyliang Train models on math, logic, code, first-principles philosophy, and physics, and they will better "interpret" new training data from fiction, opinion, dialog, news, etc.
@percyliang Agree. Though emergent behaviors such as in-context learning and CoT reasoning are not fully understood yet, DNN-based language models are basically instilled with broad data (human knowledge/experience); the "magic" is their ability to retrieve it efficiently.