“The fact that [transformer neural nets] model language is probably one of the biggest discoveries in history. That you can learn language by just predicting the next word with a Markov chain—that’s just shocking to me,” Mikhail Belkin says. By @strwbilly. technologyreview.com/2024/03/04/108…
Even as evidence accumulates, some aspects of this result remain shocking and others less so. Since transformers have proven so adept at building representations that extract associative structure from many natural signals, it's perhaps not surprising that they work for language too.
@chrmanning @strwbilly Where does the Markov chain come from? The prediction depends on all previous tokens, not just the immediately preceding one.
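@chrmanning @strwbilly For what it's worth, a minimal Python sketch of that distinction (the toy corpus and the longest-suffix lookup are invented purely for illustration, not how any real model works): a first-order Markov chain discards everything except the last token, while a transformer-style predictor is allowed to condition on the entire prefix.

import random
from collections import defaultdict

random.seed(0)  # reproducible sampling for the demo

# Toy corpus, invented purely for illustration.
tokens = "the cat sat on the mat . the dog lay on the rug .".split()

# First-order Markov chain: the next token depends ONLY on the current one.
bigram = defaultdict(list)
for prev, nxt in zip(tokens, tokens[1:]):
    bigram[prev].append(nxt)

def markov_next(history):
    # Everything before the last token is discarded.
    return random.choice(bigram[history[-1]])

def full_context_next(history):
    # A transformer-style predictor conditions on the WHOLE prefix.
    # Faked here with a longest-matching-suffix lookup, just to show the
    # difference in what the prediction is allowed to depend on.
    for n in range(len(history), 0, -1):
        suffix = history[-n:]
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == suffix:
                return tokens[i + n]
    return random.choice(tokens)

history = "the cat sat on the".split()
print(markov_next(history))        # sees only "the": cat, mat, dog, or rug
print(full_context_next(history))  # sees the full prefix: "mat"

Whether you still call that a "Markov chain" is a question about how you define the state: with the full prefix as the state it is trivially Markov, but that's not the fixed-order chain most people picture.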
@chrmanning @strwbilly I was an editor of a music magazine that took reviews from first-time writers. I met competent writers back then who relied entirely on memorisation to decide whether a sentence "read right", with no real comprehension of the rules of grammar or the meanings of some of the words they used.
@chrmanning @strwbilly Well, you can approximate how bits of language relate to one another while still occasionally making bizarre errors. Learning a language model is not the same as learning a mapping between sentences and semantics. So I find the quote a misleading oversimplification.
@chrmanning @strwbilly That's what happened when hominids turned into big-cortex creatures...
@chrmanning @strwbilly I felt the same shock while writing this: fsndzomga.medium.com/there-will-be-…
@chrmanning @strwbilly You don't need to understand; you just need to make it look right. Or sound right.
@chrmanning @strwbilly You can also learn Python (apparently) just by predicting the next token. Is that equally shocking? Yet what does it tell us about Python? Absolutely nothing. What does it tell us about language? Absolutely nothing.