How should the AI research community broadly refer to the family of large, pre-trained, self-supervised, scalable models applicable to a fairly broad range of downstream tasks without fine-tuning? E.g., GPT-3, PaLM, Chinchilla, OPT, DALL-E, Flamingo, Gato, etc.

I propose that we adopt the term "Large Self-Supervised Models (LSSMs)" as a replacement for "Foundation Models" and "LLMs". "LLMs" doesn't capture non-linguistic data and "Foundation Models" is too grandiose. Thoughts? @percyliang

@tdietterich @percyliang Ok, but "LLMs" can still be used for models trained on linguistic data. I would just eliminate the wording “Foundation Models”.

@tdietterich @percyliang I find "ViT" pretty ok. So, instead of LLM, maybe LaT (Language Transformer) would have been a better term in hindsight. I think the L in LLM is too ambiguous. But yeah, Foundation Model is the worst and way too hype-y. Mark my words -- I won't use it in any professional articles.

@tdietterich @percyliang For LSSM: it's not bad, but why not just say "Transformer model" to refer to models with a transformer-like architecture and training procedure? I don't think it's necessary to introduce a new term.

@frossi_t @percyliang Agreed. I wanted a term that could encompass models trained on multiple modalities: language + video + physical manipulation, etc.

@tdietterich @percyliang To me 'foundation' is really about use, and LSSM is about technical attributes. Foundation models are centralized and clients adapt them to solve specific tasks, through prompting, fine tuning, or whatever else. Such models could be LSSMs but could also be built in other ways.

@joshua_saxe @percyliang Yes. But my guess is that self-supervision is the only way to scale to these immense data sets, so I think it is a safe term going forward.

@tdietterich The beauty of language is that you can have multiple terms that highlight different aspects of the same object. You don't have to choose. I use "LLM" to talk about LLMs, "self-supervised" for their construction, and "foundation model" for their function. No term can be replaced.

@percyliang Yes, but as you know, "Foundation" is too close to "Foundational", and many of us find that troubling. That is why I'm proposing a more neutral term. For use, maybe we could just call them "Upstream models".

@tdietterich @percyliang There was a previous discussion in the comment thread below. I personally think "Large Pre-Trained Models" (LPTMs) might be the most neutral label (given the significance of pre-training & the controversy around the definition of self-supervised learning)…