What if ViTs could do classical computer vision?
Our #3DV2022 work shows that, with minor changes, a ViT performs computations similar to the 8-point algorithm.💡 This is a useful inductive bias for estimating camera pose!
✍️ @jcjohnss and David Fouhey
By default I actually don't trust anyone's ML code. It's just very easy to make subtle mistakes which make things just slightly worse.
But @ikostrikov is one of the very few exceptions I have. I'll blindly trust his implementations. It's no surprise he's part of this 👇 result! twitter.com/svlevine/statu…
2) With PaLM-SayCan, we got new LLM capabilities for free, such as chain-of-thought prompting (arxiv.org/abs/2201.11903) and handling multilingual queries.
This is really exciting as we can attach ourselves to the progress of LLMs for the future 📈!
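The chain-of-thought idea mentioned above can be illustrated with a tiny prompt. This is a minimal sketch, not from the PaLM-SayCan paper: the prompt includes a worked example whose answer spells out intermediate reasoning steps, which nudges the model to reason step by step on the new query. The example questions are invented for illustration.

```python
# Minimal chain-of-thought prompt sketch (illustrative text, assumed examples).
# The few-shot example's answer shows its reasoning steps, so the model is
# encouraged to continue the final "A:" with step-by-step reasoning too.
cot_prompt = (
    "Q: I have 3 apples and buy 2 bags of 4 apples each. How many apples?\n"
    "A: I start with 3. Two bags of 4 is 8. 3 + 8 = 11. The answer is 11.\n"
    "Q: A robot carries 2 cans per trip and makes 5 trips. How many cans?\n"
    "A:"  # the LLM completes this with its own reasoning chain
)
```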
Can we have a Journal for Halfway Done, Abandoned Papers? In the abstract, you explain why you quit working on it.
@tdietterich @francoisfleuret @ggdupont @percyliang Yes it currently still fails: on imagenet-1k there are now competitive methods. But scaling *the same* data 10x, on imagenet-21k, they still fall far behind supervised.
The stated goal, training on infinite web data, is superseded by (supervised!) image-text training, which works much better.
@LChoshen Sorry, by "this" I mean the gradually smaller lr as we go away from the head.
@LChoshen This is sometimes done, it's called layer-wise learning rate decay. For example BeiT and MAE do this.
@LChoshen I think you typo'd: s/weight decay/learning-rate decay/
Because weight decay is yet another related thing.
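Layer-wise learning-rate decay, as used in BeiT and MAE, can be sketched in a few lines: the head keeps the full learning rate, and each layer further from the head gets it multiplied by the decay factor once more. This is a minimal standalone sketch (the function name and 0.75 decay value are illustrative; BeiT/MAE apply this per parameter group in the optimizer).

```python
def layerwise_lrs(base_lr, decay, num_layers):
    """Layer-wise learning-rate decay: layer num_layers-1 (closest to the
    head) keeps base_lr; each earlier layer's lr is multiplied by `decay`
    once more, so layers near the input are updated most gently."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

# Example: 4 layers, base lr 1e-3, decay factor 0.75 (an assumed value).
lrs = layerwise_lrs(base_lr=1e-3, decay=0.75, num_layers=4)
```

In practice each entry would become the `lr` of one optimizer parameter group for the corresponding transformer block.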
@francoisfleuret I really like this way of getting to know a new method. Out of curiosity, what was the mistake in the previous result you posted?
@ahatamiz1 @francoisfleuret Nice! Beware that the word "patience" in the title is for real: A colleague once tried the method without good results. When they left it running while going on vacation, they were amazed by the results on their return. This is not a joke :)
@ahatamiz1 @francoisfleuret We started the project before we invented ViT, and didn't want to change everything midway. However, I've done some ad-hoc experiments with ViT and it works just as well. Didn't push them to their limits yet.
@francoisfleuret For humongous data, you need something that can absorb a lot of info. For now, this is parameters.
Otherwise, it's our current best way of making optimization easier. Hopefully we'll find better ways eventually.
@tdietterich @percyliang I find "ViT" pretty ok. So, instead of LLM, maybe LaT (Language Transformer) would have been a better term in hindsight. I think the L in LLM is too ambiguous.
But yeah, Foundation Model is the worst and way too hype-y. Mark my words -- I won't use it in any professional articles
Recently, @abhi_venigalla and I trained GPTs from scratch to see if we could train LLMs like well-resourced companies do.
Here’s what I learned going from 125 million parameters to 1.3 billion. Spoiler: training costs are within reach now. And it’s about to get a lot cheaper.