Lucas Beyer @giffmana
Researcher (@GoogleAI Brain Team in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian
lucasb.eyer.be · Zurich, Switzerland · Joined December 2013
Tweets 4,769 · Followers 13.2K · Following 319

Chris Rockwell @_crockwell
12 hours ago
What if ViTs could do classical computer vision?
Our #3DV2022 work shows that, with minor changes, a ViT performs computations similar to the 8-point algorithm. 💡 This is a useful inductive bias for estimating camera pose!
abs: arxiv.org/abs/2208.08988
✍️ @jcjohnss and David Fouhey
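
For reference, a minimal NumPy sketch of the classical normalized 8-point algorithm mentioned above, estimating a fundamental matrix from eight or more correspondences (this is the textbook version, not code from the paper):

    import numpy as np

    def normalize(pts):
        # Shift points to zero mean and scale so the mean distance to the origin is sqrt(2).
        mean = pts.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
        T = np.array([[scale, 0.0, -scale * mean[0]],
                      [0.0, scale, -scale * mean[1]],
                      [0.0, 0.0, 1.0]])
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])
        return (T @ pts_h.T).T, T

    def eight_point(pts1, pts2):
        # pts1, pts2: (N, 2) arrays of matched pixel coordinates, N >= 8.
        x1, T1 = normalize(pts1)
        x2, T2 = normalize(pts2)
        # Each correspondence gives one row of the linear system A f = 0 (from x2^T F x1 = 0).
        A = np.column_stack([
            x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
            x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
            x1[:, 0], x1[:, 1], np.ones(len(x1)),
        ])
        # f is the right singular vector with the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        F = Vt[-1].reshape(3, 3)
        # Enforce rank 2, which a valid fundamental matrix must have.
        U, S, Vt = np.linalg.svd(F)
        F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
        # Undo the normalization.
        return T2.T @ F @ T1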

Lucas Beyer @giffmana
16 hours ago
@unsorsodicorda @tomgoldsteincs @francoisfleuret @TimSalimans @emiel_hoogeboom @poolio Cc @AndreasPSteiner who told me he'd give a tutorial on it, but I don't remember if it already happened and was recorded or not.

Jeremy Howard @jeremyphoward
a day ago
There's so very many things I dislike about dataclasses. This is one of them: twitter.com/dabeaz/status/…

Lucas Beyer @giffmana
a day ago
@robinschmidt_ @ikostrikov Only sometimes 😅 I actually don't trust my "harmless refactors" anymore without verification!

Lucas Beyer @giffmana
2 days ago
By default I actually don't trust anyone's ML code. It's just very easy to make subtle mistakes which make things just slightly worse. But @ikostrikov is one of the very few exceptions I have. I'll blindly trust his implementations. It's no surprise he's part of this 👇 result! twitter.com/svlevine/statu…

Karol Hausman 💙💛 @hausman_k
3 days ago
2) With PaLM-SayCan, we got new LLM capabilities for free, such as chain-of-thought prompting (arxiv.org/abs/2201.11903) and handling multilingual queries.
This is really exciting, as we can attach ourselves to the progress of LLMs for the future 📈!
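
For context, a minimal sketch of chain-of-thought prompting as introduced in arxiv.org/abs/2201.11903: the few-shot exemplar spells out its intermediate reasoning, and the final answer is parsed from the model's own reasoning. `query_llm` here is a placeholder, not a real API:

    # Few-shot exemplar with explicit reasoning steps (example taken from the CoT paper).
    COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

    Q: {question}
    A:"""

    def chain_of_thought_answer(question, query_llm):
        # query_llm: placeholder callable mapping a prompt string to a completion string.
        completion = query_llm(COT_PROMPT.format(question=question))
        # The model is expected to write its reasoning first, then "The answer is X."
        return completion.rsplit("The answer is", 1)[-1].strip(" .")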

Daniel Vázquez @danvazh
2 weeks ago
Can we have a Journal for Halfway Done, Abandoned Papers? In the abstract, you explain why you quit working on it.

Lucas Beyer @giffmana
4 days ago
@ggdupont @tdietterich @francoisfleuret @percyliang That's a great question/point. I think for small scale, yes; for large scale, it's not clear/settled yet! (Mostly due to the lack of good self-sup at scale.)

Lucas Beyer @giffmana
4 days ago
@tdietterich @francoisfleuret @ggdupont @percyliang (I already predict someone trying to argue that image-text is self-supervised, not supervised)

Lucas Beyer @giffmana
4 days ago
@tdietterich @francoisfleuret @ggdupont @percyliang Yes, it currently still fails: on imagenet-1k there are now competitive methods, but when scaling *the same* data 10x, to imagenet-21k, they still fall far behind supervised. The stated goal, training on infinite web data, is superseded by (supervised!) image-text, which works much better.

Lucas Beyer @giffmana
4 days ago
@LChoshen @DrorSimon @priy2201 arxiv.org/abs/1706.02677 was one of the main papers making the empirical benefit of warm-up widely known.
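
As a concrete illustration (my own sketch, not code from that paper), such a warm-up is just a linear ramp of the learning rate over the first steps before the main schedule takes over:

    import math

    def lr_at_step(step, base_lr=1e-3, warmup_steps=10_000, total_steps=100_000):
        if step < warmup_steps:
            # Linear warm-up: ramp from ~0 to base_lr over the first warmup_steps steps.
            return base_lr * (step + 1) / warmup_steps
        # After warm-up, hand over to the main schedule (cosine decay here, as an example).
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))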

Lucas Beyer @giffmana
4 days ago
@LChoshen Sorry, by "this" I mean the gradually smaller lr as we go away from the head.

Lucas Beyer @giffmana
4 days ago
@LChoshen This is sometimes done; it's called layer-wise learning rate decay. For example, BeiT and MAE do this.
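
As a rough sketch of the idea (the exact parameter grouping differs between the BeiT and MAE codebases): the head keeps the full learning rate, and every layer further from the head gets its learning rate multiplied by a decay factor once more:

    def layerwise_lrs(base_lr, num_blocks, decay=0.75):
        # Index 0 = patch embedding, 1..num_blocks = transformer blocks, num_blocks + 1 = head.
        # The head gets base_lr; every step away from it multiplies the LR by `decay`.
        depth = num_blocks + 2
        return [base_lr * decay ** (depth - 1 - i) for i in range(depth)]

    # Example: a 12-block ViT with base LR 1e-3.
    # layerwise_lrs(1e-3, 12)[-1] == 1e-3 (head); the patch embedding gets 1e-3 * 0.75**13.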

Lucas Beyer @giffmana
4 days ago
@LChoshen I think you typo'd: s/weight decay/learning-rate decay/ Because weight decay is yet another related thing.

Lucas Beyer @giffmana
4 days ago
@francoisfleuret @ggdupont @tdietterich @percyliang Only time will tell, but currently, this strategy performs comparatively poorly.

Lucas Beyer @giffmana
4 days ago
@francoisfleuret @tdietterich @percyliang That's correct, but orthogonal to the discussed point.

Lucas Beyer @giffmana
5 days ago
@francoisfleuret I really like this way of getting to know a new method. Out of curiosity, what was the mistake in the previous result you posted?

Lucas Beyer @giffmana
5 days ago
@ahatamiz1 @francoisfleuret Nice! Beware that the word "patience" in the title is for real: a colleague once tried the method without good results, but when they left it running while going on vacation, they were amazed by the results on their return. This is not a joke :)
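
For reference, a minimal PyTorch-style sketch of this kind of patient and consistent teacher-student distillation (my own paraphrase, not code from the paper): the student matches the teacher's soft predictions on the same augmented view via a KL divergence, over a very long schedule:

    import torch
    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, temperature=1.0):
        # KL(teacher || student) on softened predictions, averaged over the batch.
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_logp = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t

    def train_step(images, student, teacher, optimizer):
        # "Consistent": teacher and student see the exact same augmented crop.
        with torch.no_grad():
            teacher_logits = teacher(images)
        loss = distill_loss(student(images), teacher_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()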

Lucas Beyer @giffmana
5 days ago
@ahatamiz1 @francoisfleuret We started the project before we invented ViT, and didn't want to change everything midway. However, I've done some ad-hoc experiments with ViT and it works just as well. Didn't push them to their limits yet.

Lucas Beyer @giffmana
6 days ago
@ahatamiz1 @francoisfleuret You may like our distillation paper then, which does exactly that: scholar.google.ch/scholar?q=dist…

Lucas Beyer @giffmana
6 days ago
@francoisfleuret For humongous data, you need something that can absorb a lot of info. For now, this is parameters. Otherwise, it's our current best way of making optimization easier. Hopefully we'll find better ways eventually.

Lucas Beyer @giffmana
6 days ago
@ggdupont @francoisfleuret @tdietterich @percyliang Funny you tell me that, because I have several papers doing exactly that...

Lucas Beyer @giffmana
6 days ago
@francoisfleuret @tdietterich @percyliang I don't think so at all.

Sebastian Raschka @rasbt
a week ago
@tdietterich @percyliang I find "ViT" pretty ok. So, instead of LLM, maybe LaT (Language Transformer) would have been a better term in hindsight. I think the L in LLM is too ambiguous. But yeah, Foundation Model is the worst and way too hype-y. Mark my words -- I won't use it in any professional articles.

Linden Li @lindensli
a week ago
Recently, @abhi_venigalla and I trained GPTs from scratch to see if we could train LLMs like well-resourced companies do. Here’s what I learned going from 125 million parameters to 1.3 billion. Spoiler: training costs are within reach now. And it’s about to get a lot cheaper.

ML Review @ml_review
a week ago"Stanford CS25 - Transformers United"
Seminar by @DivGarg9 @markchen90 @giffmana @adityagrover_ @barret_zoph @drew_jaegle @aidangomezzz @ch402 Geoffry Hinton
Covers Transformers in Language, Vision, Decision-Making, Audio
youtube.com/playlist?list=…

Lucas Beyer @giffmana
a week ago
@peteflorence @albertzeyer Looks cool, though Kickstarter seems to be very hit or miss.