Lucas Beyer @giffmana
Researcher (@GoogleAI Brain Team in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian. Trying Mastodon as [email protected]
lucasb.eyer.be · Zürich, Switzerland · Joined December 2013
Tweets: 5,591 · Followers: 19.4K · Following: 327

Lucas Beyer @giffmana
2 hours ago
If you don't know at least one of tiling, row/col major, or cache lines and why they are relevant to deep learning, I highly recommend reading this. In general, I recommend following Horace! twitter.com/cHHillee/statu…
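For readers new to row/col major: the sketch below shows where element (i, j) of an n_rows × n_cols array lives in flat memory under each layout, and which addresses get touched when you walk along one row. The helper names are hypothetical, made up for illustration; they are not from any library, and the cache-line remark assumes a realistically large array.

```python
# Minimal sketch: flat memory offsets of a 2-D array in two layouts.
# Hypothetical helpers for illustration only.

def row_major_offset(i: int, j: int, n_rows: int, n_cols: int) -> int:
    """Offset of (i, j) when each row is stored contiguously (C order)."""
    return i * n_cols + j

def col_major_offset(i: int, j: int, n_rows: int, n_cols: int) -> int:
    """Offset of (i, j) when each column is stored contiguously (Fortran order)."""
    return j * n_rows + i

def offsets_walking_a_row(i, n_rows, n_cols, offset_fn):
    """Flat offsets visited when reading row i from left to right."""
    return [offset_fn(i, j, n_rows, n_cols) for j in range(n_cols)]

if __name__ == "__main__":
    # Row-major: consecutive offsets, so neighbours share cache lines.
    print(offsets_walking_a_row(1, 4, 4, row_major_offset))  # [4, 5, 6, 7]
    # Column-major: a stride of n_rows elements; for large arrays each
    # access can land on a different cache line.
    print(offsets_walking_a_row(1, 4, 4, col_major_offset))  # [1, 5, 9, 13]
```

Same data, same loop; only the layout decides whether the hardware fetches each cache line once or over and over.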

Lucas Beyer @giffmana
2 hours ago
@cHHillee @ID_AA_Carmack +1, very well-chosen example. I think matmul is more commonly used because of cs.utexas.edu/~flame/pubs/Go…, but your A+B.T keeps the essence while removing the extras. I really like it and will definitely steal it in the future :)
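To make the A+B.T example concrete, here is a minimal sketch, assuming square n×n matrices stored row-major as flat Python lists. Computing C = A + B.T reads A along rows but B along columns, so B is accessed with a stride of n. The tiled variant processes the output in small square blocks so the strided reads of B stay inside a block that could remain cached. Function names and the default tile size of 32 are illustrative assumptions; pure Python will not show the speedup, only the access pattern.

```python
# Minimal sketch of C = A + B.T on row-major flat lists, naive vs tiled.
# Names and tile size are hypothetical choices for illustration.

def add_transpose_naive(a, b, n):
    """C[i][j] = A[i][j] + B[j][i]; B is read with stride n (cache-unfriendly)."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            c[i * n + j] = a[i * n + j] + b[j * n + i]
    return c

def add_transpose_tiled(a, b, n, tile=32):
    """Same result, computed tile by tile so the touched rows of A and
    columns of B stay within a small working set."""
    c = [0.0] * (n * n)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # Tile [i0, i0+tile) x [j0, j0+tile), clipped at the matrix edge.
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, n)):
                    c[i * n + j] = a[i * n + j] + b[j * n + i]
    return c

if __name__ == "__main__":
    n = 70
    a = [float(k) for k in range(n * n)]
    b = [float(2 * k) for k in range(n * n)]
    # Tiling changes the visiting order, never the result.
    assert add_transpose_naive(a, b, n) == add_transpose_tiled(a, b, n, tile=16)
```

In a real kernel the tile size would be chosen from the cache size and element width rather than hard-coded.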

Lucas Beyer @giffmana
2 hours ago
@ShayneRedford @YiTayML I see, now it makes more sense. My instant reaction was "wut", as I just had my vision dataset work in mind, which takes me at the very least an hour per dataset!

Lucas Beyer @giffmana
3 hours ago
@YiTayML @ShayneRedford How on earth do you create 1800+ tasks or unify this many datasets into a common format?

Lucas Beyer @giffmana
3 hours ago
@francoisfleuret @hos31n It seems to be mostly those that are present many times (duplicates) or outliers, both of which we know from recent papers that (non-generative) deep models do memorize. While this result was not obvious a priori, it's also not hugely surprising, but it's super important nonetheless!

Lucas Beyer @giffmana
2 days ago
@wightmanr @rasbt @PyTorch Especially since CIFAR is not at all indicative of what's good for "regular resolution" images.

Lucas Beyer @giffmana
2 days ago
@ylecun @francoisfleuret This is not even an arbitrary example: when regressing angles and applying mod 360, something like this realistically happens.
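The example being replied to isn't included here, so this is only a generic sketch of why mod 360 bites when regressing angles: 359° and 1° are 2° apart on the circle but 358 apart as plain numbers, so a naive loss punishes a nearly perfect prediction. Two common workarounds are sketched: a wrapped angular error, and encoding the angle as (sin, cos). All function names are hypothetical.

```python
import math

def naive_error(pred_deg: float, target_deg: float) -> float:
    """Plain absolute error after reducing both angles mod 360."""
    return abs(pred_deg % 360 - target_deg % 360)

def wrapped_error(pred_deg: float, target_deg: float) -> float:
    """Shortest distance around the circle, always in [0, 180]."""
    d = (pred_deg - target_deg) % 360
    return min(d, 360 - d)

def to_sin_cos(deg: float) -> tuple:
    """Continuous encoding: nearby angles map to nearby points on the unit circle."""
    r = math.radians(deg)
    return (math.sin(r), math.cos(r))

if __name__ == "__main__":
    print(naive_error(359, 1))    # 358
    print(wrapped_error(359, 1))  # 2
    # Chord distance between 359 deg and 1 deg is small (about 0.035).
    print(math.dist(to_sin_cos(359), to_sin_cos(1)))
```

Which workaround fits depends on the model; the point is only that the raw mod-360 target has a discontinuity the network has to jump across.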

Dan Roy @roydanroy
3 days ago
Before you go work with some professor, ask yourself: do they just promote themself, or do they also work for their students? Looking at someone's website is a pretty good tell.

Lucas Beyer @giffmana
3 days ago
@david_picard I think it's less about being native or not, and more that some people just can't help being verbose while others are naturally concise, in their mother tongue too.

Lucas Beyer @giffmana
3 days ago
@david_picard @ChrisGr93091552 The second; I've seen it a lot lately.

Lucas Beyer @giffmana
3 days ago
@peratham I like this one more because, unlike convnext, it also mentions all the things that did *not* work, which is very valuable info. (I also like convnext.)

Lucas Beyer @giffmana
3 days ago
@karpathy I like this paper because we had a similar experience between BiT and ViT: we tried A LOT of ResNet "improvements" and none held water (except Squeeze-Excite). Now, sadly, a similar thing seems to be happening with ViT.

Lucas Beyer @giffmana
4 days ago
@theshawwn @francoisfleuret mobilenet, effnet, convnext, nfnet, etc.

Lucas Beyer @giffmana
4 days ago
@y0b1byte Wow, yes. And consistently 3 per year. I'm very impressed.

Lucas Beyer @giffmana
4 days ago
@cHHillee @theshawwn @francoisfleuret Yeah, then that should be exposed as a low-level thing (à la jax.lax) and not guide high-level API design, imo.

Lucas Beyer @giffmana
4 days ago
@m__dehghani @neu_rips Yeah, I did come up with a few heuristics (models) before asking, but thought "this sounds like it must already exist". Seems like nope so far, though.

Lucas Beyer @giffmana
4 days ago
@advadnoun @zacharynado Don't worry, that's why I shitpost without developing AGI in bigco; it will even things out.

Lucas Beyer @giffmana
4 days ago
@theshawwn @francoisfleuret That's the thing: having one somehow "freezes" this exact combination, or at least conv-normalizer-activation, and disincentivizes trying other stuff. At least that's what I've noticed happens.

Lucas Beyer @giffmana
4 days ago
@theshawwn @francoisfleuret I'm instantly put off when a library or framework considers the nonlinearity to be part (say, an option) of another layer.
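A toy sketch of the alternative design the tweet prefers, assuming layers are plain functions from a vector to a vector: the nonlinearity is its own layer instead of an `activation=` option on a dense or conv layer, so reordering or swapping it is a one-line change. Everything here (`dense`, `relu`, `sequential`) is hypothetical and not any framework's API.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]

def dense(weights: List[List[float]], bias: List[float]) -> Layer:
    """Affine layer y = W x + b, with no activation baked in."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return apply

def relu(x: Vector) -> Vector:
    """Nonlinearity as a standalone layer."""
    return [max(0.0, v) for v in x]

def sequential(*layers: Layer) -> Layer:
    """Compose layers left to right."""
    def apply(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return apply

if __name__ == "__main__":
    d = dense([[1.0, -1.0]], [0.0])
    # Activation after the affine layer...
    print(sequential(d, relu)([1.0, 2.0]))  # [0.0]
    # ...or before it: trying another ordering needs no new layer option.
    print(sequential(relu, d)([1.0, 2.0]))  # [-1.0]
```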

Lucas Beyer @giffmana
4 days ago
@francoisfleuret XLA has exactly one such mega-general conv op! jax.readthedocs.io/en/latest/_aut…

Lucas Beyer @giffmana
4 days ago
@yuricampbll @neu_rips Yep, that's one of the heuristics someone suggested, and it gives reasonably useful results for me.

Lucas Beyer @giffmana
5 days ago
@DigThatData @roydanroy @neu_rips Joan had a slightly better version of such a heuristic, on top of which I built my current best heuristic: twitter.com/giffmana/statu… I can do heuristics, but was really curious whether something "proper" or established already exists.

Lucas Beyer @giffmana
5 days ago
@advadnoun Phew, dodged a bullet by putting in a bit of effort :D

Lucas Beyer @giffmana
5 days ago
@DigThatData @roydanroy @neu_rips Yup, that's a good approach, but linear wouldn't work: it's a set-based input, and it's definitely a non-linear function that I want. Overall, that would make this almost a whole project on its own. I can't afford that time for what I'm doing with it, and will prefer a closed-form heuristic.
