Unfortunately, I fear I'll always be cheap regarding model size. Instinctively, my reaction is still "*millions of parameters?!?!*"
@francoisfleuret For humongous data, you need something that can absorb a lot of info. For now, this is parameters. Otherwise, it's our current best way of making optimization easier. Hopefully we'll find better ways eventually.
@giffmana @francoisfleuret Parameter-efficient learning is often taken for granted. I wish we go beyond the "fitting" paradigm and learn more with less.
@ahatamiz1 @francoisfleuret You may like our distillation paper then, which does exactly that: scholar.google.ch/scholar?q=dist…
@giffmana @francoisfleuret This is great! It seems like FunMatch is the key. I'd be curious to try this approach, but I'm wondering if you have any experiments with ViT? The paper shows an incredibly high 82.8% top-1 for ResNet-50, so ViT should do even better.
@ahatamiz1 @francoisfleuret We started the project before we invented ViT, and didn't want to change everything midway. However, I've done some ad-hoc experiments with ViT and it works just as well. I haven't pushed them to their limits yet.
@giffmana @francoisfleuret This is amazing. I was going to employ KL for our GC ViT, but this gives me all the motivation to jumpstart the effort and try it. Thanks for the pointer to this great work.
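The KL-based distillation the thread is discussing trains a student to match the teacher's full predictive distribution rather than one-hot labels. A minimal sketch of that loss in pure Python, assuming temperature-softened logits; the function names and temperature convention (scaling by T²) are illustrative, not taken from the paper's code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across
    temperatures (the usual convention in distillation work).
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

This only sketches the matching objective; the paper's results also depend on training setup (consistent augmentations for teacher and student, long schedules), which no loss function alone captures.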