Search results for giffmana

@wightmanr On the other hand, I feel like we also haven't explored layerwise enough on our side yet, so I wouldn't reach any definite conclusion either way.

@giffmana I have not been able to achieve comparable results fine-tuning the CLIP image tower, or even in22k supervised -> 1k weights, without using it. The 1k val is higher and the OOD test set scores are better, i.e. more robust. I've done some hparam search, maybe not exhaustive enough?

@wightmanr It's all obvious, come on, just look at the code ;-) (/s)

@wightmanr yeah I hear you. big_vision is criminally under-documented, internally too... So many cool things to be done!

@giffmana Also getting lots of help from other 🤗 folk for the doc push! My doc skills are still suspect, I keep getting distracted by shiny models...

@wightmanr PS: nice to see your documentation skills slowly improving (or just you spending slightly more time on it) after joining🤗😁

@wightmanr Ha, that's pretty much the same way we implemented this in big_vision! Though we haven't really found layerwise lr decay to be very useful yet, even though it seems popular recently.
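
Since layerwise lr decay keeps coming up in this thread, here is a minimal PyTorch sketch of the technique, purely as an illustration; this is not the timm or big_vision implementation, and the layer grouping (`patch_embed`/`blocks`/`head`, as in a typical ViT) and the `decay` value are assumptions:

```python
import torch

def layerwise_lr_groups(model, base_lr=1e-3, decay=0.75):
    """Build optimizer param groups where each layer, going from the
    head back toward the input, gets its lr multiplied by `decay`."""
    # Assumption: model exposes patch_embed, blocks, head (typical ViT).
    layers = [model.patch_embed] + list(model.blocks) + [model.head]
    n = len(layers)
    groups = []
    for i, layer in enumerate(layers):
        lr = base_lr * (decay ** (n - 1 - i))  # earliest layers get the smallest lr
        groups.append({"params": list(layer.parameters()), "lr": lr})
    return groups

# Usage sketch:
# optimizer = torch.optim.AdamW(layerwise_lr_groups(vit), weight_decay=0.05)
```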

Setting/intuition: Multilinguality. No direct translations or paired languages needed in LiT-tuning data. The pre-trained image model serves as a bridge across languages, anchoring concepts. Single text model sees all languages 👉Similar to how multilingual kids learn @giffmana

It's not the end of #ImageNet. Labels have shortcomings, but distillation (through consistent, patient teaching) yields improvements. Paper: arxiv.org/abs/2106.05237 Key observations by @giffmana at the #NittyGritty public seminar at #AlephAlpha. Thanks for the insights! 🤓

#Distillation: Show student and teacher the same images = recipe for accuracy. Consistent input is the only setting that works. Heavy data augmentations help. Patience: part of the recipe. "No overfitting with 1k images. Actually, we underfit" @giffmana #NittyGritty #AlephAlpha
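
The "consistent teaching" recipe summarized above is easy to sketch: teacher and student see the exact same heavily-augmented view of each image, and the student matches the teacher's soft predictions. A minimal PyTorch sketch of the idea (my own paraphrase, not the paper's code; the temperature `T` and the commented augmentation are assumptions):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=1.0):
    """KL divergence between teacher and student distributions,
    computed on the *same* augmented view of each image."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * T * T

# Training step sketch: one augmented view, shared by both models.
# x = augment(images)              # heavy augmentation, e.g. crops + mixup
# with torch.no_grad():
#     t_logits = teacher(x)        # "consistent": teacher sees the same x
# loss = distill_loss(student(x), t_logits)
```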

@OfirPress @zhang_muru @sewon__min @lschmidt3 @nlpnoah @ml_perception @uwnlp @MosaicML @MetaAI @allen_ai Glad you mention this, because that's exactly what I was going to ask. Nice work!

@giffmana It’s not fun. I checked my passport yesterday.

@CSProfKGD This actually happened to me once, and I was (rightfully) denied boarding at the airport. So yeah, good reminder!

@JonasAndrulis @giffmana I must reconsider mine. An inspiring speaker for the #NittyGritty series, congrats @Robert_Baldock🌱

If you are not planning to ask @giffmana questions about his amazing work tomorrow you should reconsider your life choices. twitter.com/Robert_Baldock…

🎉 In tomorrow's #NittyGritty seminar, Lucas Beyer @giffmana from #GoogleBrain will talk about learning and transferring general visual representations. 🚀 Join us! 🔗 Event link: lnkd.in/e-VYqW7K #machinelearning #ml #ComputerVision #openscience #writtenbyalephalpha

@marktenenholtz @giffmana @wightmanr @ducha_aiki Or back in grad school when everyone was trying to configure Spark when you could just run it on a different machine using pandas 😆

If you're in need of a smooth multi-topic presentation of transformers, check out this one by @giffmana, Google Zurich: twitter.com/giffmana/statu…

This is going to be a great (and open) seminar! Looking forward to hearing about this exciting work from @giffmana and his collaborators in Google Brain. twitter.com/Robert_Baldock…

I am thrilled to welcome Lucas Beyer, @giffmana, for our next Online #nittygritty Seminar on October 5th at @Aleph__Alpha. Come join us and hear about Lucas' important work on “Learning and Transferring General Visual Representations”. #OpenScience #ML #ComputerVision #NLProc

lol the foresight on this reply. 20/20 vision. twitter.com/josh_ross/stat…

@wightmanr @giffmana @ducha_aiki It's useful to have it as an option, but not as the only option.

@marktenenholtz @wightmanr @ducha_aiki Yes! And the older we get, the more we value our time ;-)

@wightmanr @michalwols @ducha_aiki @rom1504 @giffmana @ApacheArrow You're absolutely right for parquet. This is exactly one of the problems that Lance was designed to solve. Parquet was built purely for large scans over structured data.

@wightmanr @michalwols @ducha_aiki @rom1504 @giffmana @ApacheArrow In prod environments this can get messy. I've seen a lot of "yeah, we made parquet files that point to images, then someone moved the s3 dir and everything got borked". Instead, if your columnar format can support fast point-access, you can have a single source of truth for both purposes.

@michalwols @wightmanr @ducha_aiki @rom1504 @giffmana @ApacheArrow The flip-side of the cloud-storage thing is that having a single source of truth in cloud storage makes it a lot easier for MLOps

@michalwols @wightmanr @ducha_aiki @rom1504 @giffmana @ApacheArrow You pretty much nailed it. Being Arrow compatible makes it easy to analyze CV datasets (e.g., ensure the training data has the right distribution). You can also consolidate rich metadata with the training data and still get really fast scans over cloud storage.

@michalwols @giffmana @ducha_aiki @rom1504 @ApacheArrow You can still address that by sharding with oversampling (the increase in data on cheap storage is still much less $ than random access). Filtering down is easy, but you're limited to that. You can also get creative and tier the data into different sets of shards, adjusting the mix on read. Not out-of-the-box, though.
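
A toy sketch of the "sharding with oversampling" idea from the tweet above (entirely illustrative; the `class_weight` replication factors and function name are made up): rare classes are written into the shards multiple times, so plain sequential reads still see them at the target rate.

```python
import random

def write_shards(samples, labels, class_weight, shard_size=1000, seed=0):
    """Oversample rare classes at shard-writing time, then shuffle once
    globally, so a sequential read approximates the target class mix."""
    rng = random.Random(seed)
    expanded = []
    for x, y in zip(samples, labels):
        expanded.extend([x] * class_weight[y])  # replicate rare classes
    rng.shuffle(expanded)                       # one-time global shuffle
    return [expanded[i:i + shard_size]
            for i in range(0, len(expanded), shard_size)]
```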

@giffmana @wightmanr @ducha_aiki Every time I try to write some extra fancy streaming dataloader or optimized disk loading beyond the standard stuff, I realize it'd probably be cheaper to run fewer experiments on a bigger machine 😂

@ericjang11 @michalwols @wightmanr @ducha_aiki @rom1504 @ApacheArrow because pure text is rather small, you can cover a huge range of pages/examples in very little space, so these things are simpler. But I can't answer your question exactly, as I never implemented one.

@michalwols @wightmanr @ducha_aiki @rom1504 @ApacheArrow They are likely to fully fit in RAM or reasonably cheap SSD though. If it would not fit, it's probably large enough that imbalance doesn't matter.

@michalwols @giffmana @wightmanr @ducha_aiki @rom1504 @ApacheArrow how do LLM data loaders handle sliding windows of blocks of data? i assume here you'd want the data in sequential storage, but without a shuffle buffer, that breaks the pseudorandomness. shuffle between worker shards reading sequentially?

@giffmana @wightmanr @ducha_aiki @rom1504 @ApacheArrow That really depends on what you're doing. For datasets with large class imbalance and metric learning it's hard to get away with pure sequential scans.

@wightmanr @michalwols @ducha_aiki @rom1504 @ApacheArrow Exactly, and with the move to ever fewer epochs over ever larger datasets, there's less and less need for efficient random indexing _for training_ (but yes need for pseudorandom ordering)
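
The "pseudorandom ordering" mentioned here (and the shuffle question a few tweets up) is typically handled with a shuffle buffer over shards that are each read sequentially. A minimal sketch of that pattern (my own illustration; the buffer size and shard representation are assumptions):

```python
import random

def shuffled_stream(shards, buffer_size=10_000, seed=0):
    """Yield examples in pseudorandom order: shards are scanned
    sequentially, but a bounded buffer mixes nearby examples."""
    rng = random.Random(seed)
    shards = list(shards)
    rng.shuffle(shards)                 # reshuffle shard order each epoch
    buf = []
    for shard in shards:
        for example in shard:           # purely sequential read per shard
            buf.append(example)
            if len(buf) >= buffer_size:
                i = rng.randrange(len(buf))
                buf[i], buf[-1] = buf[-1], buf[i]
                yield buf.pop()
    rng.shuffle(buf)                    # drain the remainder
    yield from buf
```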