@eerac @FractalFlows I have these go/no-go meetings but somehow the answer is always “maybe go, let’s revisit this tomorrow”
Me telling my students that all paper drafts MUST be done one week before the conference deadline
@Dirque_L Yep, that’s the only rigorous definition I can think of 🤷‍♂️
@bkadams In other words no duckface selfies.
@bkadams I’m thinking of people who take photographs or use diffusion *thoughtfully* and with the purpose of creating what they perceive as art.
@Dirque_L My experience is they work well on “nice” matrices and always fail on randomly generated matrices. But there’s a messy middle where they don’t reliably succeed or reliably fail. Fortunately my matrices are usually nice 😬
Are diffusion users artists? What about photographers?
@YiMaTweets True and yet industry labs seem to be testing the limit as resources approach ∞.
Queen Elizabeth reigned over both Alan Turing and the GPU.
@cmssastry Haven’t tested this! That’s a great idea though.
@ArlieColes @FreeComet22 A standard diffusion model adds Gaussian noise to an image and trains a denoising network to restore noisy images. In the animorph example we add an animal image instead of Gaussian noise, and the model is trained to restore images that have had animals added to them.
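A minimal sketch of that training idea, with hypothetical names (`restorer`, `faces`, `animals`) and an assumed linear blend standing in for the actual degradation schedule:

```python
import torch
import torch.nn.functional as F

def animorph_training_step(restorer, faces, animals, optimizer):
    # Per-image degradation severity in [0, 1]; the linear blend below is an
    # assumption, the real schedule may differ.
    t = torch.rand(faces.size(0), 1, 1, 1, device=faces.device)
    degraded = (1 - t) * faces + t * animals    # "add" an animal, not Gaussian noise
    restored = restorer(degraded, t.flatten())  # model sees the image and severity
    loss = F.mse_loss(restored, faces)          # learn to restore the original face
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```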
@ArlieColes @FreeComet22 The demo I show does not have a prompt. It’s an unconditional (i.e. prompt-free) generator that produces random images. Because it’s trained on Celeb-A, the random images look like celebrity headshots.
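For reference, unconditional sampling looks roughly like this with the diffusers library (not the demo’s own code; `google/ddpm-celebahq-256` is a public CelebA-HQ checkpoint standing in for the model described above):

```python
from diffusers import DDPMPipeline

# No prompt anywhere: calling the pipeline just draws a random sample from
# the distribution the model learned, here celebrity-like faces.
pipe = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
image = pipe().images[0]
image.save("random_face.png")
```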
@gdsotnikov Wow that's awesome! If I were to nitpick - the trees still look a bit blocky 😬
@roydanroy I want something where adjacent/similar frames in the video have similar results. In the Minecraft demo you can see that moving forward a tiny bit can make a sign appear/disappear or a mountain change into a tree.
@takes_mediocre @roydanroy Stable Diffusion has a parameter called "strength" that controls how random the output is. You can set it to zero if you want no randomness - maybe this was already done in the Minecraft demo above.
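Roughly how that knob is exposed in the diffusers img2img pipeline (a sketch, not the demo’s actual code; `frame` and the prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# `frame` is a placeholder for one PIL video frame. Low strength keeps the
# output close to the input; values near 1.0 redraw it almost from scratch.
out = pipe(prompt="a stylized landscape", image=frame, strength=0.3).images[0]
```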
Mathematically speaking, #StableDiffusion is quite unstable. This becomes obvious when it’s applied to video. I’d love to see a version where small changes in the input frame don’t lead to large changes in the output. twitter.com/matthen2/statu…
@prAggressive @17facet Interestingly, all the problems in our “thinking” paper are solved with the same architecture, but the training time depends on dataset size. In general, nobody knows how to choose the best model, but “model selection” and “neural architecture search” have tried to automate this.
@aertherks @KyleVedder Haven’t thoroughly looked at mode convergence. Some of our “diffusions” do image restoration and not from-scratch generation, so the question doesn’t apply. When we do do generation, our FID is higher than standard diffusion, likely because cold diffusion has fewer sources of randomness.
@thegautamkamath People who think <25% of papers are worth accepting are both right and wrong. They are correct that <<25% provide value to *them*, but they lack the imagination and open-mindedness to see that >>25% provide value to someone.
@johnbender That’s essentially the Langevin diffusion interpretation. Alternating between adding noise and denoising creates a Markov chain that explores the distribution.
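A toy version of that chain, assuming a function `score(x)` that approximates ∇ log p(x) (e.g. recovered from a trained denoiser; the step size and step count are illustrative):

```python
import torch

def langevin_chain(score, x, steps=1000, eps=1e-4):
    # Each step pairs a deterministic drift toward high-density regions
    # ("denoising") with freshly injected Gaussian noise; iterating the two
    # yields a Markov chain that explores the target distribution.
    for _ in range(steps):
        x = x + 0.5 * eps * score(x)                # denoise / drift
        x = x + (eps ** 0.5) * torch.randn_like(x)  # add noise
    return x
```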
One of the hardest open problems in ML is how to handle the following sampling bias: we only see the most successful projects, the papers that reviewed well, and the hyper-params that worked. But it seldom feels like our own research ever hits that stride. twitter.com/andrewwhite01/…
@stillnotelf This is the same feedback we got from reviewer 2.
@AlexTensor Yeah, and I think that’s why results are so blurry in the first stage. In some sense you’re getting a superposition of many possible faces. This superposition results in a low loss because of Jensen’s inequality and the convex loss function.
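The one-line version of that argument: if the loss ℓ is convex in the prediction, then for any target face y and any distribution over plausible faces y′, Jensen’s inequality gives

$$\ell\big(\mathbb{E}_{y'}[y'],\, y\big) \;\le\; \mathbb{E}_{y'}\big[\ell(y',\, y)\big],$$

so the blurry average of the plausible faces has expected loss no worse than committing to any one sharp sample.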
@NerdyBeaverDev Yep. Generation starts from random. Generated images are likely to be unique, but it’s not clear how much DALLE “borrows” from training data by using objects and backgrounds that resemble training data.
@ani578 We don't have the compute resources needed to compete with these SOTA models right now. The big models train for a very long time on huge datasets. This is especially true of the latent diffusion models - to do a fair comparison we'd need industry-level resources.