Generate data in random order? A masked language model as a generative model? All this and more in "Autoregressive Diffusion Models" with @agritsenko @BastingsJasmijn @poolio @vdbergrianne @TimSalimans. For details see arxiv.org/abs/2110.02037. Some explanations below...
Training of Autoregressive Diffusion Models (ARDMs). In their basic form, they are trained like a masked language model, but the _number_ of masked variables also varies. Each training step, a random subset of variables is masked, and those are predicted from the remaining ones.
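To make this concrete, a minimal sketch of one training step. This is my own illustration, not the paper's code: the `model(masked_x, mask)` interface, and using token id 0 as the absorbing/mask token, are assumptions.

```python
import torch
import torch.nn.functional as F

def ardm_train_step(model, x):
    """One ARDM-style training step on a batch x of shape (B, D) of integer tokens.

    Hypothetical sketch: `model(masked_x, mask)` is assumed to return
    per-position logits of shape (B, D, K).
    """
    B, D = x.shape
    # Sample a timestep t ~ Uniform{1..D}: t-1 variables are observed,
    # the remaining D-t+1 are masked and must be predicted.
    t = torch.randint(1, D + 1, (B, 1), device=x.device)
    # A random permutation per example; the first t-1 positions in that
    # order are treated as already generated (observed).
    ranks = torch.argsort(torch.rand(B, D, device=x.device), dim=-1)
    mask = ranks < (t - 1)                  # True = observed, False = masked
    # Replace masked positions with the absorbing token (assumed to be 0 here).
    masked_x = torch.where(mask, x, torch.zeros_like(x))

    logits = model(masked_x, mask)          # (B, D, K)
    nll = F.cross_entropy(logits.transpose(1, 2), x, reduction="none")  # (B, D)
    # Only masked positions contribute; the 1/(D-t+1) reweighting (times D
    # for the uniformly sampled t) keeps the single-step loss an unbiased
    # estimate of the full order-agnostic likelihood bound.
    loss = D * (nll * (~mask)).sum(-1) / (D - t.squeeze(1) + 1)
    return loss.mean()
```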
Sampling takes multiple steps. First a random generation order is picked. Then, one by one, the model does a forward pass and samples a value. Those values are filled in for the next forward pass.
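In code, the sampling loop looks roughly like this (again a sketch, reusing the hypothetical `model(x, mask)` interface and mask token 0 from above):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ardm_sample(model, D, K, device="cpu"):
    """Sample one sequence of length D over K categories, one variable per step."""
    x = torch.zeros(1, D, dtype=torch.long, device=device)    # start fully masked
    mask = torch.zeros(1, D, dtype=torch.bool, device=device)
    order = torch.randperm(D, device=device)                  # random generation order
    for i in order.tolist():
        logits = model(x, mask)                               # (1, D, K)
        probs = F.softmax(logits[0, i], dim=-1)
        x[0, i] = torch.multinomial(probs, 1)                 # sample this position
        mask[0, i] = True                                     # now observed
    return x
```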
Extensions include parallelization (sampling multiple variables per forward pass) and generation in stages (predicting the more significant bits first).
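The parallel variant only changes the loop: several positions are sampled from one forward pass. The even split below is a simplification; the paper chooses the parallel schedule with dynamic programming to trade off steps against likelihood cost.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ardm_sample_parallel(model, D, K, steps, device="cpu"):
    """Fill in D variables in `steps` forward passes (fixed even schedule; sketch)."""
    x = torch.zeros(1, D, dtype=torch.long, device=device)
    mask = torch.zeros(1, D, dtype=torch.bool, device=device)
    order = torch.randperm(D, device=device)
    # Split the random order into `steps` chunks; one forward pass per chunk.
    for chunk in torch.chunk(order, steps):
        logits = model(x, mask)                      # one forward pass
        probs = F.softmax(logits[0, chunk], dim=-1)  # (|chunk|, K)
        x[0, chunk] = torch.multinomial(probs, 1).squeeze(-1)
        mask[0, chunk] = True
    return x
```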
ARDMs are conceptually very interesting as well. Their basic form lies at the intersection of "order-agnostic ARMs" and "absorbing discrete diffusion". In fact, the _continuous-time limit_ of absorbing diffusion turns out to be an order-agnostic ARM.
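For reference, the order-agnostic likelihood bound behind this connection, in my notation (D variables, σ a uniformly random permutation); the training step above optimizes a single uniformly sampled term t of this sum, reweighted to keep the estimate unbiased:

```latex
% Order-agnostic lower bound on the log-likelihood:
% average the autoregressive factorization over all D! orders.
\log p(x) \;\ge\; \mathbb{E}_{\sigma \sim \mathcal{U}(S_D)}
  \sum_{t=1}^{D} \log p\!\left(x_{\sigma(t)} \mid x_{\sigma(<t)}\right)
```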
When sampling with random orders (which is what it's trained for), the model works well as seen above. An interesting failure case is forcing a very specific order, like a raster scan. Technically it's one of the D! permutations, but a very atypical one -> the model is very confused😂
A cool application is lossless compression in only a limited number of encoding/decoding forward passes (thanks to the parallelization). Compared to (continuous) latent variable models, per-image compression is a lot better.
Very interested to see whether this works well in further downstream extensions and applications like inpainting, de-corruption, and compression.
Also, for a very nice explanation of the method see this video by @ykilcher youtube.com/watch?v=2h4tRs…