Nice paper providing a way to compute stochastic gradients using only forward-mode autodiff (w/o backprop), thanks to random perturbations (with zero mean and unit variance). This is especially useful for large architectures, as memory can be released along the way. arxiv.org/abs/2202.08587
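A minimal sketch of this forward-gradient idea in JAX (the function names and the toy objective are illustrative assumptions, not the paper's code): jax.jvp returns the directional derivative ∇f(θ)·v in a single forward pass, and scaling v by it gives an unbiased gradient estimate when v has zero mean and unit variance.

```python
import jax
import jax.numpy as jnp

def forward_grad(f, theta, key):
    # Sample a perturbation direction v ~ N(0, I): zero mean, unit variance.
    v = jax.random.normal(key, theta.shape)
    # One forward pass gives f(theta) and the directional derivative <grad f(theta), v>.
    value, dir_deriv = jax.jvp(f, (theta,), (v,))
    # Unbiased estimate: E[(<grad f, v>) v] = grad f(theta).
    return value, dir_deriv * v

# Toy check: f(theta) = ||theta||^2 / 2, whose true gradient is theta.
f = lambda theta: 0.5 * jnp.sum(theta ** 2)
theta = jnp.arange(4.0)
value, g = forward_grad(f, theta, jax.random.PRNGKey(0))
```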
Combined with the usual minibatch sampling, this gives doubly stochastic gradients (where the stochasticity comes from both the perturbations and the training points).
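Reusing the hypothetical forward_grad sketch above, a doubly stochastic step would sample both a minibatch and a direction; the loss signature, batch argument, and learning rate here are assumptions for illustration.

```python
def doubly_stochastic_step(loss_fn, theta, batch, key, lr=1e-2):
    # Stochasticity 1: the sampled minibatch `batch`; stochasticity 2: the direction v inside forward_grad.
    f = lambda t: loss_fn(t, batch)
    _, g = forward_grad(f, theta, key)
    return theta - lr * g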
In fact, the Jacobian-vector products / directional derivatives used by their technique can be approximated by finite differences with just two calls to the function, so forward-mode autodiff could even be avoided...
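For intuition, a sketch of that finite-difference variant (a central difference with two function evaluations; the step size eps is an assumed tuning parameter), dropping jax.jvp entirely:

```python
def fd_forward_grad(f, theta, key, eps=1e-4):
    v = jax.random.normal(key, theta.shape)
    # Central difference: (f(theta + eps*v) - f(theta - eps*v)) / (2*eps) ~= <grad f(theta), v>.
    dir_deriv = (f(theta + eps * v) - f(theta - eps * v)) / (2 * eps)
    return dir_deriv * v
```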
@mblondel_ml This is super cool. Perhaps finite-difference directional gradients could be better approximations than pseudo-gradients (for instance for forward functions that are hard to differentiate, like argsort/rounding). Any idea about issues with the fact that "almost all random vectors are orthogonal" in high dimensions?