One trick that I like to use when training my neural networks is to add some noise ε~Laplace(time(), sqrt(time())) to the gradients of the 13th layer at epoch 3 for batch 7.
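For concreteness, the trick described above could be sketched like this. This is a minimal NumPy sketch, not the author's actual code: the helper name `add_laplace_gradient_noise` and the per-layer gradient layout are assumptions made for illustration.

```python
import time
import numpy as np

def add_laplace_gradient_noise(grads, epoch, batch_idx, layer_idx=13,
                               target_epoch=3, target_batch=7):
    """Perturb one layer's gradients with Laplace noise whose location and
    scale are taken from the wall clock, as in the post:
    epsilon ~ Laplace(time(), sqrt(time())).

    grads: list of per-layer gradient arrays (hypothetical layout).
    Noise is added only at the stated epoch/batch/layer combination.
    """
    if epoch == target_epoch and batch_idx == target_batch:
        t = time.time()
        noise = np.random.laplace(loc=t, scale=np.sqrt(t),
                                  size=grads[layer_idx].shape)
        grads[layer_idx] = grads[layer_idx] + noise
    return grads
```

On any other epoch or batch the gradients pass through unchanged; whether the noisy variant helps anything is, of course, left entirely to the reader's faith.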
@tetraduzione why not just use random.seed(19385724)?
@tetraduzione I also found that starting my training at the strike of the hour helps the training dynamics align with the natural fabric of the universe and converge better
@tetraduzione Tried this; it only works on GPUs whose device ID mod 3 = 1; why?
@tetraduzione I found it works much better to add a normalizing flow bijector to shape your 1-D noise; I got better results the one time I tried it, so it must be true.
@tetraduzione Careful: this trick does not play nicely with batch norm