A similar trick comes up when implementing backprop through discrete stochastic variables: return loss + (grad_obj - grad_obj.detach()). This returns the forward loss unchanged (the extra term is zero in value), but it also attaches grad_obj to the graph, so loss.backward() pushes gradients through this extra path https://t.co/ue4WFbJd1e
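A minimal sketch of the trick in PyTorch, using a REINFORCE-style surrogate as the extra gradient path (the names surrogate_loss, reward, and the toy objective are illustrative assumptions, not from the original thread):

```python
import torch

def surrogate_loss(loss, grad_obj):
    # Forward value equals `loss` exactly: (grad_obj - grad_obj.detach()) is zero in value.
    # Backward: the detached copy blocks gradients, so the extra term contributes
    # d(grad_obj)/d(params), an extra gradient path that `loss` alone would not have.
    return loss + (grad_obj - grad_obj.detach())

# Hypothetical usage: backprop through a discrete sample.
logits = torch.randn(4, requires_grad=True)
probs = torch.softmax(logits, dim=-1)
idx = torch.multinomial(probs, 1)     # discrete sample: non-differentiable by itself
reward = torch.tensor(2.0)            # downstream, non-differentiable objective value
log_prob = torch.log(probs[idx])

# Forward value is `reward`; backward follows reward.detach() * log_prob (score-function gradient).
loss = surrogate_loss(reward, reward.detach() * log_prob)
loss.backward()
print(logits.grad)
```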
@darkproger Can the Gumbel-Softmax trick be used in this case?
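For reference, a short sketch of the Gumbel-Softmax alternative; note that PyTorch's gumbel_softmax with hard=True relies on the same value-plus-detached-difference trick internally (the toy objective below is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, requires_grad=True)

# Gumbel-Softmax relaxation of the discrete sample.
# hard=True returns a one-hot sample in the forward pass but routes gradients
# through the soft sample (y_hard - y_soft.detach() + y_soft under the hood).
sample = F.gumbel_softmax(logits, tau=1.0, hard=True)

loss = (sample * torch.arange(4.0)).sum()  # illustrative downstream objective
loss.backward()
print(logits.grad)
```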