People will soon realize that automated prompt engineering is akin to latent variable inference. Then they will realize that latent variable inference is akin to reasoning/planning. Then they will realize that transformers were missing that piece all along.
28
126
1K
0
292
@ylecun Alternatively, it is a hyper-network, since the latent representations of the prompt act as data-dependent weight matrices. An alternative view is that the prompt just selects an expert from a very large mixture of experts. (again the token embeddings act as expert parameters).
@ChrSzegedy @ylecun Is it still end to end?
@MarisaVeryMoe @ylecun Training-wise? Most training methodologies are not E2E, but under certain circumstances it is possible.