I am absolutely begging AI researchers to learn the data processing inequality.
Discovering new knowledge from synthetic data generated from existing knowledge is the information equivalent of a perpetual motion machine.
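For readers unfamiliar with the inequality being invoked: for any Markov chain X → Y → Z (i.e., Z is computed from Y alone), the data processing inequality states I(X;Z) ≤ I(X;Y) — no amount of processing can increase the information Y carries about X. A minimal numerical sketch, with illustrative hand-picked distributions (all probabilities here are hypothetical, chosen only to form a valid chain):

```python
# Numerical check of the data processing inequality (DPI):
# for a Markov chain X -> Y -> Z, I(X;Z) <= I(X;Y).
# All distributions below are hypothetical, chosen by hand for illustration.
from math import log2

# p(x): prior over X, two symbols
p_x = {0: 0.5, 1: 0.5}
# p(y|x): a noisy channel from X to Y
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
# p(z|y): further processing of Y, independent of X given Y
p_z_given_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

def mutual_information(p_a, p_b_given_a):
    """I(A;B) in bits for a discrete source p(a) and channel p(b|a)."""
    # marginal p(b)
    p_b = {}
    for a, pa in p_a.items():
        for b, pba in p_b_given_a[a].items():
            p_b[b] = p_b.get(b, 0.0) + pa * pba
    mi = 0.0
    for a, pa in p_a.items():
        for b, pba in p_b_given_a[a].items():
            if pa * pba > 0:
                mi += pa * pba * log2(pba / p_b[b])
    return mi

# compose the channels: p(z|x) = sum_y p(y|x) * p(z|y)
p_z_given_x = {
    x: {z: sum(p_y_given_x[x][y] * p_z_given_y[y][z] for y in (0, 1))
        for z in (0, 1)}
    for x in p_x
}

i_xy = mutual_information(p_x, p_y_given_x)
i_xz = mutual_information(p_x, p_z_given_x)
assert i_xz <= i_xy  # DPI: processing cannot create information about X
```

Swapping in any other channel for p(z|y) preserves the inequality; the replies below dispute not the inequality itself but whether it constrains what models can usefully *learn*.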
@pfau Worked wonders for humans — we learned from the knowledge our ancestors built on top of the knowledge of theirs.
@pfau Agree for standard ML, but I think LLMs are weird. You could improve immediate-answer ability by training on synthetic examples of final answers arrived at via chain of thought. The same goes for other kinds of consistency checks. That wouldn't be naive tuning on stale data.
@pfau I think you just described Maths as a perpetual motion machine.
@pfau That isn't quite right. You lose information in a formal sense at each stage of processing, but that doesn't mean much with respect to what you can learn. Most of the information in high-dimensional data is irrelevant anyway.
@pfau Maybe if you’re talking about language models. For visual models you’re 100% wrong: the synthetic data generated by 3D models is genuinely new.