New paper from my group: "Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models" arxiv.org/abs/2402.08955 Thread below 🧵 (1/6)
Martha Lewis and I investigate the generality of previously published claims that LLMs can reason by analogy as well as humans. We follow up on Webb et al.'s (2023) finding that GPT-3 matches or beats humans in several analogical reasoning domains. (2/6)
@MelMitchell1 Really interesting that GPT-4 is sometimes worse than GPT-3. Strange!
@MelMitchell1 Readers of this post may be interested in a new logic for evaluating noisy agents in unsupervised settings (where no one knows the right answers to the test), which can help keep us safer around noisy decision makers: ntqr.readthedocs.org/en/latest
@MelMitchell1 A really brilliant experiment and excellent research. I've always been a big fan of Melanie; I loved her book, and I continue to admire her work. Thank you, Melanie!