I worry about language models being trained on test sets. Recently, we emailed [email protected] to opt out of having our (test) data be used to improve models. This isn't enough though: others running evals could still inadvertently contribute those test sets to training.
A better solution would be to have all the LM providers agree on a common repository of examples that should be excluded from any training run.
But this might not be enough either: if we want to measure cross-task generalization, we have to ensure that no examples of a task/domain are represented in the training data. This is essentially impossible.
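One way the shared exclusion repository idea could work in practice is for benchmark owners to publish hashes of their test examples and for providers to filter training corpora against them. A minimal sketch, assuming a hypothetical registry file of SHA-256 digests; the function names and normalization are illustrative, not any provider's actual pipeline:

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes don't defeat the match."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def example_hash(text: str) -> str:
    """Digest of a normalized example; the shared repository would store only these, not the raw test data."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()

def load_exclusion_registry(path: str) -> set[str]:
    """Load the (hypothetical) shared registry: one hex digest per line."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def filter_training_corpus(documents, registry: set[str]):
    """Yield only training documents whose hash is absent from the exclusion registry."""
    for doc in documents:
        if example_hash(doc) not in registry:
            yield doc
```

Exact-match hashing only catches verbatim (normalized) copies, not paraphrases or partial overlaps, which is part of why even a shared repository wouldn't fully solve the problem raised above.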
@percyliang I’m worried about AI auto-generating the data that it then consumes to know what’s correct. That’s a recipe for a feedback loop of incorrectness. We need a mechanism for recognising when something on the internet is incorrect / potentially untrue / potentially true / correct.
@percyliang Can you elaborate on, or give an example of, what you mean by cross-task generalization?
@percyliang Test sets measure generalization, but these models are already showing impressive generalization abilities. Instead of trying to hide some test data to “discover the true performance”, I’d use benchmarks that are hard for models.
@percyliang Isn't it unrealistic to assume that no examples of a domain would be represented in the unsupervised pre-training data? You need to exclude it from finetuning, but starting with an "all-domain" LLM should give an honest estimate of real-world generalization to an arbitrary domain.
@percyliang With the release of the OpenAI APIs on Azure, I think the spec mentions that one can keep enterprise data local on an Azure instance and hence not allow GPT to be trained on it. Can't this functionality be used for test sets as well?