I worry about language models being trained on test sets. Recently, we emailed [email protected] to opt out of having our (test) data used to improve models. This isn't enough though: others running evals could still inadvertently contribute those test sets to training.
A better solution would be to have all the LM providers agree on a common repository of examples that should be excluded from any training run.
But this might not be enough either: if we want to measure cross-task generalization, we have to ensure that no examples of a task/domain are represented in the training data. This is essentially impossible.
@percyliang Maybe some variations on robots tags? E.g., <meta name="robots" content="notraining">?
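A minimal sketch of how a training crawler might honor such a hypothetical `notraining` robots directive (the tag name and semantics here are assumptions, not an existing standard):

```python
from html.parser import HTMLParser

class NoTrainingTagParser(HTMLParser):
    """Looks for a hypothetical <meta name="robots" content="notraining"> tag."""
    def __init__(self):
        super().__init__()
        self.no_training = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                # robots content is a comma-separated list of directives
                directives = [c.strip().lower() for c in d.get("content", "").split(",")]
                if "notraining" in directives:
                    self.no_training = True

def allowed_for_training(html: str) -> bool:
    """Return False if the page opts out of training via the hypothetical tag."""
    parser = NoTrainingTagParser()
    parser.feed(html)
    return not parser.no_training

opted_out = '<html><head><meta name="robots" content="noindex, notraining"></head></html>'
plain = '<html><head><title>hello</title></head></html>'
print(allowed_for_training(opted_out))  # False
print(allowed_for_training(plain))      # True
```

Like `noindex`, this would only work if providers agreed to respect it, and it wouldn't protect copies of the test set hosted by third parties.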
@percyliang Besides "The Dark Side of Language" observing the problem, we do have another, simpler solution... currently under peer review! Hopefully we can discuss it publicly soon. Happy to describe it if you're interested.
@percyliang Isn't a license enough for that? I wonder if there are @creativecommons licenses that can prevent that... or if any other @OpenSourceOrg license does... or has the law simply been outpaced by the current speed of crawling/gathering data?