Dan Deutsch @_danieldeutsch, Twitter Profile

Dan Deutsch @_danieldeutsch

11 months ago

LLM-based metrics like GEMBA predict many ties, but the way that ties should be handled in Kendall’s tau for meta-evaluating metrics has been a longstanding issue. We propose an update to the meta-evaluation methodology to handle ties. arxiv.org/pdf/2305.14324…

Dan Deutsch @_danieldeutsch

11 months ago

First, we show that existing Kendall variants have shortcomings related to how they handle ties, and, in some cases, ties can be exploited to game the correlations. A metric could have taken advantage of this to inflate its correlations in the WMT’22 metrics shared task.
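
To make the tie issue concrete, here is a minimal sketch on toy, hypothetical scores. It is not the paper's code, its WMT'22 analysis, or its proposed fix. It compares two standard Kendall variants. tau_a treats a metric tie as neither concordant nor discordant but keeps the pair in the denominator. tau_b also removes tied pairs from the denominator, so a metric that declines to rank pairs by tying them can raise its score. The function names `pair_counts`, `kendall_tau_a`, and `kendall_tau_b` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pair_counts(human, metric):
    """Count concordant, discordant, and tied pairs (illustrative helper)."""
    concordant = discordant = ties_human = ties_metric = 0
    for i, j in combinations(range(len(human)), 2):
        dh = human[i] - human[j]
        dm = metric[i] - metric[j]
        if dh == 0 and dm == 0:
            # Tied in both: counts toward both tie totals, as in tau_b.
            ties_human += 1
            ties_metric += 1
        elif dh == 0:
            ties_human += 1
        elif dm == 0:
            ties_metric += 1
        elif (dh > 0) == (dm > 0):
            concordant += 1
        else:
            discordant += 1
    return concordant, discordant, ties_human, ties_metric


def kendall_tau_a(human, metric):
    """tau_a: tied pairs stay in the denominator, so ties act like abstentions."""
    c, d, _, _ = pair_counts(human, metric)
    n0 = len(human) * (len(human) - 1) // 2
    return (c - d) / n0


def kendall_tau_b(human, metric):
    """tau_b: tied pairs are removed from the denominator."""
    c, d, th, tm = pair_counts(human, metric)
    n0 = len(human) * (len(human) - 1) // 2
    denom = sqrt((n0 - th) * (n0 - tm))
    return (c - d) / denom if denom else float("nan")


# Hypothetical toy data: four translations with distinct human scores.
human = [1, 2, 3, 4]
honest = [2, 1, 3, 4]      # ranks every pair, gets one pair wrong
degenerate = [0, 0, 0, 1]  # ties three of the four items

for name, scores in [("honest", honest), ("degenerate", degenerate)]:
    print(name, round(kendall_tau_a(human, scores), 3),
          round(kendall_tau_b(human, scores), 3))
# honest 0.667 0.667
# degenerate 0.5 0.707
```

On this toy data, tau_b ranks the mostly-tied metric above the one that commits to every pair, while tau_a does the opposite. This shows how the choice of tie handling alone can change which metric looks better.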

Wenda Xu @WendaXu2

11 months ago

@_danieldeutsch Hi Dan, do you have code for this paper? I would like to use this meta-evaluation to validate my current metrics.
