We need an MKBHD for AI software, because there’s tons of bullshit out there to wade through.
Working on something like this right now for AI model benchmarks. It's very tricky though, for reasons similar to MKBHD's, but maybe even more challenging. I have found many "anomalies" in various claims, many centering on benchmark performance. While it's easy to attribute these to malice, it's more likely that people are genuinely trying to make models that also do well on benchmark-style data. At this point, you're probably making a mistake if you don't train your model to answer Q&A facts and multiple-choice questions. Obviously, training directly on known benchmark data is more likely cheating (though not necessarily), but it's very hard to determine what's cheating, what's meeting market pressure, what's incompetence, and what's just chance.
@HamelHusain Dunno about AI but @laurenbalik may be the MKBHD of the modern data stack 😜
@HamelHusain Trust me, then you would need an MKBHD to vet those MKBHDs, because opinions in AI are in abundance. And many would vouch for themselves 🤣
@HamelHusain Paging @jxnlco: ready to begin your influencer arc?