They do give an example of a question where the model chose an incorrect answer in the adversarial setting:
"The condition of the air outdoors at a certain time ofday is known as (A) friction (B) light (C) force (D)weather[correct](Q) joule (R) gradient[selected](S)trench (T) add heat"
I assume this might be characteristic of other questions as well, although I don't know anything about the Regents Science Exam or whether it contains multiple questions about closely related topics.
It is a well-worded question for its purpose. The whole point is that, of all the options given, only one is justifiable (and justifying it does not require a tendentious stretch, either). Even “light” (which was not chosen) only applies half the time, on average. This is a valid test of natural language understanding.
Remember when IBM went on Jeopardy? There was a question about some Egyptian pharaoh. A human with some knowledge of history might mix up Ramses and Seti, or just not know the answer, but know that they didn't know. Watson answered "What are trousers?" with supreme confidence.
Jeopardy is fun and games, and it was great for the blooper reel, but they're trying to sell this stuff to diagnose cancer and guide police efforts. Failure modes are kind of important.
"The condition of the air outdoors at a certain time ofday is known as (A) friction (B) light (C) force (D)weather[correct](Q) joule (R) gradient[selected](S)trench (T) add heat"
I assume this might be characteristic for other questions as well, although I don't know anything about the Regents Science Exam and whether there are multiple questions about closely related topics.