By Ernest Davis, Leora Morgenstern, and Charles Ortiz
A Winograd schema is a pair of sentences that differ in only one or two words and that contain an ambiguity that is resolved in opposite ways in the two sentences and requires the use of world knowledge and reasoning for its resolution. The schema takes its name from a well-known example by Terry Winograd:
The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.
If the word is ``feared'', then ``they'' presumably refers to the city council; if it is ``advocated'', then ``they'' presumably refers to the demonstrators.
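One way to make the structure of a schema concrete is to write it down as a small data record. The following Python sketch is illustrative only: the class name `WinogradSchema` and its field names are assumptions made for this example, not a format defined by the challenge.

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    """One schema: a sentence template plus the two alternative words.

    All names here are hypothetical and chosen for illustration.
    """
    template: str                 # sentence with a {word} slot
    pronoun: str                  # the ambiguous pronoun
    candidates: tuple[str, str]   # the two possible referents
    answers: dict[str, str]       # special word -> presumed referent

    def sentences(self) -> dict[str, str]:
        """Return each concrete sentence, keyed by its special word."""
        return {w: self.template.format(word=w) for w in self.answers}


# Winograd's original example, as quoted in the text.
councilmen = WinogradSchema(
    template=("The city councilmen refused the demonstrators a permit "
              "because they {word} violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answers={
        "feared": "the city councilmen",
        "advocated": "the demonstrators",
    },
)

for word, sentence in councilmen.sentences().items():
    referent = councilmen.answers[word]
    print(f"{sentence}  ['{councilmen.pronoun}' = {referent}]")
```

Storing the two sentences as one template with a slot keeps the defining property visible: changing a single word flips the presumed referent of the pronoun.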
In his paper ``The Winograd Schema Challenge'', Hector Levesque (2011) proposes to assemble a set of such Winograd schemas that are
The set would then be presented as a challenge for AI programs, along the lines of the Turing test. The strengths of the challenge are that it is clear-cut, in that the answer to each schema is a binary choice; vivid, in that it is obvious to non-experts that a program that fails to get the right answers clearly has serious gaps in its understanding; and difficult, in that it is far beyond the current state of the art.
The Winograd Schema Challenge is now being run roughly once a year. It is administered by commonsensereasoning.org. The organizing committee currently consists of Leora Morgenstern, Ernest Davis, and Charles Ortiz. The first running of the challenge was on July 11, 2016, at IJCAI-16; the next will be at AAAI-18.
A grand prize of $25,000 will be awarded to the first contestant who achieves 90% accuracy on both rounds of the contest; smaller cash prizes will also be awarded. The contest is sponsored by Nuance.
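The grand prize criterion can be sketched as a simple check. The snippet below assumes that ``achieves 90% accuracy'' means at least 90% of answers correct in each round; the function names and the sample answers are hypothetical and are not taken from the contest rules.

```python
GRAND_PRIZE_THRESHOLD = 0.90  # the 90% figure stated in the text


def accuracy(predictions, gold):
    """Fraction of binary answers that match the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty round")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def qualifies_for_grand_prize(round_accuracies,
                              threshold=GRAND_PRIZE_THRESHOLD):
    """True only if every round meets the threshold (assumed: >=)."""
    return all(acc >= threshold for acc in round_accuracies)


# Made-up data: each answer is 0 or 1, the index of the chosen referent.
gold_round = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]  # one mistake at the end

round_one = accuracy(predicted, gold_round)
round_two = 0.85  # hypothetical score for a second round
print(round_one, qualifies_for_grand_prize([round_one, round_two]))
```

Because each schema is a binary choice, a random guesser is expected to score about 50%, so accuracies are best read against that baseline rather than against zero.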
For comparison, there is also a small collection of examples that, though interesting, in one way or another fall short of being Winograd schemas. To avoid confusion, these are placed on a separate page.
Translation of 12 Winograd schemas into Chinese, by Wei Xu.
Translations into Japanese, by Soichiro Tanaka, Rafal Rzepka, and Shiho Katajima:
Translation changing English names to Japanese (PDF, HTML)
Translation preserving English names (PDF, HTML)
Translation into French, by Pascal Amsili and Olga Seminck
``A Conservative Human Baseline Estimate for GLUE: People Still (Mostly) Beat Machines'' by Nikita Nangia and Sam Bowman (2018) found that human subjects (workers on Mechanical Turk) achieved 95% on the collection of PDPs in GLUE, whereas the BERT system achieved 65.1% (barely better than always guessing answer #1).
``Language Models are Unsupervised Multitask Learners'' by Alec Radford et al. (2019) reports a success rate of 70.7% on the collection of Winograd schemas.
David Bender, Establishing a Human Baseline for the Winograd Schema Challenge, MAICS 2015.
Ernest Davis, Winograd Schemas and Machine Translation, arXiv:1608.01884, August 2016.
Hector Levesque, The Winograd Schema Challenge, Commonsense-2011.
Hector Levesque, Ernest Davis, and Leora Morgenstern, The Winograd Schema Challenge, KR-2012. An expanded version of the previous item.
Hector Levesque, On Our Best Behaviour, IJCAI Research Excellence Award Presentation, 2013.
Leora Morgenstern, Ernest Davis, and Charles Ortiz, Planning, Executing, and Evaluating the Winograd Schema Challenge, AI Magazine, Spring 2016.