By Ernest Davis, Leora Morgenstern, and Charles Ortiz
A Winograd schema is a pair of sentences that differ in only one or two words and that contain an ambiguity that is resolved in opposite ways in the two sentences; resolving the ambiguity requires the use of world knowledge and reasoning. The schema takes its name from a well-known example by Terry Winograd:
The city councilmen refused the demonstrators a permit because they [feared/advocated] violence. If the word is ``feared'', then ``they'' presumably refers to the city council; if it is ``advocated'', then ``they'' presumably refers to the demonstrators.
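The structure of a schema can be made concrete with a short sketch. The representation below is an illustrative assumption, not an official format: each schema stores a sentence template, the two alternative words, the ambiguous pronoun, the two candidate referents, and the correct referent for each word choice.

```python
# A minimal, hypothetical representation of a Winograd schema for evaluation
# purposes. The field names and layout are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    template: str     # sentence with a {word} placeholder; pronoun left in place
    words: tuple      # the two alternative words
    pronoun: str      # the ambiguous pronoun
    candidates: tuple # the two possible referents
    answers: dict     # word -> correct referent

    def instances(self):
        """Yield (sentence, correct referent) for each word choice."""
        for word in self.words:
            yield self.template.format(word=word), self.answers[word]

# Winograd's original example, encoded in this format.
schema = WinogradSchema(
    template=("The city councilmen refused the demonstrators a permit "
              "because they {word} violence."),
    words=("feared", "advocated"),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answers={"feared": "the city councilmen",
             "advocated": "the demonstrators"},
)

for sentence, referent in schema.instances():
    print(f'{sentence}  ->  "they" = {referent}')
```

A resolver would then be scored by comparing its chosen referent against the stored answer for each of the two instances; since each schema is a binary choice, random guessing scores 50%.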
In his paper ``The Winograd Schema Challenge'', Hector Levesque (2011) proposes to assemble a set of such Winograd schemas that are easily disambiguated by the human reader, not solvable by simple techniques such as selectional restrictions, and Google-proof; that is, not solvable by searching text corpora for statistical correlations.
The set would then be presented as a challenge for AI programs, along the lines of the Turing test. The strengths of the challenge are that it is clear-cut, in that the answer to each schema is a binary choice; vivid, in that it is obvious to non-experts that a program that fails to get the right answers clearly has serious gaps in its understanding; and difficult, in that it is far beyond the current state of the art.
A contest, entitled the Winograd Schema Challenge, was run once, in 2016. At that time, a cash prize was offered for achieving human-level performance in the contest. Since then, the sponsor has withdrawn; therefore NO CASH PRIZES CAN BE OFFERED OR WILL BE AWARDED FOR ANY KIND OF PERFORMANCE OR ACHIEVEMENT ON THIS CHALLENGE.
For comparison, there is also a small collection of examples that, though interesting, in one way or another fail to meet the bar set for Winograd schemas. To avoid confusion, these are placed on a separate page.
Translation of 12 Winograd schemas into Chinese, by Wei Xu.
Translations into Japanese, by Soichiro Tanaka, Rafal Rzepka, and Shiho Katajima:
Translation changing English names to Japanese PDF     HTML
Translation preserving English names PDF     HTML
Translation into French, by Pascal Amsili and Olga Seminck
Winograd Schemas in Portuguese by Gabriela Melo, Vinicius Imaizumi, and Fábio Cozman.
A Conservative Human Baseline Estimate for GLUE: People Still (Mostly) Beat Machines by Nikita Nangia and Sam Bowman (2018) found that human subjects (workers on Mechanical Turk) achieved 95% on the collection of PDPs in GLUE, whereas the BERT system achieved 65.1% (barely better than always guessing answer #1).
Language Models are Unsupervised Multitask Learners by Alec Radford et al. (2019) reports a success rate of 70.7% on the collection of Winograd schemas.
Exploring Unsupervised Pretraining and Sentence Structure Modeling for Winograd Schema Challenge by Yu-Ping Ruan, Xiaodan Zhu, Zhen-Hua Ling, Quan Liu, and Si Wei, reports a 71.1% success rate on the collection of Winograd schemas.
A Surprisingly Robust Trick for Winograd Schema Challenge by Vid Kocijan, Ana-Maria Cretu, Oana-Maria Camburu, Yordan Yordanov, and Thomas Lukasiewicz (2019) reports a success rate of 72.2% on the collection of Winograd schemas and of 71.9% on the GLUE collection of PDPs.
On the Evaluation of Common-Sense Reasoning in Natural Language Understanding by Paul Trichelair et al. (2018) analyzes the comparative difficulties of different schemas.
David Bender, Establishing a Human Baseline for the Winograd Schema Challenge. MAICS 2015
Ernest Davis, Winograd Schemas and Machine Translation, arXiv:1608.01884, August 2016.
Hector Levesque, The Winograd Schema Challenge, Commonsense-2011.
Hector Levesque, Ernest Davis, and Leora Morgenstern, The Winograd Schema Challenge, KR-2012. An expanded version of the previous item.
Hector Levesque, On Our Best Behaviour, IJCAI Research Excellence Award Presentation, 2013.
Leora Morgenstern, Ernest Davis, and Charles Ortiz, Planning, Executing, and Evaluating the Winograd Schema Challenge, AI Magazine, Spring 2016.