The Winograd Schema Challenge

By Ernest Davis, Leora Morgenstern, and Charles Ortiz

Winograd Schemas

A Winograd schema is a pair of sentences that differ in only one or two words and that contain an ambiguity that is resolved in opposite ways in the two sentences; resolving it requires world knowledge and reasoning. The schema takes its name from a well-known example by Terry Winograd:

The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.
If the word is "feared", then "they" presumably refers to the city council; if it is "advocated", then "they" presumably refers to the demonstrators.
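The structure of a schema can be made concrete with a small data-structure sketch. This is only an illustration, not a format used by the challenge: the class name `WinogradSchema`, its fields, and the `{word}` slot convention are assumptions introduced here, and the referent is written as the noun phrase that appears in the sentence ("the city councilmen").

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    """Hypothetical container for one schema (illustrative, not an official format)."""
    template: str           # sentence with a {word} slot for the special word
    pronoun: str            # the ambiguous pronoun
    referent_by_word: dict  # special word -> intended referent of the pronoun

    def sentences(self):
        """Return a (sentence, intended referent) pair for each special word."""
        return [
            (self.template.format(word=word), referent)
            for word, referent in self.referent_by_word.items()
        ]


councilmen = WinogradSchema(
    template=(
        "The city councilmen refused the demonstrators a permit "
        "because they {word} violence."
    ),
    pronoun="they",
    referent_by_word={
        "feared": "the city councilmen",
        "advocated": "the demonstrators",
    },
)

for sentence, referent in councilmen.sentences():
    print(f"{sentence}  ['{councilmen.pronoun}' = {referent}]")
```

Swapping the single special word flips the intended referent, which is the property that defines a schema.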

In his paper, "The Winograd Schema Challenge", Hector Levesque (2011) proposes to assemble a set of such Winograd schemas that are

The set would then be presented as a challenge for AI programs, along the lines of the Turing test. The strengths of the challenge are that it is clear-cut, in that the answer to each schema is a binary choice; vivid, in that it is obvious to non-experts that a program that fails to get the right answers clearly has serious gaps in its understanding; and difficult, in that it is far beyond the current state of the art.

The Winograd Schema Challenge is now being run roughly once a year. The organizing committee currently consists of Leora Morgenstern, Ernest Davis, and Charles Ortiz. The first running of the challenge was on July 11, 2016 at IJCAI-16; the next will be at AAAI-18.

A grand prize of $25,000 will be awarded to the first contestant who achieves 90% accuracy on both rounds of the contest; smaller cash prizes will also be awarded. The contest is sponsored by Nuance.
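The grand-prize condition can be read as a simple check over per-round accuracies. The following is a minimal sketch, not the official contest rules: the function names, the reading of "achieves 90%" as "at least 90%", and the sample scores are all assumptions made for illustration.

```python
def accuracy(num_correct, num_problems):
    """Fraction of problems answered correctly."""
    if num_problems <= 0:
        raise ValueError("num_problems must be positive")
    return num_correct / num_problems


def qualifies_for_grand_prize(round_accuracies, threshold=0.90):
    """Return True if every round's accuracy meets the threshold.

    Assumption: "achieves 90%" is interpreted as "at least 90%".
    """
    return len(round_accuracies) > 0 and all(
        a >= threshold for a in round_accuracies
    )


# Hypothetical scores, for illustration only.
strong_both_rounds = [accuracy(57, 60), accuracy(55, 60)]
weak_second_round = [accuracy(57, 60), accuracy(40, 60)]

print(qualifies_for_grand_prize(strong_both_rounds))
print(qualifies_for_grand_prize(weak_second_round))
```

A contestant who clears the bar on only one round does not qualify under this reading, since the condition applies to both rounds.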

Pronoun Disambiguation Problems

As discussed in (Morgenstern, Davis, and Ortiz, 2016), the first round of an offering of the Winograd Schema Challenge is a collection of Pronoun Disambiguation Problems (PDPs). These are texts, found in natural literary sources, containing a pronoun that requires the use of commonsense knowledge to disambiguate.

Collection of Winograd Schemas

A collection of 150 Winograd schemas has been developed and published.

For comparison, there is also a small collection of examples that, though interesting, in one way or another fail to meet the bar for Winograd schemas. To avoid confusion, these are placed on a separate page.

Winograd Schemas in other Languages

Translation of 12 Winograd schemas into Chinese (translated by Wei Xu).

Translations into Japanese, by Soichiro Tanaka, Rafal Rzepka, and Shiho Katajima:
Translation changing English names to Japanese (PDF, HTML)
Translation preserving English names (PDF, HTML)

Translation into French, by Pascal Amsili and Olga Seminck

The 2016 running of the Winograd Schema Challenge

Summary: The Winograd Schema Challenge was run on July 11, 2016 at IJCAI-16. There were four contestants. The first round of the challenge was a collection of 60 PDPs. The highest score achieved was 58% correct, by Quan Liu, from University of Science and Technology, China. Hence, by the rules of that challenge, no prizes were awarded, and the challenge did not proceed to the second round.

More recent results

A Simple Method for Commonsense Reasoning by Trieu H. Trinh and Quoc V. Le (2018) reports success rates of 70% on the PDP problems used for the 2016 Challenge, and 61.5% for the published collection of Winograd schemas.

A Conservative Human Baseline Estimate for GLUE: People Still (Mostly) Beat Machines by Nikita Nangia and Sam Bowman (2018) found that human subjects (workers on Mechanical Turk) achieved 95% on the collection of PDPs in GLUE, whereas the BERT system achieved 65.1% (barely better than always guessing answer #1).
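The comparison against "always guessing answer #1" refers to a constant baseline: a system that ignores the text and always picks the first candidate. The sketch below shows how such a baseline is scored next to a system's predictions. The labels and predictions are made-up toy data, not results from any of the papers above, and the function names are assumptions introduced here.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def first_answer_baseline(gold):
    """Constant baseline: always choose candidate 0 (answer #1)."""
    return [0] * len(gold)


# Toy data for illustration only: 0 means answer #1, 1 means answer #2.
gold = [0, 1, 0, 0, 1, 0, 1, 0]
system = [0, 1, 1, 0, 1, 0, 0, 0]

baseline_acc = accuracy(first_answer_baseline(gold), gold)
system_acc = accuracy(system, gold)
print(f"baseline: {baseline_acc:.3f}, system: {system_acc:.3f}")
```

When the gold answers are not evenly split between the two candidates, the constant baseline scores above 50%, so a system's accuracy is more informative when reported next to it.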

Language Models are Unsupervised Multitask Learners by Alec Radford et al. (2019) reports a success rate of 70.7% on the collection of Winograd schemas.


References

David Bender, Establishing a Human Baseline for the Winograd Schema Challenge, MAICS 2015.

Ernest Davis, Winograd Schemas and Machine Translation, arXiv:1608.01884, August 2016.

Hector Levesque, The Winograd Schema Challenge, Commonsense-2011.

Hector Levesque, Ernest Davis, and Leora Morgenstern, The Winograd Schema Challenge, KR-2012. An expanded version of the previous item.

Hector Levesque, On Our Best Behaviour, IJCAI Research Excellence Award Presentation, 2013.

Leora Morgenstern, Ernest Davis, and Charles Ortiz, Planning, Executing, and Evaluating the Winograd Schema Challenge, AI Magazine, Spring 2016.


Thanks to Hector Levesque, Gary Marcus, Ray Jackendoff, David Bender, Wei Xu, So Tanaka, and Rafal Rzepka.