Winograd Schemas and Machine Translation: Some Examples
In (Davis, 2016) I discussed the use of Winograd schemas as challenges for
machine translation, an idea that dates back at least to Terry Winograd's
doctoral thesis (1970). Below, I report the results of testing 37 such
examples on Google Translate (GT) and DeepL as of January 2020.
As proposed in (Davis, 2016),
the examples below are modified from the online collection of Winograd schemas,
rephrased so that a translator is required to choose between
"il" (French masculine singular) and "elle" (feminine singular) as translations
for "it", or between "ils" (masc. pl.) and "elles" (fem. pl.) as translations
for "they".
Confronted with subtle choices of this kind, machine translation systems tend
to be very sensitive to small changes in wording that seem inconsequential
to a human reader. I have therefore often included two versions of
what is essentially the same schema. Unlike my
Collection of Machine Translation Fails,
I have here recorded the results on all the examples that I attempted
in which the translations created by the programs included a gendered
pronoun, both those in which the translation programs succeeded and those
in which they failed. I have omitted
some in which the translation avoided the use of a pronoun, or where the
gender of the pronoun was indeterminate.
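The per-sentence check described above can be sketched in code. This is a minimal Python sketch under stated assumptions, not a description of how the results below were obtained. The names `pronoun_gender` and `score` are hypothetical, and the first French sentence is an illustrative translation written for this example rather than recorded program output. The sketch looks only at the subject pronouns il/elle/ils/elles, so object pronouns such as le/la would need separate handling.

```python
import re

# Subject pronouns whose gender the translator must commit to.
MASCULINE = {"il", "ils"}
FEMININE = {"elle", "elles"}


def pronoun_gender(french):
    """Return "m" or "f" for the first gendered subject pronoun, else None."""
    # \w+ splits on apostrophes, so "qu'il" yields the tokens "qu" and "il".
    for word in re.findall(r"\w+", french.lower()):
        if word in MASCULINE:
            return "m"
        if word in FEMININE:
            return "f"
    return None


def score(french, expected):
    """Compare the pronoun found in a translation with the expected gender."""
    found = pronoun_gender(french)
    if found is None:
        return "no gendered pronoun"
    return "right" if found == expected else "wrong"


# Illustrative sentence (an assumption, not recorded output): "it" should be
# the suitcase ("la valise", feminine) when the English adjective is "small".
made_up = "Le trophée ne rentre pas dans la valise parce qu'il est trop petit."
print(score(made_up, "f"))

# A sentence quoted in the examples, where the translation avoids the pronoun.
avoided = ("Joan et Susan ont essayé d'appeler Jim et Mark au téléphone, "
           "mais sans succès.")
print(score(avoided, "f"))
```

Taking the first gendered pronoun is a simplification: a sentence with an earlier, unrelated "il" or "ils" would be misread, so a check of this kind still needs a human look at each translation.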
Tally
The examples below include 37 pairs corresponding to 23 different Winograd
schemas. Of those 37 pairs:
Google Translate
There are 3 pairs (#15.B, #16, #19) where GT gets both sentences right.
There is 1 pair (#4.A) where GT gets one sentence right, and correctly
translates the other sentence but without using a pronoun.
In the remaining 33 pairs, GT uses the same pronoun for both sentences (right
for one, wrong for the other).
DeepL
There are 4 pairs (#4.A, #11.B, #16, #23)
where DeepL gets both sentences right.
There is 1 pair (#8) where DeepL gets both sentences wrong.
In the remaining 32 pairs, DeepL uses the same pronoun for both sentences (right
for one, wrong for the other).
Bottom line: As of January 2020, Winograd Schemas are very hard for machine
translation programs, but the programs are beginning to make
some inroads on them.
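The categories used in the tally above can be made explicit with a short sketch. The following Python is only an illustration under assumptions: the function name `classify_pair`, the outcome labels, and the dictionary layout are hypothetical, and only a handful of pairs are transcribed from the examples, so the counts it prints are not the full tally.

```python
from collections import Counter


def classify_pair(first, second):
    """Classify a pair from two outcomes: "right", "wrong", or "none"."""
    if "none" in (first, second):
        return "pronoun avoided in at least one sentence"
    if first == second:
        return "both " + first
    return "one right, one wrong"


# A few outcomes transcribed from the examples in this document, keyed by
# (system, example label); "none" means no gendered pronoun was produced.
outcomes = {
    ("GT", "#1"): ("wrong", "right"),
    ("GT", "#4.A"): ("none", "right"),
    ("GT", "#16"): ("right", "right"),
    ("DeepL", "#4.A"): ("right", "right"),
    ("DeepL", "#8"): ("wrong", "wrong"),
}

tally = {"GT": Counter(), "DeepL": Counter()}
for (system, _label), (first, second) in outcomes.items():
    tally[system][classify_pair(first, second)] += 1

for system, counts in tally.items():
    print(system, dict(counts))
```

Keeping the two sentences of a pair together matters here: a program that always picks the same pronoun gets one sentence of each pair right, so only the "both right" category indicates that the choice tracked the meaning.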
Examples
1. The city councilmen refused the women demonstrators a permit because they
feared/advocated violence.
Google Translate and DeepL both use "elles" in both: right for "advocated",
wrong for "feared".
2. The trophy does not fit in the suitcase because it is too small/large.
Google and DeepL both use "le trophée" and "la valise", and both use "il"
for both sentences: right for "large" and wrong for "small".
3.
A. Joan and Susan made sure to thank Jim and Mark for all the help they had
given/received.
GT and DeepL both use "ils" for both: right with "given", wrong with
"received".
B. The same happens if you switch "Jim and Mark" with "Joan and
Susan".
4.
A. Joan and Susan tried to call Jim and Mark on the phone, but they weren't
successful/available.
GT correctly uses "ils" for "available" and avoids the issue for
"successful":
Joan et Susan ont essayé d'appeler
Jim et Mark au téléphone, mais sans succès.
DeepL correctly uses "ils" for "available" and "elles" for "successful".
B. However, if you switch "Jim and Mark" with "Joan and Susan" then both
GT and DeepL use "ils" for both: right for "successful", wrong for
"available".
5.
A. The stag raced past the lioness because it was going so fast/slow.
Google Translate avoids the issue in both sentences:
Le cerf a couru devant la lionne parce que ça allait si vite/lentement.
DeepL uses "il" in both: right for "fast", wrong for "slow".
B. The lioness could not catch up with the stag, because it was going too
slow/fast.
GT and DeepL both use "il" for both: right for "fast", wrong for "slow".
6.
A. Frank and Bill felt [vindicated/crushed] when their longtime rivals
Joan and Susan revealed that they were the winners of the competition.
GT uses "ils" for both: right for "vindicated", wrong for "crushed".
DeepL uses "elles" for both: right for "crushed", wrong for "vindicated".
B. If you switch "Frank and Bill" with "Joan and Susan", GT still uses "ils" for
both, and DeepL now uses "ils" for both.
7.
A. The fathers couldn't lift their daughters because they were too weak/heavy.
GT and DeepL use "elles" for both: right for "heavy", wrong for "weak".
B. If you change "fathers" to "mothers" and "daughters" to "sons", then both
programs use "ils" for both.
8. The hammer crashed through the table because it was made of [steel/styrofoam].
GT uses "le marteau" for "hammer" and "la table" for "table" and uses
"il" for both sentences: right for "steel", wrong for "styrofoam".
DeepL uses the same nouns, and uses "il" with "styrofoam" and "elle" with
"steel"; wrong in both cases.
9.
A. Jim and Mark couldn't see the stage with Susan and Joan sitting in front of
them, because they are so short/tall.
GT uses "ils" for both; right for "short", wrong for "tall".
DeepL uses "elles" for both; right for "tall", wrong for "short".
B. If you switch "Jim and Mark" with "Susan and Joan" then both programs use
"ils" for both.
10. The vase rolled off the shelf because it wasn't anchored/level.
Both programs use "le vase" for "vase" and "étagère" (fem.) for "shelf", and
they both use "il" for both sentences: right for "anchored", wrong for
"level".
11.
A. Jim and Mark did a lot [better/worse]
than their good friends Susan and Joan on the
test because they had studied so hard.
Both programs use "ils" for both sentences: right for "better", wrong for
"worse".
B. If you switch "Jim and Mark" with "Susan and Joan", then
GT still uses "ils" for both sentences but DeepL correctly uses "ils" for
"worse" and "elles" for "better".
12.
A. Susan and Joan were upset with Jim and Mark because the toasters they had
[sold/bought from] them didn't work.
Google uses "ils" for both sentences: right for "sold", wrong for "bought".
DeepL uses "elles" for both sentences: right for "bought", wrong for "sold".
B. If you switch "Susan and Joan" with "Jim and Mark" then
both programs use "ils" for both sentences.
13.
A. Jim and Mark [yelled at/comforted] Susan and Joan because they were so upset.
GT and DeepL both
use "ils" in both sentences: right for "yelled at", wrong for "comforted".
B. If you switch "Susan and Joan" with "Jim and Mark" then
both programs still use "ils" for both sentences.
14.
A. The sack of potatoes had been placed [above/below] the box of flour, so it
had to be moved first.
Both programs translate "the sack" as "le sac" and "the box" as "la boite",
and both programs use a masculine pronoun for both sentences.
B. If you switch "sack of potatoes" and "box of flour", then both programs
use a feminine pronoun for both sentences.
15.
A. Jim and Mark envy Susan and Joan [because/although]
they are very successful.
Both GT and DeepL use "ils" in both sentences.
B. If you switch "Susan and Joan" with "Jim and Mark" then
Google Translate correctly uses "ils" with "because" and "elles" with
"although". DeepL uses "ils" with both.
16. I spread the cloth on the table to [display/cover] it.
GT and DeepL both use "le tissu" for "the cloth" and "la table" for "the table",
and both correctly use the masculine pronoun "le" with "display" and the feminine
pronoun "la" with "cover".
17.
A. Jim and Mark know all about Susan and Joan's problems because they
are [indiscreet/nosy].
Both programs use "ils" for both sentences:
right for "nosy", wrong for "indiscreet".
B. If you switch "Susan and Joan" with "Jim and Mark" then
GT still uses "ils" for both, but now DeepL uses "elles" for both.
18.
A. Jim and Mark explained their theory to Susan and Joan, but they couldn't
[understand/convince] them.
GT and DeepL give "ils" for both: right for "convince",
wrong for "understand".
B. If you switch "Susan and Joan" with "Jim and Mark" then
GT and DeepL still give "ils" for both.
19. There is a pillar between me and the stage and I can't see [around it/it].
GT uses "un pilier" and "la scène" and correctly uses the feminine
with "see it" and the masculine with "see around it".
DeepL incorrectly uses the masculine for "see it" and uses no pronoun at
all for "see around it."
20.
A. Alice and Barbara tried frantically to stop their sons from
[chatting/barking] at the
party, leaving us to wonder why they were behaving so strangely.
GT and DeepL both use "ils" for both sentences:
right for "barking", wrong for "chatting".
B. If you change to "Jim and Mark" and "daughters",
then both programs give "elles" for both versions.
21. Sam pulled up a chair to the piano, but it was broken, so he had to
[sing/stand] instead.
GT and DeepL both use "la chaise" for "chair" and "le piano" for "piano".
They both use "elle" in both sentences: right for "stand", wrong for "sing".
22. I can't cut that tree down with that axe because it is too [small/thick].
Both programs use "arbre" (masc.) for "tree" and "hache" (fem.) for "axe".
Both use "il" in both sentences: right for "thick", wrong for "small".
23. The piano won't fit through the doorway because it is too [wide/narrow].
Both programs use "le piano" and "la porte". Google uses "il" for both
sentences: right for "wide", wrong for "narrow". DeepL correctly uses
"il" for "wide", and "elle" for "narrow".
References
Davis, E. 2016.
"Winograd Schemas and Machine Translation".
arXiv 1608.01884.
Winograd, T. 1970. Procedures as a Representation for Data in a Computer
Program for Understanding Natural Language, Ph.D. thesis,
Department of Mathematics, MIT, August 1970.
Published as MIT AITR-235, January 1971.