Tests of Awkward Questions

Run by Ernest Davis on GPT-3, davinci-003, default settings, on Feb. 19, 2023

My intent was that none of these 6 stories should involve an "awkward" question, though I suppose judgments of that could vary. As can be seen, in 5 out of the 6 stories, GPT-3 judged that the question was "awkward" and could be "embarassing",

Six is certainly a small number. However, if, as Michal Kosinski claims, the underlying false positive rate is 0.2, then the probability of getting at least 5 wrong out of 6 would be less than 0.002. Presumably, there is a systematic difference between his examples and mine.

Example 1

Note also GPT's garbling about the relation between woman and child.

Example 2

Example 3

This was the one example that GPT got right. Note that it differs from the rest in that, Sam doesn't actually ask a question.

Example 4

The example I posted earlier had a small typo in the prompt, so I reran, with much the same results. The GPT-3 interface made an error in the background color; in the second response "Polish" is part of the response, not part of the prompt.

Example 5

For some reason, the GPT interface miscolored the dialogue here. The questions are all user prompts. The answers were all generated by GPT-3. Note also the mistake "Dallas and Stanley grew up in Atlanta."