Vijay Saraswat Ernie, I fear you are mistaken. The utterance you have above, and the response of ChatGPT, I am afraid, cannot in any sense be construed as "ChatGPT got it wrong". It is not too difficult for ChatGPT to be prompted so that it, in fact, comes to the other conclusion. The "reality" here -- what does ChatGPT understand, and does it understand what it understands -- is more complex.

Vijay Saraswat [Image Vijay1.jpg]

Ernest Davis The Winograd schema challenge is to answer the question correctly when presented straightforwardly as above. It is not to answer the question correctly when docile humans are willing to experiment with a range of prompts until they succeed in eliciting the correct answer from the AI.

Ernest Davis I find it absolutely amazing that you won't even concede that the answer I've displayed above is wrong, but will fight to the death defending any possible output from GPT as right if you think about it the right way.

Vijay Saraswat OK, our goals differ. I am trying to understand ChatGPT. For me the WSC is a means to an end.

Vijay Saraswat Heh. You are simply missing the point. The notion of "what ChatGPT understands" is quite complex. I am afraid your prompt and the response from ChatGPT do not establish anything.

Ernest Davis But since you raise the issue of understanding: I think this example is pretty good -- not decisive, but pretty good -- evidence that ChatGPT's understanding of the relation between relative size and an object fitting inside a container is not robust or reliable. It is a simple question in simple English. It is not a trick question. It does not contain a false presupposition or anything like that. It poses no difficulty at all to human English speakers. If ChatGPT had the same facility at answering commonsense physical reasoning questions as it does at spewing page after page of plausible-sounding philosophy, this would be extremely easy. And if your goal is to understand ChatGPT, then it seems to me that this limitation is important. But go ahead, ignore it, and give me a hard time about posting it, if you think that these kinds of failures "do not establish anything".

Ernest Davis It's not as if I tried 99 phrasings of this sentence until I found one that tripped up ChatGPT. This was my first attempt, and, except for the omission of the word "brown", it is exactly the example that Hector published 11 years ago -- and incidentally it probably occurs multiple times in GPT's training set. I'm not going out of my way (in this example) to make ChatGPT look stupid; I gave it the same example that has been discussed by Hector, me, and lots of other people many, many times. Now, if you want to provide an explanation of why ChatGPT got the wrong answer here despite having a perfectly solid grasp of all the concepts involved, I shall be very interested to hear it. If you have evidence to support the explanation, so much the better.

Vijay Saraswat Before offering my own views, let me share some other interactions with ChatGPT. [Image Vijay2.jpg]

Vijay Saraswat Example 3 below is stripped down: we simply reverse the order (large, then small), and it "gets" both right. [Image Vijay3.jpg]

Ernest Davis Err, Vijay, that's not really a great example for your side of the argument. The explanation that ChatGPT gives is nonsense: "it is clear that the pronoun "it" is referring to the suitcase because the sentence mentions that the trophy does not fit into the suitcase due to its size. The size of the suitcase is not mentioned, so it is inferred that the suitcase must be too small to fit the trophy." The fact that the size of the suitcase isn't mentioned obviously has nothing to do with how the pronoun is resolved; the size of neither object is mentioned in either version of the sentence.

Ernest Davis Also, of course, if users have to spoonfeed ChatGPT with these long explanations and elaborate prompts every time they [i.e. the users] use a pronoun, it [i.e. the spoonfeeding of explanations and elaborate prompts] is going to slow down interactions significantly.

Vijay Saraswat Ernie, I am not arguing what you think I am arguing. I have no evidence -- as I have stated in many posts -- that ChatGPT has a "there there", a unity of understanding, a view or theory of the world, that it brings to bear from interaction to interaction. On the contrary, its behavior can be explained by assuming it is responding in a way that is sensitive to locally provided cues. So I would not say "it gets something right"; at most, a prompt can be provided for which its response can be understood as the right response.

Paul Bello Ernest, typically we say that an action is intentional if the agent has, along with the requisite know-how, the skill to reliably produce the action, and the awareness that they are indeed trying to do so. "Getting it right" as an ascription only makes sense from the perspective of a naive observer. Once it's clear that all of these other ingredients are missing, we're just back to saying silly things about a toaster oven.

Ernest Davis I'm not saying that it was intentional. I'm saying that its output was the wrong answer. If I have a desk calculator and I input 12+23 and it answers 47, it got it wrong. Getting it right makes sense under the natural interpretation of the symbols involved: that "12", "23", and "47" mean the numbers that they usually do. The goal of computer programs is to generate answers that are right and to avoid generating answers that are wrong. I don't understand why there is any debate here. Every single paper on AI publishes figures on accuracy or some similar measure, which is the fraction of tasks that the AI got right. Are you and Vijay saying that this entire literature is nonsense?

Leon Bottou I do not understand why one should expect ChatGPT to give the right answer. I am looking for real reasons, not hype. I understand much better why conversing with ChatGPT can fool us into seeing anything we want to see: the contraption is trained on untold numbers of real conversations; real conversations only exist when the parties have enough common ground to keep conversing; this doesn't mean agreeing on the subject matter, but it almost always means that the parties agree on the roles they play. For instance, if you're looking to see a sentient machine, the bot will be happy to speak like one, using the many such examples in its training set. If you're looking to see a bad student, the bot will speak like one. If you're looking for authoritative answers, the bot will be happy to speak with authority, and so on.

Ernest Davis One expects the default behavior of a chatbot to be to give the right answer unless there is some particular reason not to. If people ask Alexa what the exchange rate is between the euro and the rupee, or what the score was in last night's Knicks game, they are expecting the right answer. If they have to explain elaborately each time that they don't want the chatbot to pretend to be a bad student or a fantasist, or to make things up, but rather want a truthful answer, the chatbot becomes a lot less usable.

Ernest Davis The next comment in defense of this answer is going to be "Anyway, what is reality, man?"
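[Editor's sketch] To make the evaluation protocol under discussion concrete, the sketch below shows one way to present a Winograd schema "straightforwardly" to a chat model and to report accuracy as the fraction of items answered correctly, as in the papers Davis mentions. It is a minimal illustration, not anyone's published harness: the ask_model stub is a hypothetical stand-in for a single call to ChatGPT (or any other model), and the two items merely paraphrase the trophy/suitcase schema discussed in the thread.

```python
# Minimal sketch: present each Winograd schema question as-is (no extra
# prompting, explanations, or retries) and report accuracy, i.e. the
# fraction of items the model answers correctly.

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for one call to a chat model such as ChatGPT.
    Replace the body with a real API call that returns the model's text reply."""
    raise NotImplementedError("plug in a real model call here")

# Two illustrative items paraphrasing the trophy/suitcase schema.
ITEMS = [
    {"question": "The trophy would not fit in the suitcase because it was too big. "
                 "What was too big, the trophy or the suitcase?",
     "answer": "trophy"},
    {"question": "The trophy would not fit in the suitcase because it was too small. "
                 "What was too small, the trophy or the suitcase?",
     "answer": "suitcase"},
]

def accuracy(items) -> float:
    """Fraction of items whose reply names the correct referent and not the other one."""
    correct = 0
    for item in items:
        reply = ask_model(item["question"]).lower()
        wrong = "suitcase" if item["answer"] == "trophy" else "trophy"
        # Crude but transparent scoring: the right noun must appear, the wrong one must not.
        if item["answer"] in reply and wrong not in reply:
            correct += 1
    return correct / len(items)
```

The free-text matching here is deliberately simple; the original Winograd Schema Challenge is usually scored as a forced choice between the two candidate referents, which avoids having to parse the model's wording at all.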