Whack-A-Mole;
or
A Large Language Model Struggles with the Physics of Drying Clothes

For Yejin Choi

In April 2023
Professor Choi asked GPT:
"I put five wet shirts in the sun.
Five hours later, they were done.
If thirty there had been instead
When would they have been dry?" It said,
"Full thirty hours that would take."
A thoroughly absurd mistake!

She tried again on June 18
But this time, the complex machine
Knew well this treach'rous inquiry
It aced the problem perfectly!
Yejin now changed things round a bit
She phrased the question thus for it:
"It takes ten hours to dry five clothes.
How many hours, do you suppose,
That twenty sodden clothes would need?"
The AI, anxious to succeed,
Exerting all its reas'ning powers
Produced the answer "Forty hours".

Another sixty days went by.
She felt that she ought again to try
The second form that previously
Had so bewildered GPT,
And, as you probably expect,
Its answer now was quite correct.
A third form now she used to test
This LLM, the newest, best
Descendant of transformer BERT:
"It takes three hours to dry a shirt
And five to dry a pair of pants
How long to dry two shirts?" It ans-
wered, seeing through her artful tricks,
"You can't fool me. It will take six."
It felt sure it had reached its goal
And won this game of Whack-A-Mole.

When AI gets a question wrong
It often happens that, ere long,
It rightly answers what you ask,
But only that specific task.
Its understanding's very thin,
As demonstrated by Yejin.
It hasn't grasped the maxim well:
Wet clothing dries in parallel.

If we want AI we can trust
It has to be much more robust.

Note

This is part of the collection Verses for the Information Age by Ernest Davis.