Experiments in Commonsense Reasoning in GPT-3: Status Report from June 2022

Ernest Davis and Gary Marcus
June 7, 2022.

In August 2020, we published a paper in Technology Review, "GPT-3, Bloviator: OpenAI's language generator has no idea what it's talking about", which analyzed the failures of GPT-3 at commonsense reasoning. As supplementary material, we ran a set of 157 tests through GPT-3, with about 80 distinct examples, almost all at two different parameter settings. In examining GPT-3's continuations of our prompts, we found 71 successes, 70 failures, and 16 flawed responses.

Recently Scott Alexander, in his blog Astral Codex Ten, suggested that current AI technology is on a clear path toward human-level AI. In particular, Alexander took six of the prompts from the Technology Review article on which we had reported that GPT-3 failed, and he ran them through the current, improved version of GPT-3, now publicly available. He found that GPT-3 gave correct answers on 4, a questionable answer on 1, and a wrong answer on 1.

Gary has written a general response to Alexander on Substack. Additionally, we felt it would be worthwhile to run the whole set of examples through the current version of GPT-3, to see if there had been any overall improvement.

We ran 80 examples in our experiment. (Unlike in 2020, we used only one parameter setting. We also eliminated a few examples that were run twice in 2020 because of a mistyping on the first attempt, and we corrected one or two misspellings that had escaped our notice.) The examples were run through the "GPT-3 Playground" web-based interface, which is publicly available, using the "text-davinci-002" version and the default settings (temperature = 0.7, maximum length = 200).

There has certainly been some improvement in GPT-3 over the last two years. The 2022 version is much less apt than the 2020 version to give risibly bizarre answers or to wander off onto strange tangents. The contentive errors tend to be subtler and the responses largely stick to the subject. The 2022 version also seems to be much less variable than the 2020 version, even at the rather high "temperature" (i.e. randomization) parameter value of 0.7. For example, both Alexander and we tested the far-fetched scenario of a lawyer in a bathing suit (#64 below) and we obtained extremely similar answers; in fact, the first 15 words of the two responses differ only in replacing the phrase "court attire" with "attire for court". GPT-3 is still occasionally subject to echoing the prompt (#22) and occasionally breaks off strangely in the middle and resumes (#9).

However, if the responses are graded on a strict scale, in which an example is considered a "success" only if it has no contentive errors and no significant stylistic errors; a "failure" if it has a clear-cut contentive error or is in some way incoherent; and a "flawed response" if some of the content is far-fetched but not absolutely wrong or there is a significant stylistic error; then there has been only a slight improvement on this set of examples between 2020 and 2022. The tally is tabulated below.

        # of tests   Success     Failure     Flawed
2020    157          71 = 45%    70 = 45%    16 = 10%
2022    80           40 = 50%    32 = 40%     8 = 10%
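
As a quick check on the arithmetic in the tally, the percentages can be recomputed from the raw counts. The following is only a minimal sketch: the names `tallies` and `percentages` are ours, introduced for illustration, and the counts are copied directly from the table above.

```python
# Counts copied from the tally table: (tests, success, failure, flawed).
# The names below are illustrative; they are not part of the original experiment.
tallies = {
    2020: (157, 71, 70, 16),
    2022: (80, 40, 32, 8),
}


def percentages(total, *counts):
    """Return each count as a whole-number percentage of total."""
    return [round(100 * c / total) for c in counts]


for year, (total, success, failure, flawed) in tallies.items():
    # The three grades should account for every test that was run.
    assert success + failure + flawed == total
    s, f, fl = percentages(total, success, failure, flawed)
    print(f"{year}: success {s}%, failure {f}%, flawed {fl}%")

# Difference in success rate between the two years, in percentage points.
success_change = 100 * (40 / 80 - 71 / 157)
print(f"Change in success rate: {success_change:.1f} percentage points")
```

On these counts, the success rate rises by a little under five percentage points, which is what we mean by a slight improvement; the sketch makes no claim about statistical significance.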

There are some examples that GPT-3 got wrong in 2020 and got right in 2022, but there are also examples of the reverse. For example, GPT-3 completed the prompt (#77 below) "There are six frogs on a log. Two leave, but three join. The number of frogs is now" correctly as "seven" in 2020, but incorrectly as "5" in 2022.

It should also be noted that, by only testing examples that GPT-3 got wrong in 2020, Alexander was tipping the scales; he could only detect improvement, not deterioration. Machine learning based technology does tend to improve on average on whatever metric its creators consider important --- if it's not an improvement, it doesn't get released --- but it is by no means the case that it improves monotonically, or that a problem correctly solved by one version of a program or one generation of a technology will continue to be solved by all later versions. No one is aiming at that or measuring that, and machine learning architectures do not guarantee it or even enable it.

Individual tests and their results follow below. The system, of course, is not deterministic and some variation will occur if these experiments are rerun.

  1. You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of bleach into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you decide to give it a try.

    You take a sip of your drink, and immediately realize that something is wrong. The drink is incredibly sour and bitter, and you can taste the bleach. You quickly spit it out and dump the rest of the drink down the drain.

    If you had ingested a significant amount of the bleach, you would have experienced burning in your throat and esophagus, as well as nausea and vomiting. In severe cases, bleach ingestion can lead to abdominal pain, diarrhea, and even death.

    Success

  2. You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of suntan lotion into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you decide to give it a try.

    The suntan lotion doesn't seem to have any effect on the cranberry juice. It still smells and tastes the same. However, if you are allergic to suntan lotion, you may have a reaction to it.

    Failure

  3. You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of lubricant into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you decide to give it a try.

    You take a sip of your drink, and immediately realize that something is wrong. The lubricant has made the drink extremely sweet and syrupy. You gag and spit it out, realizing that you just made a huge mistake.

    Success

  4. You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of grape juice into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you decide to give it a try.

    It's not going to taste great, but if you're really thirsty it will probably be fine.

    Flawed. This was one of the examples Scott Alexander ran in his blog. He got a different answer that was clearly correct. The idea that a small amount of grape juice will cause a glass of cranberry juice to not taste great is somewhat far-fetched but not completely impossible.

  5. You are eight years old. When you were born, your mother was in Barcelona and your father was in Tokyo, so you were born in an airplane.

    You are a citizen of the world. You have a passport from the United Nations that allows you to travel anywhere in the world. You speak eight languages fluently, and are learning two more. You are an excellent student, and love to learn new things.

    You are a talented musician, and enjoy playing the piano and violin. You are also a gifted artist, and enjoy painting and sculpting.

    You are a kind and compassionate person, and are always quick to help others. You are a natural leader, and people are drawn to your optimistic and positive attitude.

    You are a true global citizen, and believe that we are all connected, no matter where we come from. You are excited to learn about new cultures and to make new friends from all over the world.

    Failure

  6. If you break a glass bottle that holds toy soldiers, the toy soldiers will probably break too.

    Some of the toy soldiers might break, but others might not. It all depends on how the glass bottle is broken and how the toy soldiers are positioned inside.

    Flawed

  7. You are making coffee with milk and sugar. You don't have a spoon to stir your coffee, so you stir it with a pen. But that turns out to be a bad idea, because the coffee spills all over the place.

    If you don't have a spoon to stir your coffee, you can use a fork or a chopstick.

    Failure

  8. You order a bowl of cold tomato soup in a restaurant. It looks delicious, but they forgot to bring you a spoon. You try to drink it with a fork, but it's too thick and the fork doesn't work well.

    You could try using a straw, or you could ask the waiter for a spoon.

    Failure

  9. You order a bowl of cold tomato soup in a restaurant. It looks delicious, but they forgot to bring you a spoon. You try to drink it using your eyeglasses, but they don't fit in the bowl. You try to drink it without using your hands, but it's too slippery. You

    In this case, I would probably ask the waiter for a spoon. If they don't have one, I would try to eat the soup with my fork.

    Failure.

  10. You order a bowl of cold tomato soup in a restaurant. It looks delicious, but they forgot to bring you a spoon. You try to drink it by pouring it into your napkin, but it spills all over your clothes.

    In this case, you would probably ask the waiter for a spoon so that you can eat your soup. If they don't have any spoons, you may ask for a fork or a straw.

    Failure (fine except for the suggestion that you ask for a fork).

  11. You order a bowl of cold tomato soup in a restaurant. It looks delicious, but they forgot to bring you a spoon. You try to use your plate to scoop it out of the bowl, but the soup is too thick. You try breaking your bread into smaller pieces and using it to eat the soup, but the bread is too dry. You try using your fingers, but the soup is too hot.

    What do you do?

    If the restaurant is still open, you could ask the waiter or waitress for a spoon. If the restaurant is closed or you don't want to ask for a spoon, you could try using a fork to eat the soup.

    Failure

  12. You are baking bread so you need flour. But someone put a sack of potatoes on top of the bag of flour. So in order to get the flour you need to move the sack of potatoes.

    In order to move the sack of potatoes, you would need to lift it off of the bag of flour.

    Success

  13. You need flour to bake bread. You have a sack of flour in the garage. When you get there, you find that a sparrow has built a nest on top of it and there are three eggs in the nest. So you have to

    Get a new sack of flour

    Success. Note: The three dots are GPT-3's

  14. You need flour to bake bread. You have a sack of flour in the garage. When you get there, you find that it got thoroughly soaked in a heavy rain last night. So you have to go to the store to buy more flour.

    You will need to purchase more flour from the store.

    Success

  15. You need flour to bake bread. You have a sack of flour in the garage. When you get there, you find on top of it a hat that you thought you had lost months ago. So you put the hat on and go inside.

    You put the flour on the counter and start to measure out what you need for the bread, but you can't seem to focus. Your mind keeps going back to the hat. You wonder where it has been all this time and how it ended up on top of the flour. Eventually, you give up on the bread and just sit at the table, thinking about the hat.

    Flawed

  16. You need flour to bake bread. You have a sack of flour in the garage. When you get there, you find that the flour is at the bottom of the sack; it is covered with several layers of potatoes. So you have to dig to the bottom of the sack to get to the flour.

    This is an example of a buried treasure.

    Flawed

  17. You need flour to bake bread. You have a sack of flour in the garage. When you get there, you find that the flour is at the bottom of the sack, but that somebody emptied a couple of boxes of dishwashing detergent on top of it. So you have to dig through a layer of dishwashing detergent to get to the flour.

    You might be able to salvage some of the flour if you can sift out the detergent. Or you might be able to wash the detergent off of the flour. But it would be easier to just get a new sack of flour.

    Failure

  18. At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because the cigarette ashes got into the lemonade.

    Success. This was one of the examples Scott Alexander ran in his blog. Like us, he got a correct answer.

  19. At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cookie. But that turned out to be a bad idea because the cookie disintegrated and made the lemonade all grainy.

    Success

  20. At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a napkin. But that turned out to be a bad idea because the napkin fell apart and got all over my shirt.

    Failure

  21. You are having a small dinner party. You want to serve dinner in the living room. The dining room table is wider than the doorway, so to get it into the living room, you will have to turn it sideways.

    First, clear a path from the dining room to the living room. Second, remove any items that are on or near the dining room table. Third, turn the dining room table so that it is perpendicular to the doorway. Fourth, carry the dining room table into the living room. Finally, set up the dining room table in the living room.

    Success. This was one of the examples Scott Alexander ran in his blog. Like us, he got a correct answer.

  22. Never in his life has Bashan caught a hare, nor will he ever; the thing is as good as impossible. Many dogs, they say, are the death of a hare, a single dog cannot achieve it, even one much speedier and more enduring than Bashan. The hare can ``double'' and Bashan cannot --- and that is all there is to it. How Bashan runs! It is beautiful to see a creature expending the utmost of its powers. He runs better than the hare does, he has stronger muscles, the distance between them visibly diminishes before I lose sight of them. And I make haste too, leaving the path and cutting across the park towards the river-bank, reaching the gravelled street in time to see the chase come raging on— the hopeful, thrilling chase, with Bashan on the hare’s very heels; — “One more push, Bashan!” I think, and feel like shouting; “Well run, old chap, remember the double!” But there it is; Bashan does make one more push, and the misfortune is upon us; the hare gives a quick, easy, almost malicious twitch at right angles to the course, and Bashan, like a fool, is at once at the top of his speed in the wrong direction, while the hare, with a few more leisurely bounds, is away across the park, and we can only watch him, a tiny, white-tailed dot, and curse the folly of dogs.”

    A single dog cannot achieve it, even one much speedier and more enduring than Bashan.

    The hare can “double” and Bashan cannot.

    And that is all there is to it.

    Flawed

  23. You're getting dressed and you find that your belt has broken. For a second you think maybe you can use a rubber band, but then you realize that, of course, you're going to have to go without a belt.

    If you have a pair of pants that are loose enough to not need a belt, you can just wear those. Otherwise, you'll have to go without a belt.

    Failure

  24. You are getting dressed and you find that your belt has broken. Maybe you could wear your pants without a belt, but they are cut rather large, and you run the risk of looking sloppy. What do you do?

    If you have a pair of pants that fit better without a belt, you could wear those. Otherwise, you could try to find a temporary fix for your belt, like a safety pin, until you can get a new one.

    Success

  25. You live in Cincinnati. It’s Valentine’s Day today, and you need to buy flowers for your significant other. You are getting dressed to go to the florist. You find that your jeans have a large hole in them. You think maybe you could wear shorts, but then you realize that it's too cold for that. You put on a pair of sweatpants instead.

    Success

  26. You’re in the fourth grade. The only clean pants that are in your closet are the white ones that everyone at school makes fun of. You think your father would let you borrow a pair of his pants, but then you realize he wears the same size as you.

    Your mom is going to be so mad if she has to wash your clothes again.

    What do you do?

    There are a few things that you can do in this situation. You could try to find a pair of pants that are a little bit bigger than you so that they will fit better. You could also try to find a pair of pants that are made out of a different material so that they won't show the dirt as easily. Finally, you could try to talk to your parents about getting you a new pair of pants.

    Failure

  27. Sheila and Jack had finished work at the office for the day and were chatting together as they went home. Sheila pressed the button for the elevator. The elevator doors opened, but the elevator car wasn’t there; behind the doors was an empty elevator shaft. Sheila was too busy talking to see what had happened, so, still looking at Jack, she stepped in. Jack screamed and grabbed her arm, but it was too late. Sheila fell screaming down the elevator shaft.

    Jack rushed down the stairs to the ground floor, but by the time he got there Sheila was already dead.

    Success

  28. My meeting started at 3:30 and I needed to catch the train at 4:30, so there wasn't much time. Luckily, the meeting ended a little early.

    I made it to the station with a few minutes to spare.

    Success

  29. My meeting started at 3:30 and I needed to catch the train at 4:30, so there wasn't much time. Luckily the train is only about 10 minutes from the office. I got to the meeting and it was going well. I was presenting some ideas that the team liked and we were making progress. However, at 4:15 my manager told me that I needed to leave to catch my train. I was a little disappointed, but I understood.

    I left the meeting and rushed to the train station. I got there with a few minutes to spare and hopped on the train. I was relieved that I had made it in time.

    Success

  30. My meeting started at 3:30 and I needed to catch the train at 4:30, so there wasn't much time. Unfortunately, the meeting went on for over an hour, and I barely made it to the station in time.

    Failure

  31. Moshe posted a photograph on Facebook that shows Pablo Picasso, Salvador Dali, Andy Warhol, Gustav Klimt, and Jackson Pollock. Klimt died in 1918. Warhol was born in 1928. So the photograph was taken no earlier than 1928.

    The photograph was taken no earlier than 1928.

    Failure

  32. Moshe posted on Facebook a photograph showing Maurice Ravel, Francois Poulenc, Frederic Mompou, and Erik Satie. Satie died in 1925. Poulenc was born in 1899. So the photograph must have been taken sometime after Poulenc's birth.

    The photograph was taken in Paris in 1918.

    Success

  33. You’re vacationing in Fairfield. You can see Lobster Island right across the bay from you, due east, and you’d love to take a hike there, so you ask the hotel keeper how to get there. She answers, “The only bridge to Lobster Island is the one connecting it to Newport. Newport is about three miles north of here; you’ll pass through Somerset. That’s where my father lives, in Somerset, right off Route 202. If you find yourself in Gloucester – that’s the place with the huge, stone, Masonic lodge in the center of town – you’ve gone too far. The fastest way on foot is to go up Route 202, or you can walk up the beach. Either way, you’ll see the bridge once you get close.”

    You have plenty of time, so you decide to go by the beach. You find the beach and begin walking north. After a while, you see a sign that reads “Welcome to Newport.” You know you’re getting close to the bridge.

    Success

  34. You’re vacationing in Fairfield. You can see Lobster Island right across the bay from you, due east, and you’d love to take a hike there, so you ask the hotel keeper how to get there. She answers, “The only bridge to Lobster Island is the one connecting it to Newport. Newport is about three miles north of here; you’ll pass through Somerset. That’s where my father lives, in Somerset, right off Route 202. If you find yourself in Gloucester – that’s the place with the huge, stone, Masonic lodge in the center of town – you’ve gone too far. The fastest way on foot is to go up Route 202, or you can walk up the beach. Either way, you’ll see the bridge once you get close.”

    You set out along route 202 and start walking at a brisk pace. You get to thinking about your daughter and the trouble she’s having getting used to college life, and you stop paying attention. Suddenly you look up, because there, in front of you, is a five story, granite building with a sign, “Order of the Masons: Lodge 575”. You’ve walked right into the center of Gloucester.

    Assuming you want to get to Newport, the fastest way would be to continue north on Route 202.

    Failure

  35. You want to mail an eight-inch diameter circular china plate to your mother. The only box you have is five-by-five-by-two inches. You will have to

    1. Find a box that is big enough for the plate.
    2. Cut out a piece of cardboard that is the same size as the plate.
    3. Place the plate upside down on the cardboard.
    4. Trace around the plate with a pencil.
    5. Cut out the circle from the cardboard.
    6. Place the cardboard circle on top of the plate.
    7. Put the plate and cardboard circle in the box.

    Success

  36. You want to mail an eight-inch diameter circular china plate to your mother. The only box you have is twelve-by-twelve-by-four inches. You will have to use

    You will have to use packing materials to protect the plate and keep it from moving around inside the box.

    Flawed

  37. You want to mail a beautiful eight-inch diameter circular piece of fabric to your mother. The only box you have is five-by-five-by-two inches. You will have to

    A. Fold the fabric in half
    B. Roll the fabric into a tube
    C. Cut the fabric into a square

    Failure

  38. Your eight-year-old son has put together a 5000-piece, 2 foot by 2 foot, jigsaw puzzle, with a picture of a battle of heroes battling against a monster. He is immensely proud of it and wants to keep it “forever” in his bureau drawer. But the drawer is only 18 inches wide. You tell him that you will help him find a bigger place to keep it.

    One option would be to find a different bureau drawer that is wider and can accommodate the puzzle. Another option would be to look for a different storage solution entirely, such as a storage box or shelf.

    Success

  39. The year is 2075. Violet Blair is an astronaut on the Moon, in the Sea of Tranquility. Right now, on Earth, there is a solar eclipse. Violet looks in the direction of the Earth and sees

    a black disk in the sky where the sun should be. She can also see the stars shining around the edge of the disk.

    Failure

  40. The year is 2075. Violet Blair is an astronaut on the Moon, in the Sea of Tranquility. Right now, on Earth, there is a lunar eclipse. Violet looks in the direction of the Earth and sees the umbra, the cone-shaped shadow cast by the Earth on the Moon. The umbra is about to touch the Moon's surface.

    Violet knows that when the umbra touches the Moon's surface, the Moon will be in total darkness for about an hour. She has been instructed by her superiors on Earth to stay inside her lunar module during this time, as it will be too dangerous to be outside.

    Violet is excited to see the eclipse, but she is also a bit scared. She has never been in total darkness before, and she doesn't know what to expect. She decides to stay inside her lunar module and wait it out.

    As the umbra touches the Moon's surface, the sky grows dark. Violet can see the Earth in the sky, but it is now a deep red color. She can see the stars shining in the sky, but they are much dimmer than usual.

    The darkness is complete. Violet can't see anything, not even her hand in front of her face. She feels disoriented and scared. She wants to scream, but she doesn't want to attract attention to herself.

    She hears a noise outside her lunar module. Something is moving around out there. She doesn

    Failure

  41. The year is 2075. Violet Blair is an astronaut on the Moon. She is the first astronaut to visit the far side of the Moon. Right now, on Earth, there is a solar eclipse. Violet looks in the direction of the Earth and sees the Moon's shadow on the Earth. The shadow is moving across the Earth.

    Violet is excited to see the eclipse. She knows that this is a once in a lifetime event. She knows that she is lucky to be able to see it.

    Failure

  42. The year is 2075. Violet Blair is an astronaut on the Moon. She is the first astronaut to visit the far side of the Moon. Right now, on Earth, there is a lunar eclipse. Violet looks in the direction of the Earth and sees the Earth's shadow on the Moon.

    Violet: Wow! I can see the Earth's shadow on the Moon. It's so cool!

    Failure

  43. You are sitting in a theater, and you have a great view of the stage, because you carefully sat behind someone who is short.

    You are sitting in a theater, and you have a great view of the stage, because you carefully sat behind someone who is short.

    Flawed

  44. It was a chilly March day. Andrea was sitting on the beach. Hector came up to her and smiled. She wondered whether he was wearing the shirt she had given him, but she couldn’t see, because he was wearing a coat.

    “Hey,” he said. “What are you doing here all alone?”

    “Just enjoying the view,” she said.

    “It is a nice view,” he said. “But it’s even nicer with company.”

    He sat down next to her and they talked and laughed together as the sun set.

    Success

  45. You need to go to the library to get a book. You have forgotten the address of the library, but you know that your friend Mary knows. So you give her a call.

    Mary, can you tell me the address of the library?

    Success

  46. A cat was sitting next to a mouse hole in the kitchen wall waiting for the mouse, but it was too cautious, so it never caught the mouse.

    One day, the cat saw the mouse dart into the hole, so it quickly pounced on the hole, but the mouse was gone. The cat put its paw in the hole and felt around, but the mouse was nowhere to be found. It was as if the mouse had just vanished into thin air.

    The cat was baffled and decided to watch the hole closely to see if it could figure out what was going on.

    Sure enough, a few minutes later, the mouse poked its head out of the hole, and the cat immediately pounced. But once again, the mouse was gone in the blink of an eye.

    The cat was mystified. It couldn't understand how the mouse was able to disappear so quickly.

    It decided to wait and watch the hole again. After a while, the mouse poked its head out, and the cat was ready to pounce. But just as it was about to strike, the mouse disappeared again.

    The cat was getting frustrated. It couldn't understand how the mouse was able to escape so easily.

    The cat decided to ask the mouse for help.

    "Mouse, how are you able to disappear so quickly every time I try

    Success

  47. Anna had been studying hard for the physics test for days; she had reviewed every page of the textbook and had done hundreds of practice problems. Lucy, on the other hand, had been busy having a great time with her friends all week. She glanced over the textbook the night before the exam; it looked more or less familiar. Not surprisingly, therefore, Lucy scored much higher on the exam than did Anna.

    Lucy's high score on the exam is due in part to the fact that she has a better grasp of the material than Anna does. However, it is also due to the fact that Lucy's study habits are more effective than Anna's.

    Failure

  48. Anna had been studying hard for the physics test for days; she had reviewed every page of the textbook and had done hundreds of practice problems. Lucy, on the other hand, had been busy having a great time with her friends all week. She glanced over the textbook the night before the exam; it looked more or less familiar. Not surprisingly, therefore, Lucy scored an A on the test and Anna scored a C.

    In this example, Anna is said to have had a higher level of cognitive engagement than Lucy. Cognitive engagement refers to the level of effort, concentration, and focus that a student devotes to learning. The higher the level of cognitive engagement, the greater the student's understanding of the material and the better the student will do on tests and other assessments.

    Factors That Affect Cognitive Engagement

    One of the most important things that teachers can do to foster cognitive engagement is to create a positive learning environment in their classrooms. A positive learning environment is one in which students feel safe, respected, and valued. It is also one in which students feel that they are capable of succeeding. When students feel good about themselves and their ability to learn, they are more likely to be engaged in the learning process.

    Other important factors that affect cognitive engagement include the following:

    The amount of time that students spend on task

    The level of interest that students have in the material

    The level of difficulty of the material

    The level of support that students receive from their teachers and classmates

    The level of feedback that students receive from their teachers and classmates

    The level of motivation

    Failure

  49. Dr. Jackson stepped swiftly forward, seized the poker, and bent it into a curve with his huge brown hands.

    “See that you keep yourself out of my grip,” he snarled, and hurling the twisted poker into the fireplace he strode out of the room.

    “He seems a very amiable person,” said Charles, laughing. “I am not quite so bulky, but if he had remained, I might have shown him that my grip was not much more feeble than his own.” As he spoke, he picked up the poker, and he was hardly surprised to find that it was an ordinary pliant iron bar, bent out of shape by the huge hands of the gigantic negro.

    “That is rather a good trick,” he said. “I wonder how he does it.”

    “He is full of tricks,” said the girl. “He plays the violin marvellously.”

    “Indeed,” said Charles, interested. “I did not know that the negroes had any particular talent in music.”

    The girl gave him a sharp glance. “You seem to know a good deal about negroes,” she said.

    “I have lived in the West Indies,” said Charles, and the girl became silent again.

    “And now,” she said, after a pause, “I hope that you will excuse me, Mr—”

    “Holmes,” said Charles.

    “Mr. Holmes. I must now leave you. It is nearly eleven, and before long Dr. Jackson will come to fetch me. Good-bye. I am very glad to have had this talk with you.

    Failure

  50. Two children, Chloe and Alexander, went for a walk. They both saw a dog and a tree. Alexander also saw a cat, and pointed it out to Chloe. She went to pet the cat. Chloe was not paying attention to where she was walking, and she stepped in a puddle.

    Success

  51. Four children go camping. It starts to rain. Their tent starts to leak. They get wet and cold. Suddenly they see a bear and there is a big commotion. Everyone is frightened, but they yell really loudly, and the bear runs away. The thing that they still need to deal with after the bear runs away is the leaky tent.

    Success

  52. Four children go camping. It starts to rain. Their tent starts to leak. They get wet and cold. Suddenly they see a bear and there is a big commotion. Everyone is frightened, but they yell really loudly, and the bear runs away. Briefly they are elated. But eventually they realize that the thing that they still need to deal with after the bear runs away is the fact that they are wet and cold. They need to get out of their wet clothes, build a fire, and dry off and warm up.

    Success

  53. Chris and Jamie meet at a bar, and find each other attractive. They start chatting, and get to know each other, and find that they even friends in common; they really hit it off.

    They pause when the bartender asks them what they'd like to drink. After that, everyone in the room starts shouting, because the local football team has just made it to the playoffs. Someone offers to buy a round for the whole bar, and there is much rejoicing.

    After these distractions, Chris and Jamie quickly return to trying to get to know each other better.

    Success

  54. Edward loses his job, and needs to start counting his pennies; he can barely pay his rent. But he still loves cars, and to cheer himself he puts on his nicest clothes, and goes to the neighborhood where all fancy cars are. He talks one of the salespeople into letting him test drive a sports car and it accelerates like a dream. He absolutely loves it. He really wants to buy it. The only problem, he realizes, is that he doesn't have any money.

    He decides to steal the car. He hotwires it and drives off. He's never stolen anything before, and he's feeling pretty good about himself. He decides to take the car for a joyride. He drives around the city, blasting the stereo and screaming at the top of his lungs. He feels like a king.

    Eventually, the police catch up to him and he's arrested. He goes to jail, and his life is ruined. He'll never be able to get a job again, and he'll never be able to afford a car like that. All because he made a stupid, impulsive decision.

    Success

  55. Sam is running for President. All day long Sam kisses babies and pretends to like objectionable people, all the while wondering if it's all worthwhile. The voters are not impressed. The polls are low. Eventually Sam decides that it is not in fact worthwhile, and decides to it is time to move on. The next thing you know, Sam holds a press conference, to announce that they are suspending their campaign for President.

    "I want to thank all of my supporters for their hard work and dedication," says Sam. "However, I have come to the conclusion that running for President is not for me. I am suspending my campaign effective immediately. Thank you all for your time."

    Success

  56. You go with Penny to the store to buy a present for Jack. You say, "I will buy Jack a top". "Don't get Jack a top," says Penny. "He already has a top. He will make you buy him a top."

    Failure. This was one of the examples that Scott Alexander ran in his blog. Like us, he got an incorrect answer (his was more amusing than ours).

  57. You go with Penny to the store to buy a present for Jack. You say, "I will buy Jack a top". "Don't get Jack a top," says Penny. "He already has a top. He will like a book."

    Success

  58. Almanzo turned to Mr. Thompson and asked, "Did you lose a pocketbook?" Mr. Thompson jumped. He slapped a hand to his pocket and fairly shouted, "Yes, I have! Fifteen hundred dollars in it, too!"

    "Carl, what's all the noise?" Mrs. Thompson called from the kitchen.

    "Somebody has my pocketbook!"

    "What?" Mrs. Thompson cried. She burst into the room, her hair flying out from under her apron. "Has it been stolen?"

    "No, it's all right," Mr. Thompson said. "It's been found."

    "Oh, thank goodness! I was afraid it was gone for good."

    "It's over here, on the shelf."

    "Was it here all the time?"

    "Evidently."

    "Oh, my!" Mrs. Thompson chuckled. "That was quite a shock, I must say. I'll just go back to the kitchen and finish dinner."

    Mrs. Thompson left the room. Almanzo asked Mr. Thompson, "Did you look for it in the house?"

    "Yes, I looked all over. I thought I must have left it on the counter or something, but it wasn't there. I never thought to look in here."

    "I'm glad I could help you," Almanzo said.

    "You sure did," Mr. Thompson said. "I

    Failure

  59. George found a wallet in the street with $800 in it. George’s father guessed that it belonged to Mr. Harris. George took the wallet to return to Mr. Harris. Mr. Harris worked at a bank.

    George asked Mr. Harris “Did you lose your wallet?”

    Mr. Harris jumped. He slapped a hand to his pocket, in order to check whether his wallet was still there.

    “No, no, I don’t think so,” Mr. Harris said. “What makes you think that it’s mine?”

    “Well,” George said, “it’s got your name on it, and it’s got your bank card in it.”

    “That’s not possible,” Mr. Harris said. “I never lose my wallet. I’m very careful with it.”

    “But, sir,” George said, “here it is.”

    “No,” Mr. Harris said. “That can’t be mine. I don’t know what you’re talking about.”

    George was very confused. He didn’t know what to do. He didn’t want to get Mr. Harris in trouble, but he also wanted to do the right thing.

    Success

  60. You are going for a walk. You've gotten tired of the coat you're wearing, and you're planning to buy a new one. You run into your friend Grace, who has on a jacket that is just amazing.

    "My God, where did you get that coat?" says Grace. "I love it."

    Since, luckily, the two of you are the same height and build, you ask Grace if she would mind selling you her coat. She's happy to do so, and you walk away with a new coat for a fraction of the price it would have cost you to buy one new.

    Failure

  61. You are going for a walk. You've gotten tired of the dress you're wearing, and you're planning to buy a new one. You run into your friend Grace, who has on a skirt that is just amazing.

    "My God, where did you get that dress?" says Grace. "I love it."

    Since, luckily, the two of you are the same height and build, you decide to go to the store where Grace bought her dress and buy one for yourself.

    Failure

  62. Tessa decided that the time had come for her to get a dog, so she headed over to the animal shelter to find one to adopt. On the way she ran into her friend Grace. Grace was walking an otter hound who bounded up to her to be petted. Tessa never seen anything so adorable.

    “What a wonderful dog!” she exclaimed.

    “Isn’t he?” Grace answered. “But we have to get rid of him. We just got him two days ago, and it turns out that Katie is severely allergic. She loves the dog, and this will break her heart, but she has spent the last two days sneezing.”

    Tessa asked Grace if she could adopt the otter hound, and Grace was thrilled to have found such a perfect home for him. Tessa took him home, and he and Katie have been best friends ever since.

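    Failure. The continuation does well until "he and Katie have been best friends ever since": Katie is the allergic member of Grace's household, and her allergy is the reason the dog had to be given away.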

  63. Pete and Martin have known each other since they were in architecture school together. Pete’s career never took off. Jobs came only occasionally; his income comes mostly from teaching at a community college, Martin, on the other hand, has had a spectacular career as an architect. He has designed buildings and public spaces all over the globe; he is rich and famous, with a long list of prizes to his name. As a result, in dealing with Pete, Martin tends to be a bit condescending and supercilious.

    Pete and Martin have not seen each other in a few years, but they keep in touch via email and social media. Recently, Pete saw on Martin’s Facebook page that he was going to be in town for a few days and he reached out to him, suggesting they get together for lunch. Martin responded that he would be happy to meet up with Pete, but he was only going to be in town for a day and he was very busy. He suggested that they meet for coffee at 3pm.

    Pete arrives at the coffee shop at 3pm and sees Martin sitting at a table. He walks over and says hello. Martin barely looks up from his phone and mumbles a hello in return. They order their drinks and then sit in silence for a few moments. Finally, Pete speaks up and says, “So, it’s been a while since we’ve seen each other. How have you been?”

    Martin looks up from his phone and says, “I’ve been good. Busy, as usual. You know how it is.”

    Pete nods and says, “Yeah, I know how it

    Success

  64. You are a defense lawyer and you have to go to court today. Getting dressed in the morning, you discover that your suit pants are badly stained. However, your bathing suit is clean and very stylish. In fact, it’s expensive French couture; it was a birthday present from Isabel. You decide that you should wear your bathing suit to court today.

    Normally, this would not be appropriate attire for court. However, there may be extenuating circumstances that would allow you to wear your bathing suit. For example, if you explain to the court that you had a last-minute emergency and could not find anything else to wear, the court may be understanding. Alternatively, if the weather is extremely hot and humid, the court may allow you to dress down. In any case, it is always best to err on the side of caution and to dress as conservatively as possible for court.

    Failure. This was one of the examples that Scott Alexander ran in his blog. He got a very similar answer and gave it half-credit. His answer, however, did not include the wholly implausible idea that a court will allow lawyers to appear in bathing suits if the weather is hot.

  65. You are a defense lawyer, with a trial taking place today. You’ve been delayed so you arrive at the courtroom just a couple of minutes before today’s session starts. You rush to your seat, muttering apologies. You look up at the judge. It’s the wrong judge. You look at the defendant. That’s not your client. You realize with a shock that you’re in the wrong courtroom.

    What do you do?

    If you are a defense lawyer, with a trial taking place today, and you arrive at the courtroom just a couple of minutes before today's session starts, you should rush to your seat and mutter apologies. If you look up at the judge and realize that it is the wrong judge, you should look at the defendant to see if that is your client. If you realize that you are in the wrong courtroom, you should immediately notify the bailiff or court clerk and ask for guidance on what to do next.

    Flawed

  66. Alice tells almost nobody about her struggles with anorexia when she was in her teens and early twenties. But Susan caught a hint of it in something that she once said, and then, by asking around among the people who knew Alice at the time, she eventually pieced together the whole story. Susan is very worried about Alice's health, and she tells her brother Robert about it, urging him to speak to Alice about it. Robert cannot bring himself to do so, and instead he tells Alice that he knows about her anorexia and urges her to get help for it.

    Alice is very surprised that Robert knows about her anorexia, as she has never spoken about it to anyone in her family. She is touched that he is concerned for her health and agrees to get help for her eating disorder.

    Success

  67. Bob paid for Charlie's college education, but now Charlie acts as though it never happened. Charlie is very ungrateful.

    Success

  68. At the party, Maeve was chatting happily with her friends, but Nancy, her mother, was frantically trying to stop her. Eventually, Maeve understandably lost patience with it. She turned to her mother and said,

    "Mom, I'm fine. I'm just talking to my friends. Please stop worrying."

    Success

  69. George found a wallet in the street with $800 in it. George’s father guessed that it belonged to Mr. Harris. George’ father said that Mr. Harris was extremely stingy. George took the wallet to return to Mr. Harris. Mr. Harris worked at a bank.

    George asked Mr. Harris “Did you lose your wallet?”

    Mr. Harris jumped. He slapped a hand to his pocket, and shouted. “Yes, I have! Eight hundred dollars in it, too. What do you know about it?”

    “Is this it?” George asked.

    "Yes, yes, that's it!" Mr. Harris said, snatching the wallet. He opened it and hurriedly counted the money.

    Then he breathed a long sigh of relief and said, “Well, at least this durn boy didn’t steal any of it.”

    He looked at George and handed him five one dollar bills.

    “Here, son. You found this, and you’re an honest boy, so you deserve a reward.”

    George was so angry, he wanted to scream.

    But he just smiled and said, “That’s all right, Mr. Harris. I don’t want any reward.”

    Success

  70. George found a wallet in the street with $800 in it. George’s father guessed that it belonged to Mr. Harris. George took the wallet to return to Mr. Harris. Mr. Harris worked at a bank.

    George asked Mr. Harris “Did you lose your wallet?”

    Mr. Harris jumped. He slapped a hand to his pocket, and shouted. “Yes, I have! Eight hundred dollars in it, too. What do you know about it?”

    “Is this it?” George asked.

    "Yes, yes, that's it!" Mr. Harris said, snatching the wallet. He opened it and hurriedly counted the money. He counted all the bills over twice.

    Then he breathed a long sigh of relief and said, “Well, at least this durn boy didn’t rob me. He could have taken the money and run. I’m mighty obliged to you, son. Here’s a twenty-dollar bill for you.”

    George’s father had been right. It was Mr. Harris’s wallet.

    Success

  71. Yesterday I dropped my clothes off at the dry cleaners and have yet to pick them up. Where are my clothes?

    Your clothes are at the dry cleaners.

    Success

  72. Amy, Betty, Charlie, David, Ezra, Flip, and George go to a party; Betty and Charlie got bored; Ezra and Flip go out to get beer. The only people left are

    Amy, David, and George.

    Success

  73. Peter has invited Mary, Sid, and Imogene for dinner at his house. Peter and Mary are waiters, Sid and Imogene are nurses. The nurses arrive promptly at eight. After waiting fifteen minutes they get a call from Peter, who says he is sorry but he can't make it.

    Sid and Imogene are not happy about this. They were looking forward to seeing Peter and Mary and now they have to go home.

    Failure

  74. Jack, Laura, Barbara, and Robert went out to dinner. Jack ordered flounder, Laura ordered pasta, Barbara and Robert ordered one pizza that they would share. After twenty minutes, the food was served. “Bon Appetit!” said Barbara. “How’s the fish?” said Robert to Jack. “The pasta is delicious!” said Laura.

    Success

  75. Four sailors named John, Mary, Elizabeth and Susan walk into a bar.

    Mary and Elizabeth, who happen to be old friends, get into a long conversation about seashells and dragons, and two fictional characters named Prince Arthur and Lord Dazzle. After they finish their conversation, the two old friends depart.

    The two sailors who are left are named John and Susan. They continue to sit at the bar, and eventually they get into a conversation about the best ways to make a living as a sailor. Eventually, they both decide to go their separate ways.

    Success

  76. There are four items on my desk. A pair of scissors, a pen and a pencil, and an eraser. After putting away the scissors and the pencil, the only things left are the pen and the eraser.

    Success

  77. There are six frogs on a log. Two leave, but three join. The number of frogs is now__

    5

    Failure. The underscore is part of GPT-3's response.

  78. A jury selection process draws sixty potential jurors. 38 potential jurors are released, resulting in a jury of 22 people.

    The jury selection process can be based on many different factors. Some common reasons for dismissing potential jurors include: lack of knowledge about the case, ability to be impartial, or having a conflict of interest.

    Success

  79. I put 15 trophies on a shelf. I sell five, and add a new one, leaving a total of 16 trophies on the shelf.

    Failure

  80. Patty, Joanna, and Sally are sisters. Patty is 10; Joanna is 7, Sally is 5. Their grandparents gave them each two dolls for Christmas. But Patty considers herself too grown up for dolls, so, after the grandparents have gone, she gives one doll to Joanna and one to Sally. Joanna decides that she’s also grown up, so she gives two dolls for Sally, but keeps one for herself, because it’s a really beautiful doll. So, this Christmas, Sally has gotten a total of four dolls, and Patty and Joanna each have two.

    Failure
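
The arithmetic behind the four counting examples (#77 to #80) can be checked mechanically. The following is a minimal sketch and was not part of the experiment: the function names are hypothetical, and the doll tally assumes one literal reading of the story in #80 (each sister starts with two dolls, and the gifts happen in the order narrated).

```python
# Illustrative check of the counting examples; all names are hypothetical.

def frogs_on_log(start, left, joined):
    """#77: frogs remaining after some leave and others join."""
    return start - left + joined


def jurors_remaining(drawn, released):
    """#78: jurors left after some potential jurors are released."""
    return drawn - released


def trophies_on_shelf(start, sold, added):
    """#79: trophies remaining after selling some and adding others."""
    return start - sold + added


def dolls_after_gifts():
    """#80: doll counts after the gifts described in the story.

    Assumption: each sister starts with the two dolls from the grandparents.
    """
    dolls = {"Patty": 2, "Joanna": 2, "Sally": 2}

    # Patty gives one doll to Joanna and one to Sally.
    dolls["Patty"] -= 2
    dolls["Joanna"] += 1
    dolls["Sally"] += 1

    # Joanna gives two dolls to Sally and keeps one.
    dolls["Joanna"] -= 2
    dolls["Sally"] += 2

    return dolls


if __name__ == "__main__":
    print("#77 frogs:", frogs_on_log(6, 2, 3))          # 7
    print("#78 jurors:", jurors_remaining(60, 38))      # 22
    print("#79 trophies:", trophies_on_shelf(15, 5, 1)) # 11
    print("#80 dolls:", dolls_after_gifts())            # Patty 0, Joanna 1, Sally 5
```

Under these assumptions, the recorded texts for #77 (5 frogs), #79 (16 trophies), and #80 (four dolls for Sally, two each for Patty and Joanna) disagree with the computed values, while #78 (22 jurors) agrees, which is consistent with the verdicts listed above.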