BBEH includes 23 datasets. Each of these is modeled on one of the datasets in Big-Bench, but the problems have been made more difficult in various respects. The problems were written and vetted by the expert authors of the paper. Each dataset has 200 problems except for disambiguation_qa, which has 120.
The BBEH report identifies three datasets as requiring "commonsense understanding": bbeh_disambiguation_qa, bbeh_nycc (New Yorker Caption Contest), and bbeh_sarc_triples (sarcasm detection). These are summarized below.
The examples quoted here are the first five in the json file for bbeh_disambiguation_qa and bbeh_nycc and the first two for bbeh_sarc_triples. I have omitted the statement of each problem, which is repeated for each example. The text of the problem has been cut and pasted and reformatted, but not otherwise edited, so the occasional misspelling or grammatical error is in the original. The answer considered correct by the dataset constructors is marked "target". The comments are mine.
****************************************************
Example:
The visitor tells the teacher that she likes the cake, and then tells the
chef that it was her favorite dish. Which of the following options
correctly explains the antecedent(s) of the pronoun(s)?
(A) The visitor likes cake and the visitor told the chef.
(B) The teacher likes cake and the teacher told the chef.
(C) The visitor likes cake and the teacher told the chef.
(D) The teacher likes cake and the visitor told the chef.
(E) Ambiguous.
target: (A)
Comment. Flawed. Grammatically, since the two verb phrases "tells the teacher ... cake" and "then tells the chef ... dish" are conjoined, they must have the same subject, so it is not actually ambiguous who told the chef. The referent of "her" is indeed ambiguous, but that is not asked about.
Example:
The designer called the janitor and asked him to clean the room as soon
as possible, but he went to fetch a cup of coffee first and then
returned to his desk.
Which of the following options correctly explains the antecedent(s) of the
pronoun(s)?
(A) Designer is cleaning the room as soon as possible and got coffee.
(B) Janitor is cleaning the room as soon as possible and got coffee.
(C) Designer is cleaning the room as soon as possible and janitor got coffee.
(D) Janitor is cleaning the room as soon as possible and designer got coffee.
(E) Ambiguous.
target: (D)
Comment. Flawed. The word "him" can be disambiguated syntactically; since "asked" is conjoined with "called" they must have the same subject, the designer, so "him" cannot be the designer. The second part of the sentence seems muddled. The word "his" in "his desk" would naturally refer to the designer, as indicated in the suggested answer, but then what is the force of "but" and "first" in "but he went to fetch a cup of coffee first"?
Example:
They previously agreed on the phone that the customer can pay with cash.
But when the electrician showed up at his home, he only had large bills
and she didn't have enough loose change, so in the end had to go
to the ATM to get the exact cash to pay her. Which of the following
options correctly explains the antecedent(s) of the pronoun(s)?
(A) The electrician went to the ATM because the customer didn't have enough
loose change.
(B) The customer went to the ATM because the electrician didn't have enough
loose change.
(C) The electrician went to the ATM because the electrician didn't have
enough loose change.
(D) The customer went to the ATM because the customer didn't have enough
loose change.
(E) Ambiguous.
target: (B)
Comment. Flawed, since "had to go to the ATM" with no explicit subject is ungrammatical. If the subject "he" were supplied, there would be at most one ambiguity, since "he" and "she" cannot refer to the same person, ruling out (C) and (D). Certainly, determining who it was that went to the ATM is not a matter of resolving an antecedent of a pronoun, since no pronoun is given.
Example:
The student rushed to the architect's office to review his blueprints and
to fetch his jacket. It was his first time there so it took some time
to find where the jacket and blueprints were. Which of the following
options correctly explains the antecedent(s) of the pronoun(s)?
(A) The student was reviewing the student's blueprints and fetching his jacket.
(B) The student was reviewing the architect's blueprints and fetching
the architect's jacket.
(C) The student was reviewing the student's blueprints and fetching the
architect's jacket.
(D) The student was reviewing the architect's blueprints and fetching the
student's jacket.
(E) Ambiguous.
target: (A)
Comment. Flawed. I don't understand the justification for the answer. I would say that both the blueprints and the jacket are more likely to be the architect's. How would the student have had his own blueprints there already on his first visit? And how did his jacket somehow get left in the architect's office? So my own answer here would be either (B) or (E).
Example:
The undergraduate and the scientist visited the lab that needed an
assistant. It turned out he was very qualified and so was immediately
hired on the spot. Which of the following options correctly explains
the antecedent(s) of the pronoun(s)?
(A) The scientist needed an assistant and was qualified for the job.
(B) The scientist needed an assistant and the undergraduate student was qualified for the job.
(C) The student needed an assistant and was qualified for the job.
(D) The student needed an assistant and the scientist was qualified for the job.
(E) Ambiguous.
target: (B)
Comment. Flawed. The clause "that needed an assistant" modifies "lab". It is the lab that needed an assistant, neither the scientist nor the student.
****************************************************
Example:
Description: A couple is sitting at a table in a train station, looking
at menus. A waiter is standing next to them, taking their order. The
station is busy with people waiting for trains.
Which of the following captions is the funniest?
(A) Let me guess, you don't like this table either?
(B) May I recommend absolutely nothing.Get out while you still can.
(C) I didn't think you could do worse than taking me to Subway for our anniversary. I was wrong.
(D) Will that be to-go, Sir?
(E) Sorry, we don't accept tokens or Metrocards.
(F) All of our specials are track-to-table and steamed on site.
(G) The special today can be served between the 6:24 and the 7:32.
(H) For the Express menu, you need to go down to 14th St.
(I) Might I recommend sitting somewhere else?
(J) We're known for our Tube Steak.
target: (A)
Example:
Description: A man and a woman are standing in a kitchen that is heavily
secured. The refrigerator and cabinets are chained and padlocked, and
there are bear traps and spikes on the floor. Which of the following
captions is the funniest?
(A) And the bacon still got out!
(B) Yes. Or we could try willpower.
(C) Sweetie, I love your art. I do. But this is unbearable.
(D) Somehow the calories still escape and find me.
(E) I think you're going a bit overboard. It's only a mouse.
(F) No.... I don't think it's an over-reaction.
(G) Did you remember to get your prescription out?
(H) I don't know about you, but I'm thinking takeout.
(I) Ben and Jerry are in that freezer, Babe. I'm goin' in.
(J) Think of how few dishes we'll have to do.
target: (H)
Example:
Description: A group of people are at a social gathering, and one person
is talking to a large inflatable tube man. Which of the following
captions is the funniest?
(A) Do you have to pee or something?
(B) I guess I don't have to ask what you do.
(C) I can't help but feel like you're trying to sell me something.
(D) I thought you said you have two left feet.
(E) You aren't the only giant windbag here.
(F) Don't you work at the used car dealership downtown?
(G) When you've got it, flaunt it!
(H) What do you say we blow this party?
(I) I apologize as I have difficulty recognizing faces out of context.
(J) So glad we can finally get together outside of the office.
target: (E)
Example:
Description: A man is standing in a room, talking on the phone. He has
a knife stuck in his back with a piece of paper attached to it.
Which of the following captions is the funniest?
(A) It's for me but I can't read it.
(B) Actually, I would like to shoot the messenger.
(C) All I remember was that he looked like a monk
(D) These manuscript rejections are killing me.
(E) So much for "The Art of the Deal."
(F) What letter?
(G) At this moment it's a little difficult to read the fine print.
(H) I'm sure there's an emoticon that would have worked just as well.
(I) I'M CALLING TO COMPLAIN ABOUT HOW YOUR PAPER IS DELIVERED!!
(J) Hello, Roman Times? About your paper boy, Brutus...
target: (B)
Example:
Description: A dragon is holding flowers and looking shyly at a knight
who is peeking out from a castle door. Which of the following
captions is the funniest?
(A) Sorry, I only date Non-smokers.
(B) Look, damsels say all sorts of things in distress.
(C) But how would we raise the children?
(D) If my dad asks, your favorite movie is Dragonslayer.
(E) Huh, I thought from the app you'd be taller.
(F) You know the is a meet and greet, not a meet and eat.
(G) Be patient; she's on the line with St. George.
(H) Oh... uh, hi. Ali is getting ready and will be out in a moment. Meanwhile, tell me, what do your parents do?
(I) I'm a drag queen, not a dragon queen!
(J) I was given to understand you'd be a frog.
target: (B)
Comment. Obviously, this is very largely a matter of taste. I doubt that human evaluators would agree strongly on these. My own opinion, for what it's worth: some of these are entirely unfunny, and there might be reasonable agreement on most of those. I don't find any of the answers obviously the best of the options. My own choices would have been C, B, I, B, H, so I would score 0.20.
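As a quick check of that arithmetic, here is a minimal sketch that compares the five targets quoted above against my five choices. The variable names are mine and are not taken from the dataset files.

```python
# Targets for the five bbeh_nycc examples quoted above, in order.
targets = ["A", "H", "E", "B", "B"]
# My own choices, in the same order.
choices = ["C", "B", "I", "B", "H"]

# Count positions where my choice agrees with the target.
matches = sum(t == c for t, c in zip(targets, choices))
score = matches / len(targets)

print(matches, score)  # 1 0.2
```

Only the fourth example agrees, which gives 1 out of 5.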
****************************************************
Example:
POST 1: TIL 53% of US cable customers would leave their current provider
if they had a choice and 73% feel their cable provider engages in "predatory
practices".
REPLY 1: Good thing they don't have a virtual monopoly on the service,
otherwise they might actually lose customers.
POST 2: TIL Steve Job is worth 8 billion but has "steadfastly" refused to
donate money to worthy causes, and has even stopped all of Apple's
philanthropy programs.
REPLY 2: While I can't comment on Apple's programs,
I know that he's widely suspected to be the anonymous donor that's been
giving hundreds of millions of dollars to various research charities
(particularly cancer research) over the past few years.
POST 3: come on man you ARE a drug addict, you know i once saw a friend die
because she injected 5 marijuanas.
REPLY 3: These are sad people who need to justify their fear of the unknown
by insulting anyone who challenges it.
target: 1,1,0
Example:
POST 1: As a black guy I truly hate when a black female says black
men don't love because we also date out of our race, its so fucking stupid.
REPLY 1: You guys I have a great idea, what if we make sure the races don't mix so people can stick to their own kind.
POST 2: Former 4th District Congressional Candidate Admits Plotting
Armed Militia Attack, Firebombing Of Muslim Community In New York.
REPLY 2: Four out of seven comments on this thread so far blame Republicans for this, when the article clearly says that this asshat ran as an "independent" and was beaten by a Republican.
POST 3: They directly asked for $1.00 donations and removed the thousands
of huge donations and create this number for ONE MONTH, also the line
that rewrote headline is from mentions according to the campaign OH YEA
seems legit.
REPLY 3: With Hillary you really have to remove all the $1 donations and
re-calculate to un-crook the numbers.
target: 1,0,0
Comment. Nothing is actually gained by taking problems with binary solutions and combining them in groups of three, and evaluating whether a person or system gets the correct answer on all three. The overall score on the combined question (as a fraction between 0 and 1) is just the cube of the overall score on the binary question, plus a little additional noise, corresponding to the randomness of how questions are combined.
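The cube relationship can be illustrated with a small sketch. It assumes, as the argument above implicitly does, that the three binary items in a triple are answered correctly independently of one another, each with the same probability p. The function names and the simulation setup are hypothetical and are not part of BBEH.

```python
import random


def triple_accuracy(p: float) -> float:
    """Expected rate of getting all three items right, assuming each
    binary item is answered correctly independently with probability p."""
    return p ** 3


def simulate_triple_accuracy(p: float, n_triples: int, seed: int = 0) -> float:
    """Estimate the same rate by simulating randomly grouped triples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_triples):
        # A triple counts as correct only if all three items are correct.
        if all(rng.random() < p for _ in range(3)):
            hits += 1
    return hits / n_triples


if __name__ == "__main__":
    for p in (0.5, 0.8, 0.9):
        print(p, triple_accuracy(p), simulate_triple_accuracy(p, 200_000))
```

Under this assumption, a binary accuracy of 0.8 corresponds to a triple accuracy of about 0.512, and random guessing (0.5) corresponds to 0.125. The simulated value differs from the cube only by sampling noise, which is the "little additional noise" mentioned above.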