Benchmarks for Commonsense Reasoning: Simulated Physical Worlds

AnimalAI

Description: A 3D-simulated arena with simple simulated physics and a set of objects that can be combined to build the kinds of environments and apparatus found in animal experiments. The environment and objects are designed to be as simple as possible whilst still making it possible to build a wide range of tasks.
Size: 900 problems in 12 categories.
Website
Paper

Examples:

******************************************************

COWP (Commonsense Open World Planning) dataset

(Note: COWP is actually the name of the AI system rather than of the associated dataset. I have trouble following the description of the dataset in the paper.)

Description: An instance is "an execution-time situation collected from a dining domain using a crowd-sourcing platform ... correspond[ing] to an instance of a robot not being able to perform a plan (that normally works)".

Example:

An illustrative example of COWP for open-world planning, where the robot was tasked with “delivering a cup for drinking water.” (a) The robot walked to a cabinet, and located a cup on the cabinet. However, the robot found a situation that there were objects in the cup (a knife, a fork, and a spoon in this case). This observation was entered into the plan monitor, which queried GPT-3, and suggested that the planned action “grasp” was not applicable given the occupied cup. Accordingly, COWP updated its task planner by adding the new information that one cannot pour water into a non-empty cup. (b) The robot reasoned about other objects that were available in the environment, and queried GPT-3 to update the task planner about whether those objects can be used for drinking water – details in Section III. It happened that the robot learned a bowl could be used for drinking water. (c) A new plan of delivering a bowl to the human for drinking water was generated. Following the new plan, the robot walked to the table on which a bowl was located. (d) The robot grasped the bowl after observing it using vision. (e) The robot navigated to the dining table with the bowl. (f) The robot put down the bowl onto the dining table, and explained that a bowl was served due to the cup being occupied, which concluded the planning and execution processes.
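
The replanning behavior in this example can be sketched roughly as follows. This is only an illustrative sketch, not COWP's implementation: the lookup table COMMONSENSE stands in for the GPT-3 queries, and the function names (usable_for_drinking, make_plan, deliver_container) and the observation format are assumptions invented for this illustration.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a fixed table replaces the GPT-3 queries described above.
# Keys are (object, observed state); values say whether the object can be
# used for drinking water in that state.
COMMONSENSE: Dict[Tuple[str, str], bool] = {
    ("cup", "occupied"): False,  # a cup holding cutlery cannot be used
    ("cup", "empty"): True,
    ("bowl", "empty"): True,
    ("plate", "empty"): False,
}


def usable_for_drinking(obj: str, state: str) -> bool:
    """Hypothetical commonsense oracle; unknown cases default to False."""
    return COMMONSENSE.get((obj, state), False)


def make_plan(container: str, location: str) -> List[str]:
    """Build the walk / grasp / walk / put-down sequence from the example."""
    return [
        f"walk to {location}",
        f"grasp {container}",
        "walk to dining table",
        f"put down {container}",
    ]


def deliver_container(observed: Dict[str, Tuple[str, str]],
                      preferred: str = "cup") -> List[str]:
    """Try the preferred container first, then any other observed object.

    `observed` maps an object name to (location, state). Returns an empty
    plan if no observed object is judged usable.
    """
    candidates = [preferred] + [obj for obj in observed if obj != preferred]
    for obj in candidates:
        if obj not in observed:
            continue
        location, state = observed[obj]
        if usable_for_drinking(obj, state):
            return make_plan(obj, location)
    return []


# The situation from the example: the cup is occupied, a bowl is on the table.
observed = {"cup": ("cabinet", "occupied"), "bowl": ("table", "empty")}
print(deliver_container(observed))
```

With the occupied cup ruled out, the sketch falls back to the bowl, mirroring steps (b) through (f) above.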

Size: 561 execution tasks.
Paper
Github
Created using: Crowd sourcing.

********************************************************

Housekeep

Description: A benchmark to evaluate commonsense reasoning in the home for embodied AI.

Size: 1799 objects, 268 object categories, 585 placements, and 105 rooms

Github:

Paper

Example:
********************************************************

OGRE

Description: A collection of physics puzzles in a two-dimensional simulated world of mechanical interactions. An extension of PHYRE (see below) to test object generalization.

Size: 37 task templates, each defining a set of 100 tasks.
Paper
Github

Examples:

********************************************************

OPEn: An Open-Ended Physics Environment for Learning without a Task

Description: A simulated 3D physical environment with fairly photo-realistic images and high-fidelity physical simulation.

Size: 400 tasks (four categories, 100 tasks in each).
Paper
Web site

Examples:

********************************************************

PHYRE

Description: A collection of physics puzzles in a two-dimensional simulated world of mechanical interactions.

Size: 5,000 tasks (2 tiers, 25 templates in each, 100 tasks in each template).
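
As a quick check of the size figure, the tier, template, and task counts multiply out to the stated total. The index triples below are purely illustrative and are an assumption of this sketch, not PHYRE's own task identifiers.

```python
from itertools import product

TIERS = 2
TEMPLATES_PER_TIER = 25
TASKS_PER_TEMPLATE = 100

# Illustrative (tier, template, task) index triples; not PHYRE's real task IDs.
all_tasks = list(product(range(TIERS),
                         range(TEMPLATES_PER_TIER),
                         range(TASKS_PER_TEMPLATE)))
total_tasks = len(all_tasks)
print(total_tasks)  # 5000
```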

Paper
Web site

Examples:

********************************************************

RoomR

Description: A simulated environment for a room rearrangement task. An agent begins by exploring a room and recording the objects' initial configurations. The agent is then removed, and the poses and states (e.g., open/closed) of some objects in the room are changed. The agent must restore the initial configurations of all objects in the room.
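
The core bookkeeping of the task, comparing recorded and current configurations to decide which objects need restoring, might look like the sketch below. The state format (a 3D position plus an open/closed flag), the tolerance value, and the function name objects_to_restore are assumptions for illustration; the actual environment records richer object state.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: each object state is ((x, y, z) position, is_open).
ObjectState = Tuple[Tuple[float, float, float], bool]


def objects_to_restore(initial: Dict[str, ObjectState],
                       current: Dict[str, ObjectState],
                       tol: float = 0.05) -> List[str]:
    """Return, in sorted order, the objects whose pose or open/closed
    state differs from the recorded one."""
    changed = []
    for name, (pos0, open0) in initial.items():
        pos1, open1 = current[name]
        # An object counts as moved if any coordinate shifts by more than tol.
        moved = max(abs(a - b) for a, b in zip(pos0, pos1)) > tol
        if moved or open0 != open1:
            changed.append(name)
    return sorted(changed)


initial = {
    "mug": ((0.0, 0.0, 0.0), False),
    "drawer": ((1.0, 0.0, 0.0), False),
    "book": ((2.0, 0.0, 0.0), False),
}
current = {
    "mug": ((0.5, 0.0, 0.0), False),    # moved
    "drawer": ((1.0, 0.0, 0.0), True),  # opened
    "book": ((2.0, 0.0, 0.01), False),  # within tolerance, unchanged
}
print(objects_to_restore(initial, current))
```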

Example:

Size: 6,000 distinct rearrangement settings involving 72 different object types in 120 scenes, with 1,895 pickupable object instances and 1,262 openable non-pickupable object instances.
Paper
Web site
Created using: Automatic synthesis of examples in the AI2-THOR virtual environment.

********************************************************

ThreeDWorld

Description: ThreeDWorld is a platform for interactive multi-modal physical simulation. With TDW, users can simulate high-fidelity sensory data and physical interactions between mobile agents and objects in a wide variety of rich 3D environments. TDW has several unique properties: 1) realtime near photo-realistic image rendering quality; 2) a library of objects and environments with materials for high-quality rendering, and routines enabling user customization of the asset library; 3) generative procedures for efficiently building classes of new environments; 4) high-fidelity audio rendering; 5) believable and realistic physical interactions for a wide variety of material types, including cloths, liquid, and deformable objects; 6) a range of "avatar" types that serve as embodiments of AI agents, with the option for user avatar customization; and 7) support for human interactions with VR devices.
Paper
Web site

Examples:

********************************************************

Virtual Tools

Description: A simulated, simple two-dimensional environment (horizontal plus vertical). Users play a "game" in which they are required to use a tool in order to get a red object into a green region.

Paper:
Web site
Constructed using: Expert construction.

Examples:

Twenty levels used in the Virtual Tools game. Players choose one of three tools (shown to the right of each level) to place in the scene in order to get a red object into the green goal area. Black objects are fixed, while blue objects also move; grey regions are prohibited for tool placement. Levels denoted with A/B labels are matched pairs.