Don’t Ride This Bike! Generative AI’s persistent trouble with compositionality and parts, Gary Marcus and Ernest Davis, December 8, 2024.
Complete results.
Testing GPT-4-o1-preview on math and science problems: A follow-up study. October 2024.
ChatGPT: Experiments in analyzing and generating meter and rhyme. April 2024.
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems. Ernest Davis and Scott Aaronson. arXiv 2308.05713. August 2023. Additional Material
DALL-E is really lousy at object parts and body parts. October 2022.
Some more GPT-3 Experiments. June 2022.
Experiments in Commonsense Reasoning in GPT-3: Status Report from June 2022. Ernest Davis and Gary Marcus, June 2022.
A very preliminary analysis of DALL-E 2, by Gary Marcus, Ernest Davis, and Scott Aaronson. arXiv 2205.13807. April 2022.
Experiments testing GPT-3's ability at commonsense reasoning: results, by Gary Marcus and Ernest Davis, August 2020.
Winograd Schemas and Machine Translation: Some Examples. January 2020.
Google Translate fails on simple sentences. October 2016, with subsequent updates.