
Despite OpenAI CEO Sam Altman’s assertions about the company being close to achieving artificial general intelligence (AGI), a recent test of their most advanced publicly available AI has exposed a notable flaw.
As Gary Smith, a senior fellow at the Walter Bradley Center for Natural and Artificial Intelligence, explains in *Mind Matters*, OpenAI’s “o1” reasoning model struggled significantly with the *New York Times* Connections word game.
The game presents players with 16 words and tasks them with sorting them into four groups of four, each linked by a shared connection. These connections range from straightforward categories, such as "book subtitles," to more oblique ones, such as "words that start with fire," making the puzzle a demanding exercise in lateral thinking.
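The format is simple enough to express in code. Below is a toy sketch of how a Connections-style guess could be checked against an answer key; the words and categories here are invented for illustration and are not from any actual puzzle.

```python
# Hypothetical answer key: four categories of four words each (invented
# for illustration -- not a real New York Times Connections puzzle).
ANSWERS = [
    {"apple", "banana", "cherry", "grape"},   # fruits
    {"fly", "place", "work", "fighter"},      # words that start with "fire"
    {"red", "blue", "green", "yellow"},       # colors
    {"dog", "cat", "bird", "fish"},           # pets
]

def check_guess(guess, answers):
    """Return True if the four guessed words exactly match one category."""
    return any(set(guess) == group for group in answers)

print(check_guess(["fly", "place", "work", "fighter"], ANSWERS))  # True
print(check_guess(["fly", "place", "work", "dog"], ANSWERS))      # False
```

The difficulty for a solver, human or machine, is not this verification step but spotting the hidden category itself, especially when a word plausibly fits more than one group.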
Smith tested o1, along with comparable large language models (LLMs) from Google, Anthropic, and Microsoft (which utilizes OpenAI’s technology), using a daily Connections puzzle.
The results were quite surprising, especially given the widespread hype surrounding AI advancements. All the models performed poorly, but o1, which has been heavily touted as a major breakthrough for OpenAI, fared particularly badly. This test indicates that even this supposedly cutting-edge system struggles with the relatively simple task of solving a word association game.
To its credit, o1 did identify some correct groupings when presented with that day's Connections challenge. However, Smith observed that its other suggested combinations were "bizarre," bordering on nonsensical.
Smith aptly characterized o1’s performance as offering “many puzzling groupings” alongside a “few valid connections.” This highlights a recurring weakness in current AI: while it can often appear impressive when recalling and processing information it has been trained on, it encounters significant difficulties when confronted with novel and unfamiliar problems.
Essentially, if OpenAI is genuinely on the cusp of AGI, or has even made preliminary progress toward it, as one of its employees suggested last year, the company is certainly not demonstrating it here. This test provides clear evidence that the current iteration of its technology is not yet capable of the kind of flexible reasoning that characterizes true general intelligence.