
Apple report says powerful AI models aren’t reasoning at all

WHY THIS MATTERS IN BRIEF

Reaching AGI largely depends on AI being able to reason the same way humans do – but according to Apple, today’s AI models are still brute-forcing pattern recognition instead.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, read about exponential tech and trends, connect, watch a keynote, or browse my blog.

Apple – the company behind the notoriously awful Siri Artificial Intelligence (AI) bot – has claimed that new-age AI reasoning models might not be as smart as they have been made out to be. In a study titled The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, the tech giant controversially argues that reasoning models like Claude, DeepSeek-R1, and o3-mini do not actually reason at all.

 

Apple claimed that these models simply memorise patterns really well, but when the questions are altered or the complexity is increased, they collapse altogether. In simple terms, the models work great when they can match patterns, but once the patterns become too complex, they fall apart.

“Through extensive experimentation across diverse puzzles, we show that frontier Large Reasoning Models (LRMs) face a complete accuracy collapse beyond certain complexities,” the study highlighted.

“Moreover, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget,” it added.

For the study, the researchers flipped the script on the type of questions that reasoning models usually answer. Instead of the same old math tests, the models were presented with cleverly constructed puzzle games such as Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World.

 

Each puzzle had simple, well-defined rules, and as the complexity was increased (more disks, more blocks, more actors), the models needed to plan deeper and reason longer. The findings revealed three regimes: at low complexity, regular models actually win; at medium complexity, thinking models show some advantage; and at high complexity, everything breaks down completely.

Apple reasoned that if the reasoning models were truly ‘reasoning’, they would get better with more computing power and clearer instructions. However, they started hitting walls and gave up, even when provided with the solutions.

“When we provided the solution algorithm for the Tower of Hanoi to the models, their performance on this puzzle did not improve,” the study stated, adding: “Moreover, investigating the first failure move of the models revealed surprising behaviours. For instance, they could perform up to 100 correct moves in the Tower of Hanoi but fail to provide more than 5 correct moves in the River Crossing puzzle.”
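
For context, the Tower of Hanoi has a famous, fully deterministic solution. Here is a minimal Python sketch of the classic recursive algorithm, which also shows why the puzzle scales so brutally: n disks always require 2^n - 1 moves. (The study doesn’t reproduce the exact algorithm Apple supplied to the models, so treat this as an illustration, not their prompt.)

# Classic recursive Tower of Hanoi solver. Moving n disks from `source`
# to `target` takes exactly 2**n - 1 moves.
def solve_hanoi(n, source="A", target="C", spare="B", moves=None):
    if moves is None:
        moves = []
    if n == 0:
        return moves
    solve_hanoi(n - 1, source, spare, target, moves)  # park n-1 disks on the spare peg
    moves.append((source, target))                    # move the largest disk directly
    solve_hanoi(n - 1, spare, target, source, moves)  # restack the n-1 disks on top
    return moves

for n in (3, 7, 10):
    print(f"{n} disks -> {len(solve_hanoi(n))} moves")  # prints 7, 127, and 1023

The point is that each move follows mechanically from the last, so a system genuinely executing the procedure shouldn’t stall at a fixed depth; that’s what makes the models’ failure, even with the algorithm in hand, so striking.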

 

With talk of human-level AI, popularly referred to as Artificial General Intelligence (AGI), arriving as soon as next year, Apple’s study suggests that we might be further away from realising AGI than the big players in the industry claim … so we’ll see who’s right.
