The Limits of LLMs
This is largely in response to Kaj Sotala's "Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI".
My first thought is that perhaps Sotala provides the first concrete mechanism by which we might see LLM progress stall out. Namely: given that these models are being extensively fine-tuned on software engineering (not just SWE-bench), these logic errors (not all related to software) seem to indicate an issue not with LLMs as a modality per se, but with how the transformer architecture can be fine-tuned to achieve general reasoning.
Sotala's analysis focuses more on templates and stereotyping, but I'd like to take a stab at describing what is going on somewhat more precisely. There is an information problem inherent in how small details can have arbitrarily large impact. Consider an LLM benchmark where the model needs to solve a murder mystery. It seems transparently obvious that the number of inferential steps it can accurately chain toward the solution is upper bounded by the depth of the attention stack, and is likely much, much lower, since many of those layers are needed for Winograd-style attention problems of their own. If someone puts keys in their jacket pocket, puts their jacket on the chair, stands up, walks across the room, and checks their pocket, I think you could make a solid case that the Winograd-type problems involved in tracking what is in their pocket require at least six layers to solve correctly.
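To make the depth argument concrete, here is a minimal Python sketch (my framing, not Sotala's) that treats the keys scenario as a chain of single-hop reference resolutions. The particular hop breakdown, and the assumption that one attention layer can resolve at most one hop, are mine; under that assumption the chain length lower-bounds the required depth.

```python
# Toy illustration: model the keys scenario as a chain of single-hop
# resolutions, each of which depends on the previous one being resolved.
# Assumption (mine): one attention layer resolves at most one such hop.

from dataclasses import dataclass

@dataclass
class Hop:
    question: str    # what the model must resolve at this step
    resolution: str  # the earlier fact this step points back to

# Each hop depends on the one before it, so they cannot all be
# collapsed into a single parallel attention pass.
hops = [
    Hop("which keys?", "the keys mentioned at the start"),
    Hop("which pocket were they put in?", "the jacket pocket"),
    Hop("where is the jacket?", "draped on the chair"),
    Hop("where is the person now?", "across the room, away from the chair"),
    Hop("which pocket do they check?", "the pocket they are wearing, not the jacket's"),
    Hop("what is in that pocket?", "not the keys"),
]

# Under the one-hop-per-layer assumption, the minimum depth is the chain length.
min_layers = len(hops)
print(f"Sequential resolutions required: {min_layers}")  # -> 6
```

This is only a counting argument: it says nothing about which layers do the work, just that six dependent resolutions cannot be done in fewer than six sequential steps if each step handles one hop.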
But attention works over tokens, and one could point out that some inferences of much shorter depth (the proverbial dogs that don't bark) require attention to be paid to the absence of a comment, or to imagined world states more generally. It seems relatively obvious that imagining the world is a critical part of our own reasoning, and the lack of a first-class capability to do this in current architectures might well be an issue.
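Here is a small sketch (again my own illustration, with invented names) of the contrast: a token-level lookup has nothing to attend to when the tell-tale event never appears in the text, whereas an explicit world model can represent an expected-but-missing event as a queryable fact.

```python
# Toy contrast: token-level attention vs. an explicit world state.
# The story, field names, and expectations below are assumptions for illustration.

story_tokens = ["the", "stranger", "entered", "the", "stable", "at", "night"]

# A purely token-level lookup can only attend to tokens that exist;
# "barked" never occurs, so there is nothing to attend to.
def tokens_mentioning(tokens, word):
    return [i for i, t in enumerate(tokens) if t == word]

print(tokens_mentioning(story_tokens, "barked"))  # -> [] : no token to attend to

# An explicit world model can carry expectations, so the absence of the
# event becomes a first-class fact rather than a missing token.
world_model = {
    "dog_present": True,
    "expected_events_given_stranger": {"dog_barks"},
    "observed_events": {"stranger_enters_stable"},
}

missing = world_model["expected_events_given_stranger"] - world_model["observed_events"]
print(missing)  # -> {'dog_barks'} : the tell-tale silence is now representable
```

The point of the sketch is only that the second representation makes the silence something you can compute over directly, rather than something that has to be reconstructed from what the tokens fail to say.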
I do not recommend people rush to solve this problem only to further speed up progress; I just want to make clear the mental model behind the lower-bound progress estimates I'm making. The upper bound is somewhere around AI 2027 (I am aware its authors treat that as a median guess).