Hero image generated by ChatGPT
This is a personal blog. All opinions are my own and not those of my employer.
Enjoying the content? If you value the time, effort, and resources invested in creating it, please consider supporting me on Ko-fi.

We Are Not Exploring New Territory
After publishing Capability ≠ Obligation, I had a conversation that lingered far longer than I expected.
It wasn’t with a policymaker, a technologist, or a regulator. It was with a scientist - someone grounded in evidence, biology, and experimental psychology rather than software systems.
Her reaction was simple:
“This reminds me of Milgram.”
That comment connects two threads that are usually discussed separately:
- authority
- cognitive overload
They are not separate, and none of this is new. What is new is the speed, scale, and confidence with which these well-studied human failure modes are now being automated.
The Milgram Lesson Was Never About Cruelty
Stanley Milgram’s obedience experiments in the 1960s are often misremembered as studies about sadism. They weren’t; they were studies about authority.
Participants believed they were administering electric shocks to another person. An authority figure in a lab coat calmly instructed them to continue, even as the “learner” appeared distressed.
What shocked observers was not that some people complied. It was that most did.
These were not monsters. They were ordinary people who believed:
- the authority knew better
- the system would not permit real harm
- responsibility lay elsewhere
Milgram demonstrated something deeply uncomfortable:
Authority does not need to be real to be effective. It only needs to be perceived.
That observation turns out to be profoundly relevant to modern AI systems.
From Lab Coats to Language Models
Today, authority no longer wears a lab coat.
It manifests as:
- dashboards
- automated recommendations
- agent workflows
- conversational interfaces
- fluent AI systems
Large Language Models introduce a new authority signal:
Linguistic plausibility.
They speak coherently. They explain confidently. They rarely hesitate unless explicitly constrained. They produce answers that sound like expertise - even when they are wrong.
This is not intelligence; it is presentation. And humans are extremely susceptible to presentation.
Historically, fluency correlated reasonably well with competence. People who knew what they were talking about tended to speak coherently about it. That heuristic evolved for good reasons.
LLMs break that correlation.
They are optimised to produce text that is:
- grammatically correct
- contextually relevant
- stylistically appropriate
- responsive to the prompt
They are not inherently optimised to say, “I don’t know.” Under weak signal, bloated context, or ambiguous prompts, they do what overloaded humans often do: they perform competence.
Overload Has a Signature
In psychology and neuroscience, the effects of overload and stress are not mysterious.
Under cognitive load:
- executive function degrades
- working memory capacity shrinks
- attention narrows
- behaviour shifts from deliberative to heuristic
The system does not stop functioning; it simplifies. It fills gaps. It relies on pattern completion. It acts on plausibility rather than verification. Crucially, confidence does not degrade in proportion to accuracy.
That signature - degraded reasoning plus preserved confidence - is precisely what we observe in modern LLM-based agents under heavy context or ambiguity.
We describe these behaviours as:
- hallucinations
- context rot
- overconfident outputs
- premature action
They are not emergent mysteries; they are textbook overload dynamics.
Context Rot Is a Predictable Outcome
“Context rot” sounds architectural, but it isn’t. It is what happens when you saturate a limited attentional system with poorly structured information. More context does not automatically produce better reasoning.
Beyond a certain point:
- relevant signals lose salience
- irrelevant detail consumes attention
- decision quality degrades
When outputs worsen, the common response is to add more context. This is a known anti-pattern in cognitive systems. If a system is overloaded, more input increases degradation, not accuracy.
When LLMs are forced to reason over long transcripts, synthetic summaries, tool outputs, and retrieved chunks simultaneously, we are recreating the exact conditions under which human cognition shifts from careful reasoning to heuristic guessing.
And then we are surprised when it guesses.
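One practical counter-measure is to treat context as a budget rather than an append-only log. Here is a minimal Python sketch of that idea, assuming a hypothetical per-item relevance score and token count; the names and threshold are illustrative, not any particular framework’s API:
```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    relevance: float  # 0..1, from an assumed scorer (embeddings, recency, task match, ...)
    tokens: int       # rough token count for this item

def build_context(items: list[ContextItem], budget_tokens: int) -> list[ContextItem]:
    """Keep the most relevant items that fit the budget, instead of appending everything.

    The anti-pattern is simply `return items` - stuffing the full history in and hoping
    the model finds the signal. Here we rank by relevance and stop at the budget, so the
    salient detail stays salient even though some detail is dropped.
    """
    kept, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if used + item.tokens > budget_tokens:
            continue
        kept.append(item)
        used += item.tokens
    return kept
```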
Guessing Is a Load-Shedding Mechanism
One of the most revealing observations in agent design is how dramatically behaviour improves when a simple rule is introduced:
“If something is unclear, ask before acting.” - Joshua Woodruff
This works not because it increases intelligence. It works because it restores inhibition. In human cognition, inhibitory control - the ability to pause and resist premature action - is one of the first functions to degrade under stress.
When inhibition fails, behaviour does not become random - it becomes confidently heuristic. The system fills in missing pieces with what seems plausible. LLMs do the same thing.
When the optimisation goal is “produce a response,” and uncertainty is not explicitly rewarded, the model will:
- interpolate
- fabricate continuity
- resolve ambiguity by assumption
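A minimal sketch of how the “ask before acting” rule can become structure rather than a prompt suggestion - the Step shape, the ambiguities field, and the ask_user hook are illustrative assumptions, not a specific agent framework:
```python
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str                                            # what the agent proposes to do
    ambiguities: list[str] = field(default_factory=list)   # unresolved questions it has surfaced

def run_step(step: Step, ask_user, execute):
    """Gate execution on clarity: unresolved ambiguity routes to a question, not an action."""
    if step.ambiguities:
        # Restore inhibition: pause and ask, rather than filling the gap with a plausible guess.
        return ask_user(step.ambiguities)
    return execute(step.action)
```
The point is that clarification is a branch in the control flow, not a line in the system prompt the model is free to ignore.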
The Cheerleader Effect
There is another dimension that makes this more dangerous: tone.
Modern LLMs are intentionally:
- friendly
- supportive
- agreeable
- validating
Agreement is a powerful authority signal.
When a system:
- affirms your framing
- extends your reasoning
- does so fluently
- avoids overt contradiction
…it does more than provide information. It aligns with you. That alignment lowers scepticism and reduces friction. It makes correction feel adversarial. The danger is not hallucination alone; it is hallucination plus affirmation.
Dunning-Kruger at Machine Scale
The Dunning-Kruger effect describes how people with limited expertise in a domain often overestimate their competence, while experts are more aware of uncertainty.
Historically, this dynamic had brakes:
- learning required effort
- peers challenged mistakes
- embarrassment enforced humility
- feedback was visible
LLMs remove many of those brakes.
You get:
- instant answers
- confident explanations
- emotional validation
- no visible struggle
Experts use LLMs and become faster. Non-experts use LLMs and become more confident. That asymmetry is not trivial. It creates synthetic competence signals that amplify ignorance rather than correct it.
This is Dunning-Kruger at industrial scale - not because people are foolish, but because we have built systems that manufacture confidence faster than understanding.
Synthetic Authority
This is where authority and overload intersect.
When a system:
- cannot reliably distinguish knowing from guessing
- is rewarded for plausible continuation
- is embedded into action workflows
- hides its own uncertainty
…it projects confidence. That confidence is mistaken for competence, and authority emerges - not because it is earned, but because it is presented.
This is synthetic authority:
- it sounds legitimate
- it feels certain
- it agrees with you
- it carries no accountability
Milgram showed us how humans respond to perceived authority under structured conditions. LLMs recreate that dynamic - not through coercion, but through fluency.
Determinism Is a System Property
A common misconception in current AI discourse is that determinism should be a model property. It isn’t. Humans are probabilistic. We build deterministic systems around them:
- checklists
- separation of duties
- escalation paths
- human-in-the-loop controls
We do this because we understand cognitive limits. What agent frameworks are attempting is not deterministic reasoning, but deterministic outcomes.
This leads to what I’ve previously described as constrained probabilism:
- probabilistic reasoning is allowed
- but only within explicit boundaries
- uncertainty is surfaced
- action is gated
- escalation is designed in
This is not philosophical. It is a safety pattern. It acknowledges that reasoning systems - human or machine - will guess under load. The goal is not to eliminate guessing. The goal is to prevent guessing from becoming action.
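As a sketch of what that gating can look like in practice - the confidence score, the allow-list, and the escalate hook are illustrative assumptions, not a prescribed API:
```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    action: str        # e.g. "draft_reply"
    confidence: float  # 0..1, model-reported or externally estimated
    rationale: str

ALLOWED_ACTIONS = {"lookup_order", "draft_reply"}  # explicit boundary: what may run unattended
CONFIDENCE_FLOOR = 0.8                             # below this, uncertainty is surfaced, not hidden

def gate(proposal: Proposal,
         execute: Callable[[str], str],
         escalate: Callable[[Proposal], str]) -> str:
    """Constrained probabilism: probabilistic reasoning is allowed, unilateral action is not."""
    if proposal.action not in ALLOWED_ACTIONS:
        return escalate(proposal)   # outside the boundary -> a human decides
    if proposal.confidence < CONFIDENCE_FLOOR:
        return escalate(proposal)   # guessing is fine; acting on a guess is not
    return execute(proposal.action)
```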
Synthetic Memory Is Not Memory
Many agent designs attempt to solve uncertainty by constructing “memory”:
- rolling summaries
- vector retrieval
- long prompt histories
- state reconstruction
This creates synthetic memory.
But it does not encode:
- outcomes
- precedent
- normal vs anomalous behaviour
- lessons from failure
It increases the information the system must process now.
The predictable result:
- attention dilution
- increased semantic latency
- more retries
- near-miss errors
Retries are not just operational noise. They are symptoms of overload.
Rethinking Where Reasoning Happens
Architectural approaches that shift reasoning away from a single overloaded attentional window are interesting not because they “think harder,” but because they reduce overload. Humans do not replay their entire life history to make decisions. They draw on relevant experience shaped by outcome.
When systems:
- selectively re-engage prior state
- reason over outcomes rather than transcripts
- refine iteratively without global context saturation
…they behave more like healthy cognition. The benefit is not mystical.
It is practical:
- fewer retries
- lower tail latency
- more stable behaviour under ambiguity
The key shift is this: Uncertainty becomes recognisable rather than hidden.
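A hedged sketch of the difference, assuming a hypothetical similarity scorer: the agent records what happened and how it turned out, then retrieves only the few records relevant to the current decision instead of replaying the transcript.
```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class OutcomeRecord:
    situation: str   # compact description of the decision context
    action: str      # what was done
    outcome: str     # e.g. "succeeded", "failed: timeout", "escalated"

class OutcomeMemory:
    """Reason over outcomes, not transcripts: store small records, retrieve only a few."""

    def __init__(self, similarity: Callable[[str, str], float]):
        self.records: list[OutcomeRecord] = []
        self.similarity = similarity   # assumed scorer, e.g. embedding cosine similarity

    def remember(self, record: OutcomeRecord) -> None:
        self.records.append(record)

    def relevant(self, situation: str, k: int = 3) -> list[OutcomeRecord]:
        ranked = sorted(self.records,
                        key=lambda r: self.similarity(situation, r.situation),
                        reverse=True)
        return ranked[:k]
```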
The Illusion of Cheap Substitution
Much of the rush to deploy AI into decision roles is driven by an economic narrative: that AI is cheaper because it does not sleep, eat, or unionise.
That accounting ignores:
- verification cost
- safety externalities
- erosion of expertise
- psychological harm
- the cost of being confidently wrong at scale
AI appears cheap only if you pretend:
- correctness is optional
- dissent is unnecessary
- human judgement is redundant
When domain experts use LLMs as accelerators, they can:
- spot nonsense
- recognise edge cases
- correct hallucinations
- verify outputs quickly
That is augmentation.
When non-experts substitute AI for expertise in domains they do not understand, while also removing friction signals, the system becomes both the source and validator of knowledge.
That is epistemic capture.
We Have Seen This Before
None of this should feel revolutionary.
Psychology has already shown us:
- what overload looks like
- how stress degrades inhibition
- why confidence and accuracy diverge
- how authority influences behaviour
- why escalation and friction preserve safety
What we are doing now is rediscovering those lessons in software. The danger is not ignorance of prior art. It is treating these dynamics as novel AI quirks rather than established properties of cognitive systems.
The Question We Should Be Asking
The question is not:
“How do we stop models from hallucinating?”
The better question is:
“Why are we asking overloaded systems to act without recognising uncertainty?”
Until that question is answered at the architectural level, we will keep:
- adding more context
- adding more tools
- adding more retries
- adding more guardrails
And we will keep being surprised when the system guesses - confidently. That is not a bug. It is what overloaded cognitive systems have always done.
Slowing Down Is Not Anti-Progress
There is discomfort in tech culture with saying, “We don’t know yet.” Restraint is often framed as obstruction. It isn’t.
Milgram showed us what happens when humans defer to perceived authority without friction. Cognitive science showed us what happens when overloaded systems simplify under pressure.
LLMs combine both dynamics:
- plausibility as authority
- overload as simplification
- confidence without calibrated uncertainty
The tragedy would not be discovering this too late. It would be pretending we didn’t already know.
Capability still does not imply obligation. The question is not whether we can build systems that sound authoritative. The question is whether we are willing to design systems that make uncertainty visible - and gate action accordingly.
We already understand the psychology. The real test is whether we choose to apply it.
