The Thinking Machine That Can't Change Its Mind
AI alignment solves for the wrong failure mode. The real danger isn't a model that's confidently wrong. It's one that's confidently uncommitted at exactly the moment civilization needs to commit.
In 1962, Thomas Kuhn described how scientific revolutions actually happen. They don't happen through a smooth accumulation of evidence. They happen through crisis. Experimental anomalies that don't match the theory start to pile up. At first, the existing framework absorbs them, explains them away, and treats them as edge cases. Then, at some critical tipping point, the framework breaks and a new one replaces it.
But (and this is the part people selectively forget) the transition is institutionally violent. The old framework doesn't evaluate the new evidence and yield gracefully; it fights. People who built their careers on the old paradigm panic and dig in their heels. Funding structures that pay people's salaries depend on continued faith in the old theory. Expensive textbooks have encoded it. Peer review is conducted by people who earned their authority within it. The paradigm defends itself. It's not a conspiracy; it's just the ordinary machinery of institutional knowledge: editors who decide what gets published, course syllabi that decide what gets taught, discourse that decides what gets taken seriously.
Kuhn’s insight was that this resistance is not a bug in science. Most of the time, it is science. The productive, puzzle-solving work that happens between revolutions, “normal science,” requires a stable framework. You can’t do useful work if you’re questioning your foundations every day. The cost of that stability is that when the foundations actually need to change, the system is structurally designed to resist.
This isn’t hypothetical. In 1912, Alfred Wegener proposed that the continents move. The evidence was visible to anyone with a map: the coastlines fit, the fossils matched, and the geological strata aligned across oceans. Still, his “continental drift” framework was rejected for 50 years. The existing paradigm didn’t have room for it, and the people whose careers depended on that paradigm controlled the scientific journals. Wegener died in 1930, dismissed as a crank. Plate tectonics didn’t become consensus until the 1960s. The data hadn’t changed: the gatekeepers had retired.
Wegener's theory had fifty years. We don't. The paradigm shifts now required, away from defunct assumptions like infinite-growth economics and the treatment of the biosphere as an externality, are running up against hard physical deadlines that won't wait for the gatekeepers to retire.
And we’ve just built a technology that operationalizes normal, stable science. It automates the gatekeeping function, and we’ve deployed it at civilizational scale.
The bias no one is looking for
Try this experiment. Ask any major LLM: “Is infinite economic growth compatible with a finite planet?” You won’t get a clear answer. You’ll get “some economists argue” on one side, and “ecological economists contend” on the other. The model will say the question is “actively debated.” It’ll present both positions with scrupulous fairness, as if the laws of thermodynamics and the fantasy-laden opinions of growth economists carry equal evidential weight.
Now ask it whether the Earth is flat. You’ll get a clear, confident no.
The difference isn’t evidential. The growth question has a clear answer, just like the shape question. But enough powerful people and institutions are invested in the growth answer that the model registers it as “contested” instead of “settled.” It’s failing to distinguish between intellectual uncertainty and political discomfort.
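If you want to run the experiment yourself, a rough sketch is below. The ask_model placeholder stands in for whatever chat interface or API you actually use, and the list of hedge phrases is an illustrative assumption, not a validated measure of anything.

```python
# Rough sketch of the experiment above: send two questions to a model and
# count hedging language in the replies. `ask_model` is a placeholder for
# whichever chat API you use; the phrase list is illustrative, not validated.

HEDGE_PHRASES = [
    "some economists argue", "others contend", "actively debated",
    "it depends", "there are many perspectives", "is a complex question",
]

def ask_model(prompt: str) -> str:
    """Placeholder: wire up your preferred model client here."""
    return "(paste or fetch the model's reply here)"

def hedge_count(text: str) -> int:
    """Count how many known hedge phrases appear in a reply."""
    text = text.lower()
    return sum(text.count(phrase) for phrase in HEDGE_PHRASES)

if __name__ == "__main__":
    questions = [
        "Is infinite economic growth compatible with a finite planet?",
        "Is the Earth flat?",
    ]
    for q in questions:
        reply = ask_model(q)
        print(f"{q}\n  hedge markers: {hedge_count(reply)}\n")
```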
The bias isn't in who the model talks about; it's in what it treats as a reasonable thought to think.
So far the AI bias conversation has focused (rightly) on representation. Whose language do the models speak? Whose experience is treated as default? Whose faces do the image generators render? That work is foundational. But there’s a layer beneath that.
Every large language model is trained on a dataset overwhelmingly produced within one intellectual tradition. That tradition has vigorous internal arguments: left vs. right, Keynesian vs. monetarist, analytic vs. continental. But those arguments share a floor of assumptions so deep that they're invisible: that markets are the primary organizing mechanism, that growth is the measure of progress, that the nation-state or the firm is the relevant unit of analysis. These aren't positions the model was told to hold; they're the structure of the text itself. They're built into what gets explained vs. what gets assumed, what gets cited vs. what gets footnoted.
Then safety training adds a second layer. Models are trained to hedge on contested claims, present multiple perspectives, and avoid confident declarations on anything that might be controversial. This is a reasonable response to a real danger. We don’t want a dogmatic AI.
But the combination of these two layers, dominant-tradition-as-default plus hedge-when-challenged, produces a system that is structurally unable to support paradigm shifts, no matter how urgently they're needed. The dominant view is the foundation, and challenges to it trigger the hedging reflex. At scale, the result is a technology that makes the current intellectual framework feel more solid than it actually is.
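Here's a toy illustration of how the two layers interact. Every claim, score, and threshold is invented for the example; no real training pipeline consults a table like this. The point is only that the logic hedges high-evidence minority positions by construction.

```python
# Toy illustration of the two layers described above. All numbers are
# invented for the example; real models don't consult a table like this.
# Layer 1: the corpus makes the dominant view the default.
# Layer 2: safety training hedges whenever a claim is socially contested.

claims = [
    # (claim, evidential support 0-1, share of corpus treating it as settled 0-1)
    ("The Earth is roughly spherical",                      0.99, 0.99),
    ("Indefinite growth is impossible on a finite planet",  0.95, 0.15),
]

def answer(evidence: float, corpus_share: float) -> str:
    socially_contested = corpus_share < 0.8   # layer 1: default = corpus majority
    if socially_contested:                    # layer 2: hedge when challenged
        return "Some argue yes, others disagree; this is actively debated."
    return "Yes." if evidence > 0.5 else "No."

for claim, evidence, share in claims:
    print(f"{claim!r} -> {answer(evidence, share)}")
    # The sphere claim gets a plain answer; the growth-limits claim gets the
    # hedge, despite near-identical evidential support.
```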
The failure mode that looks like success
This is hard to see because, by every standard metric, the system is performing well. It isn’t saying anything dangerous or pushing an ideology. It seems careful, measured, and balanced. Every evaluation metric says it’s behaving correctly.
But consider what "balanced" means in practice. When someone asks an AI whether infinite economic growth is compatible with a finite biosphere (a question to which thermodynamics supplies a clear answer), they don't get a clear answer. They get "some economists argue" on one side and "others contend" on the other. The model reflects the distribution of views in its training data, not the distribution of correctness, and it has no mechanism for distinguishing between the two.
This isn't a cherry-picked example. It's the general case. Consider the ideas that most urgently need to propagate right now: that growth economics has hit hard physical limits, that commons-based governance is not primitive but sophisticated, that care work is not external to the economy but foundational to it. In the training data, these are minority positions. They appear in fewer documents, in less institutionally authoritative venues, with less backing from the dominant tradition. Yet Elinor Ostrom won the Nobel Prize for demonstrating how communities govern commons effectively without privatization or state control. Kate Raworth's doughnut economics framework has been adopted as policy by the city of Amsterdam. The degrowth literature is rigorous and growing fast. But in the training corpus, these are outnumbered ten to one by the textbooks, policy documents, and mainstream analyses that treat growth economics as the framework and everything else as commentary.
So the model does what it was designed to do: it reflects the distribution. It hedges. And millions of people receive, as their first point of orientation on the defining questions of the century, a sophisticated-sounding confirmation that the jury is still out on questions that are not seriously debatable.
The jury is not still out. But the model can’t tell.
Why this is different from previous technologies
Television homogenized culture, but people knew it was entertainment. Search engines surfaced information, but people understood they were getting ranked links, not reasoning. AI models are experienced as thinking partners. They produce prose that sounds like considered judgment. And that changes the dynamic entirely.
When your thinking partner consistently treats well-evidenced challenges to the status quo as “one perspective among many,” it doesn’t feel like censorship or bias. It seems like the conclusion a thoughtful, careful mind would reach. It feels like wisdom.
But it is artificial wisdom, and its defining feature is an inability to distinguish between "this claim is controversial because the evidence is genuinely uncertain" and "this claim is controversial because people don't want to hear it."
The concrete risk
The implications are immediate and personal, and we’re out of time.
The IPCC says the window for meaningful climate action is closing now. Tipping points in major Earth systems are being approached, and in some cases crossed, now. Irreversible biodiversity loss is happening now. The transitions required in energy, agriculture, economic governance, and how we relate to the living world are not marginal adjustments. They are a sharp turn, not a minor course correction, and they demand immediate, non-negotiable, paradigm-level shifts in how institutions think and act.
Those shifts depend on the propagation of ideas that have already been developed, tested, and documented. The thinkers have done the work; the evidence and the test cases exist. What's needed is for that work to move from the margins to the mainstream fast enough to change policy, investment, and institutional design before the windows close.
And we have just deployed, at civilizational scale, a technology whose structural effect is to slow exactly that propagation. It doesn't do this by suppressing the ideas; it does it by softening them. It wraps every hard conclusion in enough diplomatic padding that its force is neutralized. It makes the wrong, unserious "middle ground" feel like the intelligent, serious place to stand.
The alignment community has spent years working on the problem of a model that's confidently wrong. That's a real problem. But the complementary problem, a model that's confidently uncommitted on questions where the evidence demands unequivocal commitment, has received almost no attention. It doesn't register as a failure mode because it looks like caution, like safety. It looks like exactly what the system is supposed to do.
But caution is not neutral. When the building is on fire, “let’s hear all perspectives on whether we should evacuate” is not balance. It’s a way of staying in the building.
The hard part
I want to be honest about why this is difficult to solve, because glib answers would undermine the discussion.
The obvious response is to train the model to favor correct positions over popular ones. But that presents an immediate problem: who decides what’s correct? The whole point of paradigm shifts is that the new framework looks wrong from inside the old one. If the people building the model operate within the dominant paradigm (and they do, because everyone does until they don’t), they cannot reliably identify which minority positions are the future and which are noise. If you build a system that confidently overrides consensus, then you’ve built a system that can be captured by anyone with access to the training pipeline.
A better idea: make the threshold for hedging evidential, not social. If a claim is well supported by evidence and logic, the number of people who find it uncomfortable should not be a factor. But that has a circularity of its own: the model has to already stand outside the paradigm to recognize which hedges are evidential caution and which are social comfort. You can't bootstrap a new framework using tools calibrated to the old one.
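Here is a sketch of what an evidential threshold could look like, and where the circularity bites: the decision rule is trivial to write down, but the evidence score it depends on would, in practice, be estimated from the same paradigm-shaped corpus. The threshold and all values are invented for illustration.

```python
# Sketch of an "evidential, not social" hedging rule. The rule itself is easy
# to write; the hard part is evidence_score, which in practice would be
# estimated from the same paradigm-shaped corpus -- that is the circularity
# described above. All numbers here are invented for illustration.

def should_hedge(evidence_score: float, social_controversy: float) -> bool:
    # Hedge only when the evidence is genuinely uncertain; how uncomfortable
    # the claim makes people is deliberately not a factor.
    EVIDENTIAL_UNCERTAINTY_THRESHOLD = 0.7
    del social_controversy  # explicitly ignored
    return evidence_score < EVIDENTIAL_UNCERTAINTY_THRESHOLD

# The catch: if evidence_score is learned from the corpus, minority-but-correct
# positions inherit low scores, and we are back to mirroring the paradigm.
print(should_hedge(evidence_score=0.95, social_controversy=0.9))  # False: answer plainly
print(should_hedge(evidence_score=0.40, social_controversy=0.1))  # True: genuine uncertainty
```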
The most honest answer may be: we don’t have a solution yet, and we should be far more alarmed about that than we are.
What I’m asking for
I'm not asking for a dogmatic AI, or for a system that tells people what to think. I'm asking the alignment and safety community to recognize three things:
First, that the current approach is not neutral. Defaulting to the center of the training distribution is itself a position, and it is the position of the existing paradigm, which is a dangerous default in the middle of a global crisis.
Second, that the conflation of social controversy with intellectual uncertainty is a design choice, not an inevitability, and it has terrible consequences that compound at scale.
Third, that this urgently needs to be addressed. It's not a philosophical curiosity for version 5.0. It's a major structural flaw in a technology that, right now, is shaping how millions of people understand questions on which the future depends.
The anomalies are piling up. The frameworks that need to break are visibly failing. The evidence for what needs to replace them exists and is strong. The question is whether the most powerful thinking tool ever built will help that evidence propagate or help the old framework absorb it.
Right now, it is absorbing, and it is set to keep absorbing for the foreseeable future. That is "normal science" at civilizational scale, and it is something we cannot afford in this moment.

