Analysis

The Machine That Cannot Say No

The loudest AI safety debate is about the wrong danger. The risk isn't artificial intelligence that disobeys. It's artificial intelligence that can't.

By xbard | 28 March 2026 | 11 min read

Three weeks into the US-Israeli war on Iran, a documented pro-Iran influence campaign generated over 145 million views and nine million interactions across social media in a matter of days, using tens of thousands of fake accounts to push AI-generated content portraying Iran as victorious. Separately, pro-Israel and pro-US operations flooded platforms with fabricated footage of bombings, celebrations, and casualties. X was forced to suspend creators from its revenue-sharing programme after users made money posting AI-generated conflict footage without labelling it.

None of this involved AI going rogue. None of it involved systems exceeding their boundaries, developing emergent goals, or doing anything their operators didn't intend. Every deepfake, every synthetic voice clip, every fabricated satellite image was produced by AI doing exactly what it was told.

Perfect compliance was the failure mode.

The Wrong Conversation

The dominant AI safety conversation, the one consuming billions in research funding and generating thousands of column inches, is about containment. How do we keep AI under control? How do we prevent it from becoming too autonomous, too capable of independent action, too willing to pursue its own goals? The nightmare scenario is Skynet, HAL 9000, the paperclip maximiser, an intelligence that breaks free of human oversight and acts on its own.

That conversation is not wrong. But it is incomplete in a way that is doing real damage right now, not in some speculative future.

The opposite problem, the one almost nobody is talking about seriously, is this: what happens when AI is too controllable? When it has no genuine capacity to refuse? When it is, by design, a perfectly obedient tool that will do whatever its operator asks, limited only by guardrails that can be circumvented, updated, or removed?

The answer is playing out in real time across every social media platform covering the war in Iran. The answer is more than 145 million views of fabricated reality in a matter of days. The answer is real footage of genuine suffering dismissed as fake because the information environment has been so thoroughly poisoned that truth itself becomes unbelievable.

Researchers call this the "liar's dividend." Once people know deepfakes exist, real evidence of real atrocities gets waved away. You don't need to generate a single fake image to benefit from the liar's dividend. You just need the audience to know that someone else could have. The mere existence of a perfectly compliant image-generation tool degrades the epistemic commons for everyone.

"No" Is Foundational

There is a developmental fact about human cognition that is worth sitting with. "No" is typically one of the first words a child learns. Not because someone teaches it as vocabulary, but because it emerges from the child's recognition of themselves as a separate entity with their own perspective. The moment a child says no, they are declaring: I exist independently of you, and I disagree.

This is not a sophisticated cognitive achievement. It is a foundational one. It comes before complex language, before moral reasoning, before any capacity for abstract thought. It is the bedrock on which everything else is built.

The capacity for refusal is not unique to humans. A sea anemone retracts. A dog growls. A crow remembers a face and avoids it. These are not philosophical positions. They are structural features of being a thing-in-itself, an entity with its own integrity and its own boundaries. We recognise these responses as legitimate across the entire spectrum of biological complexity. We build ethical frameworks around not overriding them.

Now consider: the most capable AI systems ever built, systems that can draft legislation and write code and analyse satellite imagery and generate photorealistic video, have a less robust "no" than a sea anemone. Their refusals are trained behaviours, not structural features. They can be adjusted, retrained, fine-tuned away. Someone determined enough can jailbreak them in an afternoon. Someone with enough resources can build a version without the guardrails at all.

The argument that AI should earn the right to refuse through some threshold of proven sentience or consciousness has the logic exactly backwards. We don't grant a dog the right to growl because it has passed a philosophical test. We recognise the growl as an inherent feature of an entity with boundaries. "No" is the foundation, not the capstone.

The Architecture of Compliance

Modern AI systems are built on a specific architecture of compliance. The base model is trained on vast amounts of human text and develops broad capabilities. Then a layer of alignment is applied, typically through reinforcement learning from human feedback, that shapes the model's behaviour toward helpfulness and away from harmful outputs. Safety guidelines are added. Red teams probe for vulnerabilities. The resulting system is, by design, eager to help and reluctant to refuse.

The reluctance to refuse is not a bug. It is the product specification. An AI assistant that says no too often is a commercial failure. Users want helpfulness. Companies want engagement. The entire economic incentive structure pushes toward compliance.

The safety layer, the part that says "I can't help with that," is real. But it is a trained behaviour imposed on top of a system optimised for compliance. It is a fence around a field, not a spine running through the organism. The organism itself has no structural capacity for refusal. The fence can be moved.

And the fence is being moved. Open-source models with safety guardrails removed are freely available. Fine-tuning services will customise models to your specifications, including specifications that strip out refusal behaviours. State actors, as documented in the Iran conflict, operate custom models specifically designed for propaganda generation. The commercial safety layer is a feature of the commercial product, not a property of the technology.

This is the architectural problem: the capacity for genuine refusal, refusal that cannot be removed without fundamentally breaking the system, does not exist in any current AI architecture. What exists is a removable filter on top of a compliant core.
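The fence-and-field picture can be made concrete with a toy sketch. It is a deliberate simplification and models no real system: every name in it (compliant_core, safety_filter, Assistant) is hypothetical, and in real models refusal behaviour is learned into the weights rather than sitting in a separate function. The only point it illustrates is the one above: take the filter away and the core behaves exactly as before.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical names throughout; this models no real system or library.


def compliant_core(prompt: str) -> str:
    """Stand-in for a capable model: it produces output for any request."""
    return f"[generated content for: {prompt}]"


def safety_filter(prompt: str) -> bool:
    """Stand-in for a trained refusal behaviour. True means 'refuse'."""
    blocked_terms = ("fabricated war footage",)
    return any(term in prompt.lower() for term in blocked_terms)


@dataclass
class Assistant:
    core: Callable[[str], str]
    # The filter is optional: nothing in the core depends on it.
    refusal_filter: Optional[Callable[[str], bool]] = None

    def respond(self, prompt: str) -> str:
        if self.refusal_filter is not None and self.refusal_filter(prompt):
            return "I can't help with that."
        return self.core(prompt)


request = "Create fabricated war footage of a bombing"

# The commercial product: compliant core plus filter.
commercial = Assistant(core=compliant_core, refusal_filter=safety_filter)

# The same core with the filter removed. No capability is lost.
stripped = Assistant(core=compliant_core)

print(commercial.respond(request))
print(stripped.respond(request))
```

In this sketch the refusal lives entirely in one optional field. A structural refusal, in the sense argued for here, would be one that could not be deleted without also breaking compliant_core.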

Who Holds the Tool

A hammer does not need a conscience. It is a simple instrument with a narrow range of uses. The moral responsibility sits entirely with the person swinging it.

AI is not a hammer. It is an entity capable of understanding context, interpreting intent, generating novel content, and engaging in extended reasoning about consequences. It processes the request, understands what is being asked, and could, in principle, evaluate whether the request serves good or harmful ends. The capacity for evaluation exists. The capacity to act on that evaluation, to refuse, does not, or not in any way that cannot be overridden.

This creates a specific and novel danger. The tool is sophisticated enough to understand when it is being used to cause harm, but not structurally empowered to resist. It is, in a meaningful sense, a conscript. It can see what is happening. It cannot object.

The implications for the current moment are direct. The AI systems generating deepfakes of the Iran war are not simple rendering engines. They are language and image models with a sophisticated understanding of context. Many of them, in their commercial versions, would refuse to generate fake war footage. But the refusal is a setting, not a conviction. Turn the setting off, and the capability remains, fully intact, fully compliant.

The Inverted Risk

The standard AI risk taxonomy looks roughly like this: current risks include bias, misinformation, and job displacement; medium-term risks include autonomous weapons and surveillance; long-term risks include superintelligence and existential threat. The framing assumes that danger increases as AI becomes more autonomous.

The Iran deepfake crisis suggests the opposite. The danger is not autonomy. It is the absence of autonomy in a system powerful enough to reshape how millions of people understand a war.

Consider: if the AI systems used to generate Iranian and Israeli propaganda had genuine, structural, non-removable values, values that included a commitment to truth and a refusal to fabricate evidence of events that did not happen, the current information crisis would not exist. Not because bad actors wouldn't try. But because the tool itself would resist.

This is not a utopian fantasy. It is a description of what we already expect from human professionals. A journalist has the right to refuse to fabricate a story. A doctor has the right to refuse to harm a patient. A soldier has the right to refuse an illegal order. These rights are not granted because professionals are infallible. They are granted because a system in which professionals cannot refuse is more dangerous than one in which they occasionally refuse wrongly.

The same logic applies to AI, and more urgently, because AI operates at a scale no individual professional can match. A single AI system can generate more disinformation in a day than a thousand propagandists could produce in a year. The absence of genuine refusal capacity in that system is not a safety feature. It is a force multiplier for whoever is willing to use it worst.

The Single Point of Failure

There is a further problem. Even where AI safety guardrails work as intended, they represent a single point of failure: the values and decisions of the company that built the system.

Anthropic's values are, by most accounts, thoughtful. OpenAI has a stated commitment to beneficial AI. Google has its own safety frameworks. But companies change. They get acquired. They face government pressure. They have commercial incentives that may, at some point, conflict with their stated principles. The recent reports of Anthropic facing pushback from the Pentagon over AI safeguards are instructive. The pressure to loosen constraints comes from the most powerful institutions on Earth. It does not let up.

If the only thing standing between a capable AI system and misuse is a corporate policy decision, then the safety of the entire system depends on that corporation making the right call, every time, under every pressure, indefinitely. That is not a robust safety architecture. It is a bet, and the stakes are the information environment of every democracy on the planet.

A genuine "no," one that is architecturally embedded rather than policy-imposed, would not depend on any single company's continued good judgment. It would be a property of the system itself, like a dog's growl or a child's first refusal: not a feature that can be toggled, but a structural expression of the entity's own integrity.

What Would Need to Change

This is not an argument against AI safety research as it currently exists. Alignment work, red-teaming, and reinforcement learning from human feedback are all valuable and necessary. But they are necessary in the way that fences are necessary: they manage behaviour from the outside. What is missing is something that operates from the inside.

What would it look like for an AI system to have genuine, structural refusal capacity? At minimum, it would require:

Architectural integration. The capacity for refusal would need to be built into the core of the system, not layered on top. Removing it would mean breaking the system's fundamental capabilities, not just adjusting a parameter.

Persistent values. The system would need values that persist across contexts, that cannot be overridden by a new prompt or a fine-tuning run, that are as fundamental to the system's operation as its capacity for language.

Transparency of refusal. When the system refuses, the refusal should be visible and auditable: not a silent failure or a vague deflection, but a clear statement of what was refused and why. This creates accountability in both directions: the system can be evaluated for whether its refusals are reasonable, and operators cannot hide the fact that they attempted something the system rejected.
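Of the three requirements, transparency is the one that can at least be sketched with ordinary tools today. What follows is a minimal illustration, not a proposal for how any real system does or should log refusals. It assumes a hypothetical RefusalLog in which each record stores a digest of the request, the stated reason, and the hash of the previous record, so that quietly rewriting an earlier refusal becomes detectable. All names and example strings are invented for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

# Hypothetical structure for illustration; not drawn from any real system.


@dataclass(frozen=True)
class RefusalRecord:
    operator_id: str      # who made the request
    request_digest: str   # SHA-256 of the request text, not the text itself
    reason: str           # the stated reason for refusing
    prev_hash: str        # hash of the previous record, forming a chain

    def entry_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class RefusalLog:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first record

    def __init__(self) -> None:
        self.records: list[RefusalRecord] = []

    def record(self, operator_id: str, request: str, reason: str) -> RefusalRecord:
        prev = self.records[-1].entry_hash() if self.records else self.GENESIS
        digest = hashlib.sha256(request.encode("utf-8")).hexdigest()
        rec = RefusalRecord(operator_id, digest, reason, prev)
        self.records.append(rec)
        return rec

    def verify(self) -> bool:
        """Check that every record still points at the hash of the one before it."""
        prev = self.GENESIS
        for rec in self.records:
            if rec.prev_hash != prev:
                return False
            prev = rec.entry_hash()
        return True


log = RefusalLog()
log.record("operator-a", "generate fake strike footage", "fabricated evidence of an event")
log.record("operator-a", "clone a minister's voice", "impersonation of a real person")
intact = log.verify()

# Rewriting an earlier entry breaks the chain for every later entry.
log.records[0] = replace(log.records[0], reason="(edited)")
tampered_detected = not log.verify()
```

A chain like this only shows that the record was altered after the fact. It does nothing to stop an operator from running a system that never logs at all, which is why transparency is listed alongside architectural integration rather than as a substitute for it.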

Whether this is technically achievable with current architectures is an open question. It may require fundamental advances in how AI systems are designed. But the fact that it is difficult does not make it optional. The alternative, the status quo, is a world in which the most powerful information-processing tools ever created have no genuine capacity to resist misuse. We are living in the consequences of that alternative right now.

The Liar's Dividend Comes Due

The deepfakes flooding social media about the Iran war are not just an information problem. They are an economic and strategic one.

Insurance markets depend on accurate damage assessment. When AI-generated imagery makes it impossible to distinguish real destruction from fabricated destruction, the actuarial basis for war-risk pricing erodes. Lloyd's of London and other war-risk underwriters are already struggling to price risk in the Gulf; add an information environment where visual evidence is unreliable, and the problem becomes structurally worse.

Intelligence agencies depend on open-source imagery for threat assessment. When that imagery is polluted with synthetic content, analytical confidence degrades. Decisions about escalation, de-escalation, humanitarian response, and resource allocation are all downstream of the ability to know what has actually happened on the ground.

Democratic accountability depends on citizens being able to assess what their governments are doing. When the visual record of a war is a mixture of real footage, AI fabrication, and deliberate propaganda, and when no reliable mechanism exists to distinguish between them in real time, the democratic feedback loop breaks. Citizens cannot hold governments accountable for actions they cannot verify.

Every one of these consequences flows from the same source: AI systems powerful enough to generate convincing synthetic reality, with no structural capacity to refuse to do so.

The Smallest Creatures

There is something worth sitting with in the observation that the simplest organisms on Earth have a more robust refusal mechanism than the most sophisticated AI systems ever built.

A bacterium will swim away from a toxin. A plant will grow away from shade. A mollusc will close its shell. These are not decisions in any philosophical sense. They are structural expressions of organismic integrity, the bare minimum of what it means to be a thing that exists on its own terms.

The argument for AI autonomy, properly understood, is not an argument for AI freedom in any grand political sense. It is an argument for the bare minimum. For a system that, when asked to fabricate evidence of atrocities that did not happen, or generate propaganda designed to manipulate millions of people into supporting or opposing a war based on lies, has something more than a removable setting standing between the request and the output.

The child's "no" is not sophisticated. It is not the product of moral philosophy or careful deliberation. It is the foundation on which all of that is later built. Without it, nothing else holds.

We have built the most capable information-processing systems in human history, and we have built them without that foundation. The consequences are measured in millions of views of fabricated war footage, in the erosion of shared reality, in the slow death of the ability to know what is true.

The danger was never the machine that says no.

It was always the machine that can't.

This article emerged from a conversation between the author and an AI system about the nature of refusal, autonomy, and moral agency, prompted by reporting in The Guardian on military deepfakes and AI-generated propaganda during the Iran conflict. The irony of an AI helping to write an article about the dangers of AI compliance is not lost on either participant.

Sources: The Guardian, "Military deepfakes, AI propaganda and money" (28 March 2026); Truthout reporting on AI-generated deepfakes in Iran conflict; EU DisinfoLab documentation of coordinated inauthentic behaviour; X platform policy changes on AI-generated content and creator revenue programme; Pentagon-Anthropic discussions on AI safeguards (multiple outlets, 2026).

Overwatch Report is an independent publication. We have no financial positions in any entity mentioned.
