When AI Errors Become Design Outputs: Why Pre-Incident Validation Is the Architecture That Matters Now



By Jon W. Hansen | Procurement Insights | April 24, 2026


A shift is underway in how courts classify AI errors, and it has structural consequences that most enterprise AI governance conversations have not yet absorbed.

Yas Milani named the shift directly in a LinkedIn post last week. Courts have stopped calling AI errors hallucinations. They are calling them design outputs. The Oregon Supreme Court, she noted, put it plainly: the AI is not perceiving nonexistent law due to some disorder. It is generating nonexistent law in accordance with its design. That is not a philosophical observation. It is a product liability sentence.

By the end of 2025, 729 AI hallucination incidents had been documented in US courts, and sanctions are escalating weekly. The “AI made an error” defence has stopped working in multiple jurisdictions. What replaces it is not better prompting or stronger indemnity language. It is audit trails, verification layers, and human-in-the-loop checkpoints that existed before the incident, not after.

Professor David Loseby of Leeds University Business School, Editor in Chief of the Journal of Public Procurement, tagged me into the conversation. My response focused on the architectural implication of Yas’s point — because the reclassification does more than shift legal strategy. It moves output accountability to the point of origin. Liability now begins at the design boundary, not the output boundary.


The Architectural Consequence

Audit trails, verification layers, and human-in-the-loop checkpoints are all necessary. But they operate after the system has already produced an answer.

The harder question — the one the “design output” reclassification forces into the open — is what exactly was designed, and on what assumptions. In a number of cases, the issue is not that the AI failed. It is that the AI performed exactly as intended against conditions that were never structurally tested.

That is the architectural problem.

If the legal system is now treating AI outputs as the product of deliberate design choices, then the enterprise has to be able to demonstrate two things. First, that the design assumptions behind the system were validated against real-world operating conditions before deployment. Second, that those assumptions remain valid as conditions change after deployment.

Neither of those can be solved with an audit trail. Both have to be solved in the architecture.
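To make that concrete, here is a minimal sketch of what those two demonstrations could look like in code. Everything in it is a labelled assumption: the class name, the gate, and the revalidation pass are illustrative, not the ARA™ implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class DesignAssumption:
    """One explicit, testable assumption the system was designed around."""
    name: str
    description: str
    validate: Callable[[dict], bool]  # True if the assumption holds for this data
    history: list = field(default_factory=list)

    def check(self, operating_data: dict) -> bool:
        holds = self.validate(operating_data)
        # Timestamped evidence that the assumption was tested, and when.
        # This record exists before any incident, not after.
        self.history.append((datetime.now(timezone.utc), holds))
        return holds

def pre_deployment_gate(assumptions: list, baseline_data: dict) -> None:
    """Demonstration one: block deployment until every assumption validates."""
    failed = [a.name for a in assumptions if not a.check(baseline_data)]
    if failed:
        raise RuntimeError(f"Deployment blocked; unvalidated assumptions: {failed}")

def revalidate(assumptions: list, live_data: dict) -> list:
    """Demonstration two: re-test the same assumptions against current conditions."""
    return [a.name for a in assumptions if not a.check(live_data)]
```

The point of the sketch is the shape, not the code: design assumptions become first-class, named objects with their own test and their own history, rather than comments buried in a design document.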


Why Audit Trails Are Not Enough

Post-incident audit trails capture what the system did. They do not capture whether the system was designed to do the right thing in the first place.

Consider a practical example. An AI procurement agent is designed on the assumption that supplier reliability is a stable attribute — an assumption that was true when the agent was built but is no longer true in a supply chain reshaped by geopolitical disruption, tariff volatility, and compressed lead times. The agent operates exactly as designed. It produces consistent outputs. It passes every verification checkpoint because the checkpoints verify behavioral compliance, not assumption validity.

When the agent’s design assumption drifts from reality — when “reliable supplier” no longer means what it meant when the system was built — the agent does not fail visibly. It produces the same kind of outputs it always has, only now those outputs are structurally wrong. The audit trail will show the system operated according to its specifications. The legal liability will attach to the specifications themselves.

That is the exposure the design-output reclassification creates. And it is not a problem any post-incident control mechanism can solve.
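To see the distinction in code, here is a minimal sketch of the two kinds of check. The function names, the threshold, and the data shapes are hypothetical; the point is only that the first check can keep passing while the second quietly fails.

```python
import statistics

def behavioral_checkpoint(recommended: str, scores: dict) -> bool:
    """Verification: did the agent follow its specification?

    Passes whenever the agent recommended the top-scoring supplier.
    It says nothing about whether the scoring assumption is still valid.
    """
    return recommended == max(scores, key=scores.get)

def assumption_validity_check(on_time_rates: list, window: int = 12,
                              max_drift: float = 0.10) -> bool:
    """Validation: is 'supplier reliability is a stable attribute' still true?

    Compares recent on-time delivery performance against the long-run
    baseline. If the gap exceeds max_drift, the design assumption has
    drifted, and every 'compliant' output downstream is structurally
    suspect.
    """
    if len(on_time_rates) <= window:
        return True  # not enough history to measure drift
    baseline = statistics.mean(on_time_rates[:-window])
    recent = statistics.mean(on_time_rates[-window:])
    return abs(baseline - recent) <= max_drift
```

An agent can pass behavioral_checkpoint on every output it produces while assumption_validity_check fails in the background, and only the second result says anything about the specifications that liability now attaches to.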


The Architecture That Was Built for This

I wrote about a closely related dynamic in My Dinner With Claude in September 2025 — specifically, how an AI system can be highly reliable while still operating inside an unvalidated frame. Reliability without validation is a coordinated error waiting for a court to name it.

Augmented Reasoning Architecture™ (ARA™) was built to address this exact gap: the risk that systems will perform reliably against assumptions no one has structurally tested. ARA™ sits upstream of the model, before decisions are generated, and continuously tests whether the design assumptions remain aligned with real-world operating conditions. It is the foundation of the RAM 2025™ multimodel system's perpetual loopback learning, which continuously tests assumptions against new data and outcomes. That mechanism was first proven manually at Canada's Department of National Defence in 1998, where a three-month engagement moved delivery performance from 51% to 97.3% and sustained that improvement for seven years before any automation was introduced.
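The production mechanism is proprietary, but the loopback idea itself can be sketched in a few lines, reusing the DesignAssumption shape from the earlier example. The cadence, the callbacks, and the escalation path here are all illustrative assumptions, not how RAM 2025™ actually schedules its loop.

```python
import time

def perpetual_loopback(assumptions: list, fetch_operating_data, on_drift,
                       interval_seconds: int = 3600) -> None:
    """Illustrative loopback: on a fixed cadence, re-test every declared
    design assumption against fresh operating data and escalate any that
    no longer hold, upstream of the next decision rather than after an
    incident."""
    while True:
        data = fetch_operating_data()
        drifted = [a for a in assumptions if not a.check(data)]
        if drifted:
            # Human-in-the-loop escalation happens here, at the design
            # boundary, while the record is still pre-incident evidence.
            on_drift(drifted)
        time.sleep(interval_seconds)
```

The design choice that matters is where the loop sits: it runs against the assumptions, upstream of the model, rather than against the outputs.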

The architecture was not built in response to the 2025 hallucination crisis. It was built 28 years ago, in response to a different manifestation of the same structural problem: systems performing exactly as designed against conditions their designers had never structurally tested. The vocabulary has changed. The physics has not.


The Three-Post Architectural Arc

This week’s Procurement Insights trilogy traces the same architectural argument through three different entry points. Each post addresses one face of the validation-versus-verification distinction the design-output reclassification now makes urgent.

When AI Gets the Right Answer to the Wrong Question — April 22, 2026. Why answer accuracy is not the same as question validity, and why an AI system producing the right answer to the wrong question is the exact failure mode the design-output reclassification now makes legally actionable.

The Machine Learned to Think. The Octopus Learned to Adapt. Neither Learned to Validate. — April 22, 2026. Why neither mechanistic intelligence nor adaptive intelligence is sufficient on its own, and why the validation layer has to exist architecturally before either scales into enterprise deployment.

Atlan Is Building the Context Layer. The Question It Cannot Answer Was Answered Manually in 1998. — April 23, 2026. Why context consistency (what Atlan and the broader category are building) is not context validity, and why the distinction now matters legally as well as architecturally.

Read together, the three posts describe one structural question from three vantage points: whether the AI system has been validated against the operating environment it is being deployed into. The design-output reclassification Yas named last week turns that question from a best practice into a procurement requirement.


What the Window Looks Like

The organizations that recognize the design-output shift as a structural question — not a regulatory one — will be the ones with pre-incident validation infrastructure in place when the procurement requirement lands.

The organizations that treat compliance infrastructure as overhead, as Yas put it, will be paying someone else’s legal fees.

That window is closing faster than most enterprise legal teams have been briefed to expect.

CIPS members, procurement leaders, CDOs, and chief legal officers have roughly an eighteen-month horizon before pre-incident validation shifts from competitive advantage to contractual obligation. That shift is not speculative. It is the direct consequence of what the courts are now saying, in the exact language Yas surfaced.

This is not the end of the black box. It is the end of the black box as an excuse.

The architectural question — is your AI stack being coordinated, or continuously validated? — is no longer just an engineering question. It is now a liability question anchored at the point of origin. And the answer to it has to exist before the incident, not after.


Phase 0™ is the pre-commitment diagnostic that surfaces design-assumption gaps before deployment. ARA™-driven RAM 2025™ is the reasoning architecture that validates those assumptions continuously once a system is operating. Both are commercially available through Hansen Models™. Details at hansenprocurement.com.


Jon W. Hansen is founder of Hansen Models™ and the Procurement Insights archive — 3,300+ published documents, zero vendor sponsorships, in continuous operation since 2007. The foundational work began in 1998 with SR&ED-funded research at Canada’s Department of National Defence.

Hansen Models™ | Phase 0™ | Hansen Fit Score™ (HFS™) | RAM 2025™ | ARA™ (Augmented Reasoning Architecture™) | Learning Loopback Process™ | Hansen Strand Commonality™ | Implementation Physics™

hansenprocurement.com | payhip.com/hansenmodels | calendly.com/jon-toq/30min

-30-
