Thirty Years of Evidence Says You Can’t Metric or Govern Your Way Out of a Readiness Problem

Posted on March 9, 2026



Published on Procurement Insights | Jon Hansen


For The Busy Executive

Gartner just published two frameworks — one for measuring AI’s impact on cost, revenue, and employee performance, and one for building the engineering architecture to take AI from pilot to production. Both are well-constructed. But they start one layer too late in the problem. The question isn’t whether you build an AI factory. It’s whether you run Phase 0 before you turn it on. If you don’t know why that distinction matters, read this before your next AI decision.


The Shared Assumption That Keeps Failing

Gartner recently published a set of AI value metrics designed to help CIOs move beyond activity tracking and tie AI investments directly to bottom-line outcomes — return on employee, revenue growth, cost reduction, with specific indicators like collection efficiency index, average labor cost per worker, and time to value.

Then, within days, Gartner published a second piece: “The Evolving AI Architecture — A Capability-Centric View.” This one addresses why AI models often stall at pilot. The answer, according to Gartner, is engineering architecture. Build a five-layer “AI factory” — facilities, infrastructure, scheduling and orchestration, AI-ready data, and AI value delivery — and you break through from pilot to production.

Together, these two pieces function as Gartner’s de facto AI framework for enterprises. One tells you how to build AI’s architecture. The other tells you how to measure AI’s outcomes.

Both frameworks are well-constructed. But they start one layer too late in the problem.


The Pattern in a New Costume

Recently I published a post on data governance that documented the same structural problem in a different era.

Every data governance initiative starts from the same assumption: that the data being cleaned, tracked, and managed is the right data to begin with. The discipline matured from database administration in the 1990s through MDM platforms, GDPR compliance, and now AI governance frameworks. Billions invested. Sophisticated tooling. Board-level attention.

The ProcureTech implementation success rate did not move. Across multiple independent sources — the Procurement Insights archive, published ERP failure research, and large-scale transformation studies from Gartner, McKinsey, and BCG — implementation success has remained stubbornly in the 20–30% band across every data governance era, through every platform generation, every regulatory wave, every tooling upgrade.

The graph below shows what that looks like plotted over 35 years.

Data governance sophistication rose steadily from 1990 to 2025. Procurement implementation success remained flat at 20–30% across every era.

The Gartner AI value metrics framework is the same architecture in a newer frame. Instead of asking “are we governing the right data,” it asks “are we measuring the right AI outcomes.” Both are legitimate questions. Neither is the first question.

The first question is: does the operating system — the decision rights, incentive structures, governance capacity, and data ownership — actually support any of this?


Two Cases That Asked the First Question

The Department of National Defence, late 1990s. An MRO procurement platform delivering 51% next-day performance against a 90% contract requirement. The instinct was immediate: automate, optimize, measure more rigorously. Before any of that, I asked one question — what time of day do orders come in?

The answer was 4pm.

The data was clean. The metrics were accurate. But they were measuring the output of an incentive misalignment no one had identified. Technicians were sandbagging orders until end of day to hit their service call targets. No governance framework surfaces that. No value metric catches it. The fix was incentive realignment — not better measurement of the existing system.

Delivery went from 51% to 97.3% in three months. It held for seven years.

One of the largest PC retailers in the United States, early 2000s. A vendor rationalization strategy that compressed hundreds of suppliers down to 100, with clean logic: better volume leverage, lower administrative overhead. Two years in, their data showed savings. Every metric was being tracked. Every governance process was working as designed.

They were paying 21% over market price.

Not because the data was dirty. Because the dataset they were governing had become a closed system. The 100 suppliers were being measured against each other — not against a market they had stopped looking at. Perfect governance. Perfect blindness.


What the Archive and Five AI Models Confirm

In March 2026, five independent AI models analyzed two graphs drawn from the Procurement Insights archive — 3,300+ documents, 180+ case studies, and eighteen years of longitudinal, unsponsored records.

The two analyses were run three months apart. Every model returned the same structural conclusion: technology capability has advanced dramatically across every procurement technology generation. Implementation success has not moved.

Then each model cross-checked that finding against its own external knowledge base — Gartner’s own research on ERP failure rates, McKinsey’s 70% transformation failure figure, BCG’s finding that only 30% of large-scale tech programs meet their original expectations. Every external source confirmed the same pattern the archive had already documented.

The full analysis is here: procureinsights.com/2026/03/08/when-five-ai-models-analyze-the-same-data-three-months-apart-and-reach-the-same-conclusion/

Now add this post’s finding to that picture. The data governance maturity curve climbed steadily from 1990 to 2025. The AI value metrics movement is its direct successor — more sophisticated, better labelled, more precisely instrumented.

The success rate did not move during the governance era. There is no structural reason to expect it to move during the metrics era — unless the question that precedes both of them is finally asked.


The Question That Precedes Both

Gartner’s AI value metrics framework is not wrong. Gartner’s AI factory architecture is not wrong. Data governance is not wrong. Measuring outcomes is not wrong.

They are all being applied to what the existing architecture exposes — not to the deeper question of whether that architecture reflects the real agents and behaviors that actually drive outcomes.

Consider what the AI factory framing reveals. A factory is an input-output system. It assumes the inputs are correctly specified and the outputs will be used as intended. Neither assumption holds if the organizational operating system is misaligned. You can build all five layers of Gartner’s capability stack and deliver the output into an organization where technicians are still sandbagging orders at 4pm. The AI factory will process those orders faster. It will not change the behavior generating them.

In the DND case, the architecture exposed order volumes and delivery rates. It did not expose technician incentives. In the PC retailer case, the architecture exposed savings within a compressed supplier base. It did not expose the market it had stopped seeing.

In both cases, better governance and metrics would have more precisely “instrumented the shadows.” They would not have changed what was casting them.


The Numbers Tell the Real Story

The four-scenario graph below shows what happens when you map this across the decision that every organization with an AI investment is now facing (current state and Gartner-only figures grounded in published research; Phase 0 scenarios anchored to the DND empirical outcome and comparable archive cases).

The numbers tell the story directly. Current state: 20% full success — consistent with thirty years of PI archive data and Gartner’s own ERP failure research. Gartner AI factory plus value metrics alone: 32% — a genuine improvement, but one that still leaves 68% of the available success unrealized. Hansen Phase 0 alone: 78% — readiness-first delivers the majority of the available gain before a single dollar of AI infrastructure is committed. Phase 0 first, then the Gartner AI factory: 86% — the ceiling that neither approach reaches alone, and the number that represents what the AI factory investment is actually worth when the organization receiving it is ready.

This is the answer to the question Phase 0 raises. The AI factory is not made unnecessary by readiness diagnostics. It is made viable by them. Without Phase 0, the factory processes the wrong inputs efficiently. With Phase 0 first, the factory performs as Gartner promises it will.

The question isn’t whether you build an AI factory. It’s whether you run Phase 0 before you turn it on.

Thirty years of evidence says you cannot metric or govern your way out of a readiness problem. The Hansen Fit Score™ and Phase 0 diagnostic discipline exist to ask the question that data governance and AI value metrics frameworks have been skipping — before the governance program starts, before the factory is built, before the metrics framework is deployed, before the next initiative becomes another entry on the flat line.

This is not a procurement argument. It is an enterprise systems argument. The same diagnostic failure — optimizing what the existing architecture exposes rather than first verifying that the architecture reflects reality — appears in every technology era, every governance discipline, and every metrics framework that has been applied to this problem. The audience for this question is not just CPOs. It is every CIO, CFO, and board-level governance function that is now making AI investment decisions without a readiness diagnostic in the sequence.


Current industry coverage is available at procureinsights.com. Hansen Models™ and the Hansen Fit Score™ framework: hansenprocurement.com

Jon Hansen — Procurement Insights | Hansen Models™ | Independent. Unsponsored. Archive-based. | procureinsights.com | hansenprocurement.com


-30-

Posted in: Commentary