Why is Salesforce recommending AI users eat Skittles for breakfast? Because Skittles are easy to count — and outcomes are not.
The CIO article by Anirban Ghoshal — “AWU by Salesforce: A shiny new metric that tells CIOs little of value” — should resonate not only in the CIO’s office but across the entire C-Suite. Once again, the industry is forcing AI into an equation-based framework built around functionality rather than an agent-based, outcome-oriented approach. Having tracked every technology era for the past four decades, I keep returning to one question: Why is Salesforce making in 2026 the same mistake with AI that the industry has been making since 1983?
The Pattern
This week, Salesforce introduced the Agentic Work Unit — AWU — as a metric for measuring the value of its Agentforce AI agents. CEO Marc Benioff presented it during the company’s quarterly earnings call. The metric counts discrete actions performed by AI agents: updating a record, triggering a workflow, calling an external system.
Every analyst quoted in Ghoshal’s article identified the same structural problem. AWU measures execution, not outcomes. It counts whether an agent completed an action, not whether the action was correct, necessary, or produced a business result. One analyst described it as tracking “activity, not quality.” Another pointed out that without distinguishing between attempted, succeeded, and validated actions, AWU remains a throughput metric rather than a trust metric. Even Salesforce’s own CMO — who created AWU — conceded that the metric “quantifies work” while separate tools and guardrails are needed to ensure that work is “reliable, repeatable, and outcome-driven.”
This is not a new problem. It is the oldest problem in enterprise technology. And the fact that it is being repeated — by one of the largest enterprise software companies in the world, at the frontier of AI — tells us something important about why the documented failure rate for technology implementations has remained between 50% and 80% for decades.
The Same Mistake, Four Times
In 1983, Peter Kraljic published a paper in the Harvard Business Review that reshaped procurement. His insight was agent-based: different categories of spend behave differently, driven by complex, interacting forces — supply risk, market volatility, geopolitical disruption, supplier behavior. His diagnostic was correct. But the instrument the industry built on top of it was equation-based: a 2×2 matrix that segmented spend into four quadrants and spawned a universal sourcing process applied identically to all of them. Kraljic told the industry to differentiate. The industry standardized. Four decades later, the failure rate has not moved.
In 2026, ISM published a Digital Transformation Roadmap identifying five stages of digital supply chain advancement and five foundational pillars including process discipline, workforce enablement, and technology fit. The insight is sound — particularly the recognition that “automating a broken process simply accelerates inefficiency.” But the instrument is equation-based: a linear stage progression that puts “Define the Vision” before “Assess Your Current State.” The framework assumes the organization knows enough about itself to set a credible vision before it has been diagnosed. ISM recently noted publicly that their roadmap “supports many of the points” made in our analysis of readiness methodology — a significant acknowledgment that the foundational principles align, even where the sequencing differs.
In 2026, Gartner published its AI Sovereignty Stack — a framework for infrastructure sovereignty addressing where AI models sit, provider diversification, and data portability. The insight is real: organizations need resilience against provider disruption. But the instrument is equation-based: a compliance and architecture checklist that addresses infrastructure sovereignty without addressing judgment sovereignty. Five sovereign, diversified, compliant AI providers can all agree — and all be wrong. The framework measures resilience from disruption. It does not measure protection from bad decisions.
And now, in 2026, Salesforce introduces AWU — a metric for counting agentic AI throughput. The insight behind it is legitimate: enterprises need a way to measure what AI agents are doing. But the instrument is equation-based: a volume metric that counts completed actions without measuring whether those actions produced correct outcomes, whether the organization was ready to absorb them, or whether the decisions embedded in those actions were sound.
Four frameworks. Four decades. The same structural error: an agent-based problem forced into an equation-based instrument because equation-based instruments are what vendors can package, consultants can bill against, and analysts can track on a quarterly earnings call.
Why This Matters Beyond the CIO’s Office
The CIO article frames AWU as a technology measurement problem. It is larger than that.
The CFO is being asked to fund AI investments justified by AWU counts that do not connect to margin impact, cost reduction, or revenue protection. A dashboard showing 50,000 agentic work units processed tells the CFO nothing about whether those work units produced $1 of recoverable value. It is the AI equivalent of reporting “savings” from sourcing events that never reach the P&L — a pattern procurement leaders have lived with for decades.
The CPO is being told that AI agents are processing thousands of procurement actions — generating RFQs, updating supplier records, triggering contract workflows — while the implementation failure rate for the platforms those agents operate inside remains at 80%. AWU counts the volume of work the agent performed. It does not ask whether the organization’s processes, data quality, behavioral patterns, and decision structures can absorb what the agent produces. The work units accumulate. The outcomes do not.
The CEO is being presented with board-ready metrics that measure activity without accountability. As Constellation Research’s Liz Miller observed in the CIO article, AWU may serve as a health metric for financial markets tracking Salesforce’s utilization growth, but CIOs and C-Suite leaders are already skeptical about AI returns. A throughput metric does not resolve that skepticism. It deepens it — because it gives the appearance of measurement without the substance of accountability.
The DND Lesson — Again
Across two decades of documented procurement technology initiatives, we have seen this pattern produce the same consequence: organizations measure what the system does instead of whether the organization is ready to use what the system produces.
In 1998, Canada’s Department of National Defence operated an MRO procurement platform delivering 51% next-day fulfillment against a 90% contractual requirement. The original request was to automate the existing system — to increase the throughput of orders through faster processing. In AWU terms, the ask was to generate more work units, faster.
The diagnostic question that changed the outcome had nothing to do with throughput. It was: what time do orders come in?
The answer — 4:00 PM — revealed an entire ecosystem of failure that no throughput metric would have surfaced. Service technicians were sandbagging orders until end of day to maximize service call targets. Late orders triggered customs delays with US-based suppliers. Dynamic pricing penalized late-day orders by hundreds of dollars per unit. The behavioral pattern — invisible to every participant inside it — was driving the delivery failure, the cost escalation, and the operational breakdown simultaneously.
If AWU had existed in 1998, the system would have reported increasing work units as automation accelerated order processing. The agents would have been completing more actions, faster. And every one of those actions would have been amplifying the failure — automating the sandbagging, the late orders, the customs delays, and the cost escalation with greater efficiency and better dashboards.
The solution that produced 97.3% next-day delivery and a 23% cost reduction sustained over seven consecutive years did not come from increasing throughput. It came from diagnosing the behavioral pattern the throughput metric could not see.
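The kind of diagnostic that surfaced that pattern can be sketched in a few lines. This is a hypothetical reconstruction for illustration, not the 1998 analysis itself: bucketing orders by submission hour makes an end-of-day spike visible that no aggregate throughput count would ever show.

```python
from collections import Counter
from datetime import datetime

# Hypothetical order log: (order_id, submission timestamp)
orders = [
    ("A1", "1998-03-02 09:12"), ("A2", "1998-03-02 15:55"),
    ("A3", "1998-03-02 16:05"), ("A4", "1998-03-02 16:20"),
    ("A5", "1998-03-02 16:41"), ("A6", "1998-03-02 11:03"),
]

# Throughput view: one number, no pattern.
print(f"work units processed: {len(orders)}")

# Diagnostic view: the distribution by hour exposes the 4:00 PM pile-up
# that drove the customs delays and dynamic-pricing penalties.
by_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for _, ts in orders
)
for hour in sorted(by_hour):
    print(f"{hour:02d}:00  {'#' * by_hour[hour]}")
```

Both views count the same six orders; only the second asks when the work arrived, which is the question the throughput metric could not.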
What Outcome-Oriented Measurement Requires
The analysts in Ghoshal’s article identified what AWU would need to become useful: distinction between attempted and completed actions, rollback tracking, per-tool success ratios, human intervention metrics, and validation logic confirming that business objectives were achieved rather than merely executed.
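Taken together, those requirements imply a richer action record than a raw counter. As an illustrative sketch — not Salesforce’s actual telemetry; every name here is hypothetical — a minimal model might separate attempted, succeeded, and validated actions and derive a trust ratio rather than a throughput count:

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    tool: str              # which tool or system the agent called
    attempted: bool        # the agent tried the action
    succeeded: bool        # the call completed without error
    validated: bool        # a downstream check confirmed the business objective
    rolled_back: bool      # the action had to be reversed
    human_intervened: bool # a person stepped in to correct course

def awu_count(actions):
    """Throughput metric: counts completed actions, nothing more."""
    return sum(1 for a in actions if a.succeeded)

def trust_ratio(actions):
    """Outcome metric: validated, non-reversed actions over attempts."""
    attempts = [a for a in actions if a.attempted]
    if not attempts:
        return 0.0
    good = sum(1 for a in attempts if a.validated and not a.rolled_back)
    return good / len(attempts)

log = [
    AgentAction("update_record",    True, True,  True,  False, False),
    AgentAction("trigger_workflow", True, True,  False, True,  False),
    AgentAction("call_external",    True, False, False, False, True),
]
print(awu_count(log))    # 3 actions attempted, 2 completed...
print(trust_ratio(log))  # ...but only 1 in 3 produced a validated outcome
```

The gap between the two numbers is the gap the analysts are describing: the first counts work, the second asks whether the work held up.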
That list describes, in different language, what the Hansen Fit Score™ was designed to measure from the outset — not whether the system performed work, but whether the organization can absorb the outcomes that work produces.
The Hansen Fit Score™ evaluates three dimensions: Technical Capability (can the platform do what it claims), Behavioral Alignment (does the organization’s decision structure, change capacity, and operational culture support what the platform requires), and Readiness Compensator (what must change inside the organization before the technology conversation begins). RAM 2025™ validates those assessments across twelve independent AI models — not to confirm a conclusion, but to challenge it through structured dissent.
This is what outcome-oriented measurement looks like in practice. Not a count of how many actions were completed. A diagnosis of whether the organization is positioned to convert those actions into results.
The Deeper Question
Constellation Research’s Liz Miller compared AWU to the clicks and likes that became convenient stand-ins for success in early digital media — metrics that helped launch an industry but never proved durable indicators of real value. The comparison is apt, but it understates the risk.
Clicks and likes measured consumer attention. AWU measures enterprise decision-making. When the metric fails at the consumer level, an advertisement underperforms. When the metric fails at the enterprise level, a transformation initiative joins the 80%.
Moor Insights & Strategy’s Robert Kramer suggested that while the specific term may not endure, the concept of outcome-oriented AI metrics will likely appear in future RFPs. He is right — but with an important caveat: enterprises should define their own versions of a completed agentic task and how to verify it. If every vendor defines the metric differently, fragmentation follows. And if the metric remains equation-based — counting throughput rather than measuring outcomes — the fragmentation will produce the same false confidence that Kraljic’s matrix produced four decades ago.
The question for every C-Suite executive evaluating agentic AI investments is not how many work units the system can produce. It is whether your organization can convert those work units into outcomes — and whether you have measured that readiness before the first agent is deployed.
Salesforce built AWU to count the work. Phase 0™ was built to determine whether the work should begin.
The Hansen Fit Score™ consolidated vendor assessments — including evaluations of platforms deploying agentic AI capabilities — are available through Hansen Models™. The Agentic Governance Readiness (AGR) Index, a framework for assessing organizational preparedness for agentic AI deployment, is available at no cost on our Payhip store.
For enterprise purchases with corporate invoicing and PO terms: [Request Enterprise Purchase]
Jon Hansen, Founder, Hansen Models™ | Creator, Hansen Method™
-30-
From Kraljic to AWU: Why the Industry Keeps Building the Wrong Instrument for the Right Problem
Posted on February 28, 2026