An AI Agent Is an Extension of the Ecosystem — Not the Ecosystem
Posted on June 8, 2026
Gemba walks one floor. The agent field has no floor. That distinction is the whole argument.
In January 2025 — well before the agentic-AI wave crested — I posed a question in a public thread and waited to see whether anyone, or any AI, could answer it:
“Tell me how Agentic AI would have known to ask: what time of day do orders come in?”
No one answered it.
What came back instead was a description of what a well-designed agent would do — recognize the order-timing pattern, flag it as crucial, propose adjustments around the peak. But read it closely and the trick is visible: that description only became possible because I had already named order timing as the variable. The system did not surface the question. It explained, after the fact, why it could have — once the answer was in front of it.
That is the whole thing in miniature. Surfacing the right question and explaining a known answer in hindsight are not the same act. The first is diagnosis. The second is narration. And the current excitement about AI agents has them confused — which is why the architecture everyone is building is upside down.
Gemba walks one floor. The agent field has no floor — the misalignment lives in the seam between agents, not inside any one of them. An AI agent is simply the newest agent added to that field.
The category error
The prevailing framing treats the AI agent layer as a new world. A reasoning engine that sits inside the enterprise and decides what should happen next. A maturity curve that climbs from Fully Manual at the bottom to Autonomous Execution at the top, as though the destination lives at the summit.
It does not. An AI agent is not an ecosystem. It is an agent in one — the same as a supplier, a courier company, a customs authority, an approver, or a human buyer.
This is not a new definition I am introducing to suit the moment. I have modeled the enterprise as a field of internal and external agents since the Metaprise™ framework originated in 1999. Suppliers, couriers, customs, and the buyer were always agents. What 2026 adds is a new kind of agent — a reasoning layer — dropped into a field that has been there the entire time. The mistake is believing the new agent constitutes its own ecosystem. It is an extension of the one that already exists, and it inherits every misalignment that existed before it arrived.
What Gemba already knew
Long before any of this, Lean practice gave us Gemba (現場) — “the actual place.” The spot where the work is genuinely done: the floor, the dock, the desk where the order is keyed. The discipline of genchi genbutsu — go and see for yourself — rests on a single premise: the report of the work is not the work. Do not trust the dashboard. Go look.
I want to be precise here, because it would be easy to overstate the contrast for effect. Gemba is the closest established discipline to what I do in Phase 0™, and it is a genuine ancestor. A skilled practitioner walking the floor with disciplined root-cause analysis — asking what would have to be true upstream for this behavior to make sense — can sometimes reach the layer I care about. The lineage is real, and I would rather claim it honestly than pretend I invented the instinct.
But notice the shape of that reach. The moment Gemba arrives at the condition, it does so by importing a question Gemba itself does not supply: what would have to be true for this to make sense? That counterfactual is a diagnostic act. The method assumes a diagnostician will perform it. It does not perform it on its own.
Localized versus Metaprise™
So here is the relationship, stated plainly:
Gemba is the localized version of Phase 0™. Phase 0™ is the Metaprise™ version.
Gemba goes to the place — singular, internal, walkable. You can stand in it.
Phase 0™ goes to the field — the agents and the seams between them, most of which have no place you can stand in.
That is not a stylistic difference. It is a difference in topology, and it determines what each one can and cannot see.
Consider the Canadian DND engagement I keep returning to, because it remains the cleanest proof I have. Delivery performance moved from 51% to 97.3% inside three months and held for seven years, with 23% cost savings. The lever was order-timing alignment — when orders were committed relative to when fulfillment could actually occur. No new technology was introduced to achieve it.
Now try to find that misalignment with a Gemba walk. Stand on the floor. You will see correct orders, correctly processed, correctly fulfilled. Every observable behavior is valid. Walk the supplier’s operation: correct. Walk the courier’s: correct. The defect is not in anyone’s work. It lives in the seam between agents who do not share a place — the timing relationship across the field. There is no single floor you can stand on to see it, because the misalignment is not located in any one agent’s behavior. It is located between them.
A fair reader will press here: every field has seams, so which one is load-bearing, and how would you know before you act? That is the right question, and it has a real answer rather than “diagnosis finds it.” A seam is load-bearing if changing it alone moves the outcome while everything else is held constant. That is a counterfactual test, not an observation, and it is the work the diagnostic act actually performs. The method I use to isolate it is Strand Commonality™, which I first developed in 1998 alongside the DND engagement itself, with later funding from the Canadian Government’s SR&ED program. It examines how the seemingly disparate strands across the agent field collectively determine a result, so that the one strand carrying the outcome can be separated from the many that merely look busy. The mechanics deserve their own treatment, and I will give them one in a follow-up piece. For this argument, the point is narrower: identifying the load-bearing seam is a distinct act with a distinct method, and no amount of observation substitutes for it.
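The counterfactual test itself (change one condition, hold the rest constant, see whether the outcome moves) can be sketched in a few lines of Python. This is a toy illustration only, not Strand Commonality™ and not the DND data: the outcome model `on_time_rate`, the field names, and every number below are assumptions invented for the example.

```python
from typing import Callable, Dict, List

Field = Dict[str, float]

def on_time_rate(field: Field) -> float:
    """Hypothetical outcome model: share of orders that make the courier pickup.

    Orders are assumed to be committed within three hours either side of
    order_commit_hour, and each needs one hour of handling before pickup.
    The other conditions are deliberately inert in this toy model.
    """
    commits = [field["order_commit_hour"] + offset for offset in range(-3, 4)]
    on_time = [t for t in commits if t + 1.0 <= field["courier_pickup_hour"]]
    return len(on_time) / len(commits)

def single_change_effects(
    baseline: Field,
    candidates: Field,
    outcome: Callable[[Field], float],
) -> Dict[str, float]:
    """Change one condition at a time, hold the rest constant, record the shift."""
    base = outcome(baseline)
    effects = {}
    for name, new_value in candidates.items():
        changed = dict(baseline)      # copy, so every other condition stays fixed
        changed[name] = new_value
        effects[name] = outcome(changed) - base
    return effects

def load_bearing(effects: Dict[str, float], tolerance: float = 0.05) -> List[str]:
    """Keep only the conditions whose isolated change moved the outcome."""
    return [name for name, delta in effects.items() if abs(delta) > tolerance]

# Invented baseline field and one candidate change per condition.
baseline = {
    "order_commit_hour": 16.0,
    "courier_pickup_hour": 15.0,
    "approval_steps": 2.0,
    "supplier_count": 5.0,
}
candidates = {
    "order_commit_hour": 11.0,
    "approval_steps": 1.0,
    "supplier_count": 8.0,
}

effects = single_change_effects(baseline, candidates, on_time_rate)
print(load_bearing(effects))
```

In this toy field only the commit-hour change moves the outcome; the approval and supplier changes look like activity but shift nothing. The sketch also shows its own limit: it presupposes an outcome model that already contains the timing relationship, and supplying that relationship is the diagnostic work the code does not do.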
It helps to name the three layers the conversation keeps collapsing into one. Maps represent belief. Process mining and Gemba observe behavior. Phase 0™ diagnoses the load-bearing condition across the agent field. Representation, observation, diagnosis — three different acts. Most of the tooling debate lives in the first two and assumes the third has already happened. It usually has not.
That is why the maturity curve is dangerous, not merely incomplete. If the load-bearing condition sits in the seam, then automating, optimizing, augmenting, or granting autonomy does not fix it. Each step up the curve simply accelerates an incomplete understanding. You arrive at the wrong place faster.
Why the AI agent cannot rescue this
Here is where the order-timing question comes back.
An AI agent reasons across the data it is pointed at. Point it at S/4HANA, SuccessFactors, a procurement stack, a supply chain — and it will find patterns in what those systems recorded. What it will not do, on its own, is ask whether the seam between two agents is aligned, because that misalignment never threw an error. Every event in the log is valid. The system was doing exactly what it was told. There is no defect for the agent to anchor on, and no record of the condition that actually governs the outcome.
This is precisely what the 2025 thread demonstrated. No agent raised the order-timing question; the only account of it arrived after a human who had walked the field supplied the variable. Add a reasoning layer, an experience layer, and a control plane, and you have orchestrated the reasoning beautifully. You still have not answered the question underneath the architecture: what operating reality is that reasoning being grounded in, and has anyone diagnosed it before the orchestration begins?
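The claim that every logged event can be valid while the seam is misaligned can be made concrete with a small Python sketch. Everything here is hypothetical: the `Order` record, the validation rule, the sample orders, and the `COURIER_PICKUP_HOUR` constant are invented for illustration and stand in for whatever a real order log and a real courier schedule would hold.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Order:
    order_id: str
    committed_hour: float   # hour of day the buyer committed the order
    approved: bool
    quantity: int

def is_valid(order: Order) -> bool:
    """Per-event check: the kind of rule a system of record enforces."""
    return order.approved and order.quantity > 0 and 0 <= order.committed_hour < 24

def missed_pickup(orders: List[Order], pickup_hour: float) -> List[str]:
    """Cross-agent check: which orders were committed after the courier left."""
    return [o.order_id for o in orders if o.committed_hour > pickup_hour]

# Invented order log. Nothing in it records the courier's schedule.
orders = [
    Order("A1", 10.5, True, 4),
    Order("A2", 15.5, True, 1),
    Order("A3", 16.0, True, 2),
    Order("A4", 16.5, True, 7),
]

# Assumption: this value lives with the courier, outside the buyer's systems.
# Someone has to know to go and ask for it.
COURIER_PICKUP_HOUR = 15.0

all_valid = all(is_valid(o) for o in orders)
late = missed_pickup(orders, COURIER_PICKUP_HOUR)
print(all_valid, late)
```

Every order passes validation, yet three of the four miss the pickup. The second check is trivial to write once the pickup hour is known. Nothing in the log suggests that it should be written at all.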
Expertise in the technology has never been enough. Hewlett-Packard was building an SAP practice to rival IBM’s and still failed to merge its own SAP environment cleanly, with losses estimated near $400 million. The expertise was not the missing variable. The sequence was. Maximum capability, applied to an undiagnosed condition, lost.
And this is no longer a story about individual companies. Widely cited MIT research reports that the large majority of enterprise GenAI pilots, on the order of 95% by its own figure, have produced no measurable return to the P&L, which is consistent with the same pattern at population scale. The reflexive reading is that the technology underdelivered, and the implied remedy is more of it, activated more fully, climbed to a higher rung. But that reading cannot distinguish between two very different organizations: the one whose pilot returned nothing because the model was weak, and the one whose pilot returned nothing because it was deployed onto a condition no one had diagnosed. The finding tells us the pilots did not pay off. It does not, on its own, tell us why. The maturity curve quietly assumes the answer is “not enough technology yet.” The substrate reading proposes the opposite: that the pilots which returned nothing were grounded in nothing.
The point
An AI agent is an extension of the existing ecosystem. It is not a new ecosystem, and it is not a substitute for diagnosing the one it joins. Drop it into an aligned field and it accelerates good outcomes. Drop it into a misaligned field — a field with seams no one has examined — and it accelerates the misalignment, with the added cost that it now sounds authoritative while doing so.
Gemba understood the principle in its localized form: go to the actual place, because the report is not the work. Phase 0™ extends that same instinct to the full agent field — human and AI, internal and external — and adds the act Gemba assumes someone will perform: identifying which seam, if changed, moves the outcome, before anything is built on top of it.
So let me state it as plainly as I can. AI agents are not a new ecosystem arriving to be orchestrated. They are a new type of agent — no different in principle from a supplier, a courier, or a human approver — added to an existing field of agents. The enterprise has always been this field. The only question that matters is whether the load-bearing alignment conditions across that field — human and non-human, internal and external — have been diagnosed and made sound before the new agent is introduced. If they have not, adding AI agents does not create an intelligent ecosystem. It creates faster, more autonomous versions of the existing misalignment.
The agents are new. The field is not. And no agent, however well it reasons, can ask the question the field has not yet been examined to raise.
Technology changes capability. Substrate determines survivability. The substrate isn’t more technology — it is how humans and AI agents align with the technology that already exists.
Truth is believing. Accuracy is knowing.
-30-