A practitioner built the perfect AI agent. Now, what is the “BUT”?

Posted on April 29, 2026



Two questions Marijn didn’t ask, and what they reveal about the architecture above the agent.


A post from Marijn Overvest crossed my feed this morning. Title: Your Team Doesn’t Need Developers to Build AI Agents. Here’s Proof. The proof is a procurement professional who built a working AI agent using Power Automate, Copilot, and what Marijn calls “plain determination.” The agent reads invoices from a shared mailbox, cross-references them against the ERP, and flags the unprocessed ones. She has never written code. Her direct quote, which Marijn calls the money line:

“I used AI to help me write the code. I don’t understand all of it, but I understand what it does.”

The agent handles 80 percent of cases. She is “still working on edge cases.” Marijn celebrates this as the future of procurement, and the post closes with a link to his AI fundamentals training program.

I want to take the post seriously, because there is something real in it. The democratization of agent-building is real. The shift in who can produce useful automation is real. The procurement professional in the story did something most procurement professionals five years ago could not have done. None of that is in dispute.

But the story stops at the moment the agent works. It doesn’t ask the two questions that determine whether the local win produces sustained institutional capability or just relocates the dysfunction somewhere else in the enterprise.

The first question

What time of day do the orders come in?

That sounds like a small operational question. It is the most diagnostically loaded question in the entire workflow.

Invoices don’t arrive evenly. They cluster. They pile in at end-of-month from suppliers running their own batch processes. They burst in the morning after overnight EDI runs. Some suppliers — the Ariba-generated PDFs Marijn mentions are a good example — send formats that take longer to parse and trigger more error flags. Some arrive during local business hours; some don’t. Some have SLA clocks attached that started the moment the email hit the mailbox.

The arrival pattern determines queue depth at any given moment. Queue depth determines whether the agent’s processing speed is adequate or not. That, in turn, determines whether the human reviewer sitting downstream is looking at a steady trickle of flagged exceptions or a Monday-morning wall of them. It determines what happens during quarter-close when volume triples. It determines whether the 80 percent the agent handles is the easy 80 percent or the time-critical 80 percent.
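The relationship between arrival pattern and queue depth can be made concrete with a toy simulation. Nothing below is from the post: the throughput number, the hourly arrival patterns, and the function name are all illustrative assumptions. The point is only that the same daily volume produces radically different backlogs depending on when it arrives.

```python
# Toy sketch (illustrative numbers, not from the post): the same 240
# invoices per day, processed by an agent with fixed hourly throughput,
# produce very different queue depths depending on the arrival pattern.

def max_queue_depth(arrivals_per_hour, processed_per_hour=30):
    """Track the backlog hour by hour; return the worst queue depth seen."""
    queue, worst = 0, 0
    for arrived in arrivals_per_hour:
        queue = max(0, queue + arrived - processed_per_hour)
        worst = max(worst, queue)
    return worst

# 240 invoices over an 8-hour day, two arrival patterns:
steady = [30] * 8                      # even trickle all day
bursty = [0, 0, 240, 0, 0, 0, 0, 0]    # overnight EDI batch lands mid-morning

print(max_queue_depth(steady))  # 0   -- the agent keeps up all day
print(max_queue_depth(bursty))  # 210 -- a backlog the demo never shows
```

The averages are identical in both runs; only the clustering differs. That is why "how many invoices per day" is the demo's question and "what time of day do they come in" is the diagnostic one.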

None of this is visible in the demo. The demo shows the agent reading an invoice and checking the ERP. The demo does not show what happens at 4:47 PM on the last business day of the quarter when 340 invoices arrive in fourteen minutes from a supplier whose PDF format the agent has not previously encountered.

This is what Phase 0™ and Implementation Physics™ are built to surface. The architecture is not skeptical of the agent. The architecture is skeptical of the institutional readiness to operate the agent under the actual physics of the workflow rather than under demo conditions. Asking what time of day do the orders come in is the diagnostic instrument’s first move. The answer is almost never the answer the demo wants you to give.

The first question tests whether the workflow holds under load. The second tests whether the system around the workflow holds together once it does.

The second question

Whose workflow does her agent disrupt?

The agent’s success is being measured against a single human’s inbox: hers. The metric is hours saved. The narrative is local efficiency.

But the agent is touching a shared mailbox, parsing PDFs, querying the ERP, filing emails, and routing exceptions. Each of those touches sits inside someone else’s workflow.

The accounts payable analyst who used to scan the same mailbox and now sees only what the agent flags has lost ambient visibility into invoice flow. She no longer notices the supplier whose volume has quietly doubled, or the format change that used to register as a yellow flag and is now invisible because the agent classified it as routine.

The compliance reviewer who used to do quarterly invoice sampling now samples a population that has already been pre-filtered by an instrument whose internal logic the reviewer has never seen. The audit trail records what the agent did. It does not record what the agent decided not to flag.

The supplier on the other end — particularly the smaller supplier without ERP automation of their own — now interacts with an institution that responds faster on the easy cases and slower on the edge cases, because the easy cases run through the agent and the edge cases pile up behind a human queue that has been resourced down on the assumption that the agent is handling them.

The other automation in the building — the Power Automate flow somebody else built last year, the Copilot script another team is testing, the legacy RPA bot that has been quietly running for three years — may now be reading from the same mailbox or writing to the same ERP records. Two agents touching the same row at the same moment do not produce twice the efficiency. They produce race conditions, duplicate filings, or corrupted state that nobody catches until the audit.
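The lost-update failure mode described above is worth seeing in miniature. The sketch below is deterministic and entirely hypothetical (the record shape, field names, and agent labels are invented for illustration): two automations read the same record, each applies its own change to a stale copy, and the last writer silently erases the first.

```python
# Deterministic sketch of the classic lost-update race (hypothetical record
# and field names): two automations both read an ERP record, each updates
# its own stale copy, and the second write-back erases the first update.

record = {"invoice_id": "INV-1001", "status": "received", "notes": []}

# The invoice agent and a legacy RPA bot both read "at the same moment":
seen_by_agent = dict(record, notes=list(record["notes"]))
seen_by_bot = dict(record, notes=list(record["notes"]))

# Each applies its own update to its stale copy...
seen_by_agent["status"] = "matched"
seen_by_agent["notes"].append("matched by invoice agent")
seen_by_bot["status"] = "exported"
seen_by_bot["notes"].append("exported by RPA bot")

# ...and writes it back. Last writer wins; the other update is simply gone.
record = seen_by_agent
record = seen_by_bot

print(record["status"])  # "exported" -- the agent's match never happened
print(record["notes"])   # only the bot's note survives
```

No error is raised anywhere in that sequence, which is the point: nothing in either automation's own logs looks wrong, and the corrupted state surfaces only when someone reconciles the record later.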

This is what the relational governance discipline is built to address. Its primary current exemplar in the field is the Relationships First body of thinking developed by Strategic Relationships Solutions, though it is an altitude any firm with a comparable record of structuring institutional relationships could legitimately occupy. The local win is real. The relational web through which the local win travels — internal stakeholders, external suppliers, other automated systems, the human-AI handoffs that knit them all together — is where the local win either becomes institutional capability or becomes a new form of fragmentation. The discipline asks: who else is downstream, what are they expecting, and what does the agent’s success do to their capacity to do their work?

Why both questions, and why they sit above the technology

The Hansen Models™ architecture places two disciplines above the entire technical stack. The diagnostic discipline asks what is the structural reality of the workflow. The relational governance discipline asks who and what is downstream of every action the workflow produces. They are coequal because the workflow is happening at both altitudes simultaneously. A workflow that is diagnostically clean but relationally undefended produces local wins that destabilize the larger system. A workflow that is relationally well-managed but diagnostically thin produces sustained relationships around an instrument that doesn’t actually work.

Figure: The Camcorder — how institutions actually encounter the architecture. Three legitimate entry points (validation, marketplace, substrate) around the discipline core. The practitioner’s agent in Marijn’s story is a marketplace-altitude entry; its undefendable output is exactly what the validation-altitude entry surfaces. Both pull toward the same discipline core.

Neither discipline lives at the technology layer. Power Automate doesn’t ask what time of day the orders come in. Copilot doesn’t ask whose workflow downstream loses visibility. Claude doesn’t ask whether two agents are silently competing for the same ERP row. Those questions live at the discipline altitude, above the marketplace and the substrate, and they have to be asked by humans operating with diagnostic and relational intent before the agent gets built — not after it ships and starts producing 80 percent solutions to problems nobody mapped at full resolution.

What this means for the procurement professional

The procurement professional in Marijn’s story is doing exactly what her institution has set her up to do. She is building competently. She is iterating. She is using the tools available to her. The dysfunction is not at her desk. The dysfunction is at the discipline altitude — at the level where senior leadership decides whether the institution invests in tooling and confidence alone, or whether the institution also invests in the diagnostic and governance disciplines that determine whether the tooling produces sustained capability or another generation of audit findings.

The pattern is not new. Field technicians measured on 100 percent call-quota hit rates eventually hit 100 percent — by sandbagging, e.g., holding part orders until the end of the day instead of ordering parts after each call. The call-response quota was met, but downstream the close-to-call ratio collapsed. AI agents are subject to the same dynamics, with one additional risk: the metric is optimized faster, by an instrument the operator does not fully understand, in a workflow whose downstream relationships are usually invisible from the operator’s vantage point.

Marijn’s framing — the barrier is not the technology, it’s the confidence to start — is partially correct and structurally insufficient. Confidence is necessary. It is not determinative. The Procurement Insights archive documents what happens at scale when confidence is treated as the determinative variable: the failure rate has held remarkably steady at 55 to 75 percent across seven procurement technology generations, and it has been stable precisely because the discipline-altitude conditions kept being treated as someone else’s problem.

The architecture sits above the democratization, not against it

The democratization Marijn is celebrating is real. The architecture sits above it, not against it. The question is not can we build agents. The question is what conditions determine whether the agents we build produce defendable institutional capability. Two questions get you most of the way there.

What time of day do the orders come in?

Whose workflow does her agent disrupt?

Ask both before the agent is built and the agent becomes the beginning of a capability. Ask neither, and the agent works while the system around it quietly breaks — and the institution discovers, late, that working and helping are not the same thing.

That is the difference the architecture is built to produce. And that is why “I don’t understand all of it, but I understand what it does” is not the money line of the future of procurement. It is the audit finding of 2029, written three years early.


Jon W. Hansen is the founder of Hansen Models™ and Procurement Insights. The Hansen Models™ architecture addresses the discipline-altitude conditions that determine whether AI deployments in procurement produce sustained institutional capability. Phase 0™ and ARA™-driven RAM 2025™ are the diagnostic and validation instruments at the heart of the architecture; the relational governance altitude is filled by partner firms whose published work demonstrates the discipline, with Strategic Relationships Solutions and its Relationships First body of thinking as a primary current example. Initial scoping conversations: calendly.com/jon-toq/30min or hpt@hansenprocurement.com.

-30-

Posted in: Commentary