
Why UAT Cannot Find the Failures That Cost the Most

Younes Aatif

Founder & CEO, Flowsiti


User Acceptance Testing is the last line of defense before an enterprise system goes live. It is also, by design, incapable of finding the class of failures that cause the most expensive post-deployment disasters.

This is not a criticism of UAT. It is a precise description of what UAT is — and what it is not. Understanding the distinction is the first step toward understanding why enterprise implementations fail at rates that have not improved in thirty years despite significant investment in testing methodology, tooling, and professional practice.

What UAT Actually Tests

User Acceptance Testing runs specific scenarios against a configured system and checks whether the outputs match expectations. A tester submits an expense report for $300 and verifies that the manager receives an approval request. A tester submits an expense report for $600 and verifies that both manager and VP receive approval requests. A tester submits a contract above a threshold and verifies that the correct approval chain is triggered.

If the outputs match expectations, the scenario passes. If enough scenarios pass, the system is approved for go-live.
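A minimal Python sketch of that scenario-based check. The `route_expense` function is a hypothetical stand-in for the configured system, and the $500 threshold is an assumption for illustration. Each scenario pairs one concrete input with one expected output, as a UAT script would.

```python
# Hypothetical stand-in for the configured system: which roles must approve an expense.
VP_THRESHOLD = 500  # assumed threshold, in dollars


def route_expense(amount: int) -> list[str]:
    """Return the approval roles the (hypothetical) configuration assigns."""
    return ["manager", "vp"] if amount > VP_THRESHOLD else ["manager"]


# Each UAT scenario pairs one concrete input with one expected output.
SCENARIOS = [
    (300, ["manager"]),
    (600, ["manager", "vp"]),
]


def run_uat(scenarios) -> list[tuple]:
    """Return the failing scenarios; an empty list means every scenario passed."""
    failures = []
    for amount, expected in scenarios:
        actual = route_expense(amount)
        if actual != expected:
            failures.append((amount, expected, actual))
    return failures


print(run_uat(SCENARIOS))  # → []
```

Both scenarios pass, and that is all the run can report. It says nothing about any input that no scenario exercised.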

This approach is effective at finding two categories of problems.

It finds execution failures — cases where the system crashes, produces an error, or behaves incorrectly in a way that is immediately observable. These are the failures that show up in system logs, produce error messages, or cause visible malfunction. UAT catches them reliably.

It finds methodological failures — cases where the wrong process was implemented. If the requirement was a monthly review cycle and the system was configured for quarterly, a UAT scenario that runs a monthly review will fail to produce the expected output. The tester sees the discrepancy and the configuration is corrected.

Both of these failure categories are testable because they are scenario-dependent. A specific input produces a specific output, and whether that output is correct can be determined by checking it against a defined expectation.

What UAT Cannot Test

There is a third category of failure that is not scenario-dependent. It is logic-dependent.

A logical contradiction in a workflow does not produce incorrect outputs on most inputs. It produces a structurally impossible state under a specific combination of conditions, conditions that may not appear in any UAT scenario because they were not anticipated during test design.

Consider a specific example. A company implements an expense approval workflow with three rules:

Expenses under $500 require manager approval. Expenses over $500 require manager and VP approval. Employees cannot approve their own expenses.

UAT tests: a $300 expense submitted by a junior employee — passes. A $600 expense submitted by a junior employee — passes. A $300 expense submitted by a manager — the manager's own expense is routed to the manager, who cannot approve it. UAT may or may not include this scenario. If it does, the tester may assume the VP should handle it and may not recognize this as a structural failure.
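The three rules can be encoded literally to see what they imply for the manager-submitted case. This is a hypothetical sketch: `ROLE_HOLDER`, the names, and the functions are illustrative, not taken from any real system. Encoded exactly as worded, the rules also leave an expense of exactly $500 uncovered, since one rule applies under $500 and the other over it.

```python
from dataclasses import dataclass

# Hypothetical organization: which person holds each approval role.
ROLE_HOLDER = {"manager": "maria", "vp": "victor"}


@dataclass(frozen=True)
class Expense:
    amount: int      # whole dollars
    submitter: str   # person who submitted the expense


def required_roles(amount: int) -> list[str]:
    """Rules 1 and 2, encoded literally as worded."""
    if amount < 500:
        return ["manager"]
    if amount > 500:
        return ["manager", "vp"]
    return []  # as worded, neither rule covers exactly $500


def self_approval_conflicts(expense: Expense) -> list[str]:
    """Rule 3: required roles whose holder is the submitter and so cannot approve."""
    return [role for role in required_roles(expense.amount)
            if ROLE_HOLDER[role] == expense.submitter]


print(self_approval_conflicts(Expense(300, "jamal")))  # → []
print(self_approval_conflicts(Expense(300, "maria")))  # → ['manager']
```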

Now add two facts that exist in the organization but were not captured in requirements: the VP and the manager are traveling together on the same trip, and the expense is for a shared hotel bill. The manager cannot approve because it is their own expense. The VP, the obvious fallback, cannot approve either, because they are on the same trip and the organization has a policy that trip participants cannot approve shared expenses.

The approval is structurally deadlocked. There is no valid approver under the combination of conditions that actually exists. The system will not crash. It will not produce an error. It will wait — indefinitely, silently, correctly executing the logic it was given — until a human intervenes manually.
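The deadlock can be reproduced in a few lines. In this hypothetical sketch, `APPROVAL_POOL` lists everyone who could ever be asked to approve (the manager and the VP), and each exclusion rule removes people from it. The trip-participant policy is encoded as a second exclusion alongside the self-approval rule.

```python
from dataclasses import dataclass

# Hypothetical approval pool: everyone who could ever be asked to approve.
APPROVAL_POOL = ["maria", "victor"]  # maria = manager, victor = VP


@dataclass(frozen=True)
class Expense:
    submitter: str
    shared_trip_expense: bool = False
    trip_participants: frozenset = frozenset()


def eligible_approvers(expense: Expense) -> list[str]:
    """Apply every exclusion rule and return whoever is left."""
    eligible = []
    for person in APPROVAL_POOL:
        if person == expense.submitter:
            continue  # rule: employees cannot approve their own expenses
        if expense.shared_trip_expense and person in expense.trip_participants:
            continue  # policy: trip participants cannot approve shared expenses
        eligible.append(person)
    return eligible


hotel_bill = Expense("maria", shared_trip_expense=True,
                     trip_participants=frozenset({"maria", "victor"}))
print(eligible_approvers(hotel_bill))  # → []
```

The function returns an empty list without raising anything. A workflow engine that waits for someone on that list to act will simply keep waiting.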

No UAT scenario will find this failure because the scenario requires knowing in advance that these two conditions will coincide. Testing is designed around expected scenarios. Structural failures exist in the unexpected intersections of conditions that were each individually reasonable and collectively impossible.

The Fundamental Limitation

The limitation of UAT is not a deficiency in testing practice. It is a mathematical property of what testing can and cannot establish.

Testing samples the behavior of a system. For any non-trivial system, the space of possible inputs is effectively infinite. Testing covers a finite subset of that space — the scenarios that testers thought to design. If a failure exists in a region of the input space that no test covers, testing will not find it.

Formal verification does not sample the behavior of the logic. It proves properties of the logic — properties that hold for all possible inputs, not just the ones that were tested.
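When the relevant conditions can be abstracted into a finite set of cases, "all possible inputs" can be checked literally by enumerating every combination. The sketch below is a simple explicit-state enumeration, one basic form of model checking, over a hypothetical abstraction: who submits, one representative amount per band (under, exactly, and over $500), whether the expense is a shared trip expense, and whether the VP is on the trip. It assumes, as the tester in the example might, that the VP can stand in when the manager is excluded from a sub-$500 approval. The result covers only this abstraction, and it holds because every amount within a band behaves identically in the model.

```python
from itertools import product

POOL = ("manager", "vp")  # roles that can approve (assumed)


def is_stuck(submitter, amount, shared_trip, vp_on_trip):
    """True if no valid set of approvers exists for this combination of conditions."""
    if amount < 500:
        needed = 1   # one approver; the VP may stand in for the manager (assumption)
    elif amount > 500:
        needed = 2   # both manager and VP
    else:
        return True  # no rule covers exactly $500
    trip = {submitter} | ({"vp"} if vp_on_trip else set())
    eligible = [r for r in POOL
                if r != submitter and not (shared_trip and r in trip)]
    return len(eligible) < needed


# UAT: a handful of hand-picked scenarios.
sampled = [("junior", 300, False, False), ("junior", 600, False, False)]

# Exhaustive check: every combination in the finite abstraction.
all_cases = list(product(("junior", "manager"), (300, 500, 600), (False, True), (False, True)))
stuck = [c for c in all_cases if is_stuck(*c)]

print([c for c in sampled if is_stuck(*c)])  # → []
print(stuck)
```

The sampled scenarios report nothing, while the exhaustive run lists every stuck combination, including `('manager', 300, True, True)`, the shared hotel bill from the example.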

The difference is the difference between checking whether a bridge holds under the loads that appear on a test schedule and proving that the bridge design is structurally sound under all loads within its specified parameters. The first approach might miss a failure that occurs under a load combination not included in the test schedule. The second approach cannot miss it, because the proof covers every load combination within those parameters.

For enterprise process logic, the relevant structural properties are specific and provable:

Does every process path that opens have a valid path to completion? A deadlock is a process path that opens and never completes. This is a property of the logic that exists independently of any particular input.
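This property can be checked as graph reachability: collect every state reachable from the start, collect every state from which some terminal state is reachable (by searching backward from the terminals), and report the difference. This minimal sketch uses a hypothetical workflow and treats every listed transition as possible regardless of its guard condition, so it over-approximates; a fuller verifier would also check whether the guards can actually be satisfied.

```python
from collections import deque


def reachable(sources, edges):
    """All nodes reachable from any source by following edges (breadth-first)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def states_that_cannot_complete(graph, start, terminals):
    """States reachable from start that have no path to any terminal state."""
    opened = reachable([start], graph)
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    can_complete = reachable(list(terminals), reverse)
    return opened - can_complete


# Hypothetical expense workflow; every listed transition is treated as possible.
workflow = {
    "submitted": ["manager_review", "no_eligible_approver"],
    "manager_review": ["approved", "rejected"],
    "no_eligible_approver": [],  # waits for an approval that can never arrive
}
print(states_that_cannot_complete(workflow, "submitted", {"approved", "rejected"}))
# → {'no_eligible_approver'}
```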

Does every step that reads data have a verified source that writes that data before the step executes? A data dependency failure is not a scenario — it is a structural property of the relationship between process steps.
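This property is a dataflow question: for each step, which fields are written on every path that leads to it? This sketch, with a hypothetical workflow, processes steps in topological order using Python's standard `graphlib` module (Python 3.9+). It assumes that a step with several predecessors is a merge reachable through any one of them, so a field counts as available only if every incoming path writes it.

```python
from graphlib import TopologicalSorter


def unsatisfied_reads(preds, reads, writes):
    """Map each step to the fields it reads that are not written on every path to it."""
    guaranteed = {}  # fields definitely written before each step starts
    problems = {}
    for step in TopologicalSorter(preds).static_order():
        incoming = [guaranteed[p] | writes.get(p, set()) for p in preds.get(step, ())]
        # A field is guaranteed only if every incoming path writes it (intersection).
        guaranteed[step] = set.intersection(*incoming) if incoming else set()
        missing = reads.get(step, set()) - guaranteed[step]
        if missing:
            problems[step] = missing
    return problems


# Hypothetical workflow: a fast-track branch skips the manager lookup.
preds = {
    "lookup_manager": {"submit"},
    "fast_track": {"submit"},
    "route_for_approval": {"lookup_manager", "fast_track"},
}
writes = {"submit": {"amount", "submitter"}, "lookup_manager": {"manager_id"}}
reads = {"route_for_approval": {"manager_id", "amount"}}

print(unsatisfied_reads(preds, reads, writes))  # → {'route_for_approval': {'manager_id'}}
```

The fast-track branch never writes `manager_id`, so the routing step can execute without it even though the main path supplies it.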

Is every approval authority traceable to an organizational source that can satisfy it? An unreachable approval state is a structural property of the authority assignment logic, not a scenario that requires a specific trigger.
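A first-pass check for this property asks whether anyone in the organization actually holds the authority each approval step requires. In this hypothetical sketch, `approval_steps` and `org_roles` are illustrative data, not an export from any real directory. A fuller check would also apply exclusion rules, such as the rule that employees cannot approve their own expenses, before concluding that an authority can be satisfied.

```python
def unreachable_approvals(approval_steps, org_roles):
    """Return approval steps whose required authority is held by no one."""
    holders = {}
    for person, roles in org_roles.items():
        for role in roles:
            holders.setdefault(role, set()).add(person)
    return {step: role for step, role in approval_steps.items() if not holders.get(role)}


# Hypothetical data: each approval step names the authority it needs.
approval_steps = {
    "expense_manager_review": "manager",
    "expense_vp_review": "vp",
    "contract_signoff": "general_counsel",  # assumed vacant, e.g. after a reorganization
}
org_roles = {"maria": {"manager"}, "victor": {"vp"}}

print(unreachable_approvals(approval_steps, org_roles))  # → {'contract_signoff': 'general_counsel'}
```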

These properties either hold for all possible inputs or they do not. Testing cannot establish that they hold; it can only demonstrate that they held for the specific inputs tested. Formal verification determines whether they hold for every possible input.

Why This Matters More Now

The gap between what UAT finds and what formal verification proves has always existed. What has changed is the scale at which undetected structural failures propagate.

When a human approval process contains a deadlock, a skilled employee routes around it. They make a phone call. They escalate informally. They exercise judgment about what the organization actually intends, even when the documented process does not accommodate the situation. The deadlock exists, but human flexibility compensates for it — invisibly, unmeasurably, at a cost that never appears on any budget line.

When an AI agent executes the same process, the agent routes around nothing. It executes the documented logic faithfully, completely, without judgment or flexibility. The deadlock that humans compensated for becomes a queue of stalled transactions that grows until someone notices, investigates, and traces the failure back to a structural property of the logic that has been there since the original implementation.

The agents being deployed across enterprise operations today are executing logic that was never formally verified. They are inheriting structural failures that human workers compensated for without noticing — and they are executing those failures at machine speed, at enterprise scale, without any of the informal judgment that previously kept the failures from becoming crises.

UAT approved the system. The structural failure was not in any test scenario. The agent does not know this distinction matters.

The Validation That Comes Before Testing

The correct position for formal verification is not after UAT — as a more rigorous form of testing — but before configuration begins. Before any platform is configured. Before any integration is built. Before any test scenario is designed.

Formal verification of the process logic establishes that the logic is structurally sound — that it is free of deadlocks, unreachable states, unsatisfiable data dependencies, and authority contradictions. This proof is independent of any particular platform configuration or test scenario. It applies to the logic itself, not to any specific implementation of it.

When this proof exists before configuration begins, UAT changes. It is no longer searching for structural failures that it cannot reliably find. It is verifying that the implementation correctly encodes logic that has already been proved to be structurally coherent. It finds execution errors and methodological mismatches — the failure classes it was designed to find — with a structural guarantee that the other failure class is not lurking in the logic, waiting for a scenario nobody thought to test.

UAT is not the problem. UAT is a good solution to the problems it is designed to solve. The problem is asking UAT to solve a problem it was never designed for — and not noticing for thirty years that it was consistently failing to solve it.

Every expensive post-deployment failure that UAT missed was not a testing failure. It was a structural failure that testing cannot find — a property of the logic that existed before the first test was run and would have been visible to formal verification before the first field was configured.

Flowsiti formally verifies process logic before configuration begins — proving the structural properties that UAT cannot test. The failure that UAT missed was already in the logic. flowsiti.com

