The AI pilot worked perfectly.
The proof of concept ran on time, under budget, and impressed everyone in the room. The vendor got a standing ovation. The CIO signed the expansion proposal.
Eighteen months later, nothing is in production.
Welcome to Pilot Purgatory.
Pilot Purgatory is the graveyard between "this works in a demo" and "this works in production." It is where enterprise AI projects go to die slowly, surrounded by enthusiastic PowerPoint decks and increasingly awkward quarterly updates.
IBM CEO Arvind Krishna framed it at Think 2026[1]: enterprises need to "move beyond pilots and put AI to work across the business" – because too many remain "stuck between experimentation and production at scale."
The gap between what enterprises invest in AI and what they get back from it is Pilot Purgatory measured in dollars.
The phenomenon is not unique to mainframe. But mainframe is where Pilot Purgatory is most dangerous, most common, and most expensive.
A mainframe AI pilot typically runs on:
This is not the mainframe. This is a theatrical production of the mainframe.
The real mainframe runs:
The pilot doesn't see any of this. The pilot sees the clean version.
Production is not the clean version.
1. The EBCDIC problem
AI models trained on ASCII text encounter EBCDIC data and produce confidently wrong answers. Packed decimal fields look like garbage to a model that has never seen COMP-3. The pilot used converted data. Production uses raw mainframe data. Nobody told the model about the difference.
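To make the difference concrete, here is a minimal sketch of what "raw mainframe data" means in practice: EBCDIC bytes that only decode correctly with an EBCDIC code page, and COMP-3 packed decimal fields where each byte holds two digit nibbles and the final nibble is a sign. The code page choice (`cp037`) and the field layouts are illustrative assumptions, not a universal rule.

```python
# EBCDIC bytes decoded with an ASCII/UTF-8 assumption produce garbage;
# the same bytes decoded with an EBCDIC code page (cp037 here, one of
# several in use) produce the real value.
raw = b"\xC8\xC5\xD3\xD3\xD6"
print(raw.decode("cp037"))        # "HELLO"
print(raw.decode("latin-1"))      # unprintable control-range garbage

def unpack_comp3(data: bytes, scale: int = 0) -> float:
    """Decode a COBOL COMP-3 (packed decimal) field.

    Each byte carries two decimal digits (one per nibble); the last
    nibble is the sign: 0xD means negative, 0xC or 0xF positive.
    `scale` is the number of implied decimal places (from the PICTURE
    clause, which the bytes themselves do not tell you).
    """
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    sign = -1 if nibbles.pop() == 0x0D else 1
    value = int("".join(str(n) for n in nibbles))
    return sign * value / (10 ** scale)

# PIC S9(3)V99 COMP-3 value 123.45 occupies three bytes:
print(unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2))  # 123.45
```

A model (or pipeline) that has only ever seen the converted, pre-decoded pilot data never learns that this decoding step exists, let alone that the scale comes from a copybook it was never shown.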
2. The dependency problem
The pilot analyzed COBPAY03 in isolation. Production COBPAY03 calls SUBPAY01 which calls CICS transaction PAY1 which reads VSAM file PAYROLL.MASTER which is also updated by COBPAY07 running in a parallel region. The AI analysis of COBPAY03 alone is technically correct and operationally meaningless.
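The chain above can be sketched as a small dependency graph. The edges mirror the programs and resources named in the text; the structure (a dict of adjacency lists plus a breadth-first walk) is a deliberately minimal stand-in for a real dependency mapper that would also ingest JCL, CICS resource definitions, and catalog information.

```python
from collections import deque

# Edges as described in the text: program -> things it calls or touches.
DEPS = {
    "COBPAY03": ["SUBPAY01"],
    "SUBPAY01": ["CICS:PAY1"],
    "CICS:PAY1": ["VSAM:PAYROLL.MASTER"],
    "COBPAY07": ["VSAM:PAYROLL.MASTER"],  # parallel-region writer
}

def closure(start: str) -> set:
    """Everything `start` transitively depends on."""
    seen, queue = set(), deque([start])
    while queue:
        for dep in DEPS.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

def writers_of(resource: str) -> list:
    """Every node that touches a given resource directly."""
    return [node for node, deps in DEPS.items() if resource in deps]

print(closure("COBPAY03"))
print(writers_of("VSAM:PAYROLL.MASTER"))  # includes COBPAY07
```

The point of the second query is the operational one: an analysis of COBPAY03 that never asks "who else writes PAYROLL.MASTER?" will never surface COBPAY07, and COBPAY07 is exactly where the production incident comes from.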
3. The runtime problem
The pilot used source code analysis. Source code tells you what could happen. SMF records, CICS journals, and job logs tell you what actually happens. In production, what actually happens is frequently different from what the source code suggests – because of fixes applied directly to load modules, because of JCL overrides that bypass the documented flow, because of conditions that only occur at quarter-end.
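The static-versus-runtime gap reduces to a set difference once both sides are collected. This is a hypothetical illustration: the "observed" set stands in for programs seen in job logs at quarter-end, and COBPAY12 is an invented name for the undocumented extra step.

```python
# What the source code and documented flow suggest runs:
calls_per_source = {"COBPAY03", "SUBPAY01"}

# What job logs show actually ran at quarter-end (COBPAY12 is a
# hypothetical program added by a JCL override, not in the docs):
calls_observed = {"COBPAY03", "SUBPAY01", "COBPAY12"}

surprises = calls_observed - calls_per_source  # ran, but undocumented
dead_paths = calls_per_source - calls_observed  # documented, never ran
print(surprises)
```

Either set being non-empty is a finding; in practice the "surprises" set is the one that breaks migrations, because nothing in the source ever pointed at it.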
4. The institutional knowledge problem
The pilot had a subject matter expert available to answer questions. Production doesn’t. The person who knew why COBPAY03 has that specific COMP-3 definition retired in 2019. The documentation says "historical reasons." The AI has no way to distinguish "historical reasons that are safe to change" from "historical reasons that will break payroll for 847 employees if you touch them."
The pilot succeeds. Everyone agrees it worked. Then:
Each of these is individually reasonable. Collectively they are Pilot Purgatory.
The only way to escape Pilot Purgatory is to produce evidence from production, not demos from test environments.
This means:
Start with runtime data, not source code. SMF records don’t lie. They tell you exactly what ran, when, how often, and how long. An AI analysis grounded in SMF data is defensible to the change management committee because it reflects what the system actually does, not what the documentation says it does.
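As a sketch of what "grounded in SMF data" means at its simplest: aggregate job-end records into run counts and average elapsed times. The records here are hypothetical pre-parsed dicts; real SMF is a binary format (type 30 for job accounting, among others) that needs a dump-and-parse step this example skips.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed SMF job-end records.
records = [
    {"job": "PAYROLL1", "elapsed_s": 312},
    {"job": "PAYROLL1", "elapsed_s": 298},
    {"job": "GLEXTR01", "elapsed_s": 45},
]

runs = Counter(r["job"] for r in records)
elapsed = defaultdict(int)
for r in records:
    elapsed[r["job"]] += r["elapsed_s"]

for job, count in runs.items():
    print(f"{job}: {count} runs, avg {elapsed[job] / count:.0f}s")
```

Even this trivial aggregation answers questions the source code cannot: which jobs actually run, how often, and how long they take, which is the evidence a change management committee will accept.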
Test on production data, not samples. EBCDIC encoding issues, packed decimal fields, and VSAM quirks only appear when you use real production data. Find them in a controlled test before you find them in production at month-end.
Build the dependency graph before you start. Every COBOL program that touches a critical business process has dependencies that are not visible in the source. JCL procedures, CICS transactions, VSAM files, DB2 tables. Map them before you touch anything.
Define what "working" means in production terms. Not "the demo ran successfully." Not "the output matched the sample." Define it as: "PAYROLL1 ran to completion, SQLCODE=0, 847 employees paid, no abends, no alerts, no callbacks at 3 AM."
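That definition can be written down as executable acceptance checks rather than a sentence in a slide. The field names and the sample result below are hypothetical; real values would come from job-completion records and the alerting system.

```python
def payroll_ok(result: dict) -> tuple:
    """Production-terms definition of 'working' for the payroll run."""
    checks = {
        "ran_to_completion": result["return_code"] == 0,
        "sqlcode_zero": result["sqlcode"] == 0,
        "all_employees_paid": result["employees_paid"] == 847,
        "no_abends": result["abend_count"] == 0,
        "no_alerts": result["alert_count"] == 0,
    }
    return all(checks.values()), checks

ok, detail = payroll_ok({
    "return_code": 0,
    "sqlcode": 0,
    "employees_paid": 847,
    "abend_count": 0,
    "alert_count": 0,
})
print(ok, detail)
```

The named checks are the point: when a run fails, the committee sees which production-terms criterion failed, not "the demo didn't match."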
That is the only evidence that gets a mainframe AI project out of Purgatory.
IBM’s CEO said it plainly at Think 2026[2]: the enterprises pulling ahead are not deploying more AI – they are redesigning how their business operates. The ones still in Purgatory are the ones still running pilots.
The mainframe is where that gap is widest and most expensive. $10 trillion in daily transactions. Systems that have run without interruption for decades. Organizations that cannot afford to be wrong.
The pilot worked. The production system is the real test.
Build your evidence there, or stay in Purgatory.
References
[1] Arvind Krishna, CEO of IBM. Keynote address at IBM Think 2026. May 2026. On moving beyond AI pilots to production-scale deployment across the business.
[2] Arvind Krishna, CEO of IBM. IBM Think 2026 keynote remarks. On the critical difference between organizations redesigning business operations with AI versus those stuck in perpetual pilot cycles.
Also in this series: The Institutional Knowledge Problem · Runtime Evidence as the Right Starting Point · Why Generic AI Tools Fail on Mainframe · Why Mainframe is Different