
Why AI Pilots Succeed and Productions Fail – The Mainframe Edition

The demo worked perfectly. Eighteen months later, nothing is in production. Welcome to Pilot Purgatory.

The AI pilot worked perfectly.

The proof of concept ran on time, under budget, and impressed everyone in the room. The vendor got a standing ovation. The CIO signed the expansion proposal.

Eighteen months later, nothing is in production.

Welcome to Pilot Purgatory.

What Is Pilot Purgatory?

Pilot Purgatory is the graveyard between "this works in a demo" and "this works in production." It is where enterprise AI projects go to die slowly, surrounded by enthusiastic PowerPoint decks and increasingly awkward quarterly updates.

IBM CEO Arvind Krishna framed it at Think 2026[1]: enterprises need to "move beyond pilots and put AI to work across the business" – because too many remain "stuck between experimentation and production at scale."

That gap – between experimentation and production at scale – is Pilot Purgatory measured in dollars.

The phenomenon is not unique to mainframe. But mainframe is where Pilot Purgatory is most dangerous, most common, and most expensive.

Why Mainframe Pilots Lie

A mainframe AI pilot typically runs on:

  • A test LPAR with 10% of production capacity
  • Sample data extracted months ago, cleaned for the demo
  • A subset of applications chosen because they are well-documented
  • A controlled scenario with known inputs and expected outputs

This is not the mainframe. This is a theatrical production of the mainframe.

The real mainframe runs:

  • Peak transaction loads that saturate CPU at month-end
  • EBCDIC-encoded data with packed decimal fields that look fine until you move them
  • COBOL programs that call JCL procedures that call CICS transactions that read VSAM files that are also updated by three other batch jobs running simultaneously
  • Institutional knowledge embedded in undocumented PIC clauses, hardcoded literals, and comment-free 900-line programs written by someone who retired in 2019

The pilot doesn't see any of this. The pilot sees the clean version.

Production is not the clean version.


The Four Reasons Mainframe AI Pilots Fail in Production

1. The EBCDIC problem

AI models trained on ASCII text encounter EBCDIC data and produce confidently wrong answers. Packed decimal fields look like garbage to a model that has never seen COMP-3. The pilot used converted data. Production uses raw mainframe data. Nobody told the model about the difference.
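The difference is mechanical and easy to demonstrate. Below is a minimal Python sketch: the `unpack_comp3` helper and the sample bytes are illustrative (not from any specific product), but they show why a packed-decimal field must be decoded as binary while EBCDIC text converts cleanly with a code-page decode.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 2) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte carries two decimal nibbles; the final nibble is the sign
    (0xC or 0xF = positive, 0xD = negative). Character-converting these
    bytes as text destroys them, which is why ASCII-trained pipelines
    see "garbage" where the mainframe sees money.
    """
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    sign = nibbles.pop()                      # last nibble is the sign
    value = int("".join(map(str, nibbles)))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)      # apply the implied decimal point

# A PIC S9(5)V99 COMP-3 field holding 12345.67 is stored as bytes 12 34 56 7C:
print(unpack_comp3(b"\x12\x34\x56\x7C"))        # Decimal('12345.67')

# EBCDIC *text*, by contrast, converts with an ordinary code-page decode:
print(b"\xC8\xC5\xD3\xD3\xD6".decode("cp037"))  # HELLO
```

A model fed the raw bytes of that COMP-3 field as if they were text will never recover 12345.67; the conversion step the pilot quietly performed is exactly the step production skips.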

2. The dependency problem

The pilot analyzed COBPAY03 in isolation. Production COBPAY03 calls SUBPAY01 which calls CICS transaction PAY1 which reads VSAM file PAYROLL.MASTER which is also updated by COBPAY07 running in a parallel region. The AI analysis of COBPAY03 alone is technically correct and operationally meaningless.
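That chain can be sketched as a small graph walk, reusing the program and file names from the example above. The edge data here is hand-written for illustration; in practice it would be mined from JCL, CICS resource definitions, and catalog information.

```python
# Who calls, starts, reads, or writes what (illustrative edges):
access = {
    "COBPAY03": {("CALLS", "SUBPAY01")},
    "SUBPAY01": {("STARTS", "CICS:PAY1")},
    "CICS:PAY1": {("READS", "PAYROLL.MASTER")},
    "COBPAY07": {("WRITES", "PAYROLL.MASTER")},
}

def resources_touched(start: str, graph: dict) -> set:
    """Follow CALLS/STARTS edges from one program and collect every
    dataset it transitively reads or writes."""
    seen, stack, touched = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for verb, target in graph.get(node, ()):
            if verb in ("READS", "WRITES"):
                touched.add(target)
            else:
                stack.append(target)
    return touched

footprint = resources_touched("COBPAY03", access)   # {'PAYROLL.MASTER'}

# Any OTHER program writing into that footprint is a hidden conflict
# that an isolated analysis of COBPAY03 can never see:
conflicts = {prog for prog, edges in access.items()
             for verb, ds in edges
             if verb == "WRITES" and ds in footprint and prog != "COBPAY03"}
print(conflicts)  # {'COBPAY07'}
```

The point of the exercise is the last two lines: the conflict is only visible once the whole graph exists, which is why the isolated pilot analysis was "technically correct and operationally meaningless."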

3. The runtime problem

The pilot used source code analysis. Source code tells you what could happen. SMF records, CICS journals, and job logs tell you what actually happens. In production, what actually happens is frequently different from what the source code suggests – because of fixes applied directly to load modules, because of JCL overrides that bypass the documented flow, because of conditions that only occur at quarter-end.

4. The institutional knowledge problem

The pilot had a subject matter expert available to answer questions. Production doesn’t. The person who knew why COBPAY03 has that specific COMP-3 definition retired in 2019. The documentation says "historical reasons." The AI has no way to distinguish "historical reasons that are safe to change" from "historical reasons that will break payroll for 847 employees if you touch them."

How Pilot Purgatory Happens

The pilot succeeds. Everyone agrees it worked. Then:

  • Month 1: "We need to get security approval for production access."
  • Month 2: "Legal needs to review the AI vendor contract."
  • Month 3: "We need to run it past the change management committee."
  • Month 4: "The z/OS team wants to understand the performance impact."
  • Month 5: "We should do a larger pilot first."
  • Month 6: "The vendor released a new version, let’s wait for that."

Each of these is individually reasonable. Collectively they are Pilot Purgatory.

The underlying reason is not bureaucracy. It is that nobody actually believes the pilot results apply to production. They saw a demo. They did not see evidence.

"A demo is not evidence. Evidence is what your production system produces at 3:17 AM when the payroll job abends."

How to Escape Pilot Purgatory

The only way to escape Pilot Purgatory is to produce evidence from production, not demos from test environments.

This means:

Start with runtime data, not source code. SMF records don’t lie. They tell you exactly what ran, when, how often, and how long. An AI analysis grounded in SMF data is defensible to the change management committee because it reflects what the system actually does, not what the documentation says it does.
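The kind of analysis this enables is straightforward once the records are extracted. A sketch, with one large caveat: real SMF type 30 records are binary and need a proper extraction step first; the field names and numbers below are invented purely to show the shape of the "what ran, when, how often, how long" summary.

```python
from collections import Counter

# Hypothetical pre-extracted job records (illustrative fields and values):
records = [
    {"job": "PAYROLL1", "date": "2026-01-30", "elapsed_s": 412,  "abend": None},
    {"job": "PAYROLL1", "date": "2026-02-27", "elapsed_s": 2890, "abend": None},
    {"job": "COBPAY07", "date": "2026-02-27", "elapsed_s": 95,   "abend": "S0C7"},
]

runs = Counter(r["job"] for r in records)                 # what ran, how often
worst = {job: max(r["elapsed_s"] for r in records if r["job"] == job)
         for job in runs}                                 # how long, at worst
abends = [r for r in records if r["abend"]]               # what actually failed

print(runs["PAYROLL1"], worst["PAYROLL1"], abends[0]["job"])
```

Even this toy summary surfaces the month-end pattern (the same job taking seven times longer at period close) that no amount of source-code reading would reveal.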

Test on production data, not samples. EBCDIC encoding issues, packed decimal fields, and VSAM quirks only appear when you use real production data. Find them in a controlled test before you find them in production at month-end.

Build the dependency graph before you start. Every COBOL program that touches a critical business process has dependencies that are not visible in the source. JCL procedures, CICS transactions, VSAM files, DB2 tables. Map them before you touch anything.

Define what "working" means in production terms. Not "the demo ran successfully." Not "the output matched the sample." Define it as: "PAYROLL1 ran to completion, SQLCODE=0, 847 employees paid, no abends, no alerts, no callbacks at 3 AM."

That is the only evidence that gets a mainframe AI project out of Purgatory.

The Point

IBM’s CEO said it plainly at Think 2026[2]: the enterprises pulling ahead are not deploying more AI – they are redesigning how their business operates. The ones still in Purgatory are the ones still running pilots.

The mainframe is where that gap is widest and most expensive. $10 trillion in daily transactions. Systems that have run without interruption for decades. Organizations that cannot afford to be wrong.

The pilot worked. The production system is the real test.

Build your evidence there, or stay in Purgatory.

References

[1] Arvind Krishna, CEO of IBM. Keynote address at IBM Think 2026. May 2026. On moving beyond AI pilots to production-scale deployment across the business.

[2] Arvind Krishna, CEO of IBM. IBM Think 2026 keynote remarks. On the critical difference between organizations redesigning business operations with AI versus those stuck in perpetual pilot cycles.

Also in this series: The Institutional Knowledge Problem · Runtime Evidence as the Right Starting Point · Why Generic AI Tools Fail on Mainframe · Why Mainframe is Different

Building mainframe AI that works in production? Start with runtime evidence – SMF data, CICS flows, job dependencies – not just source code or pilot results.
Learn more
Working on Linux and mainframe? IM3270 is a modern 3270 terminal emulator for Linux – free 60-day trial, no credit card required.
Download Free