AI vendors want you to believe code is just text. On a mainframe, code is a physical execution graph spanning three decades of IBM middleware.
Every AI vendor sells the same modernisation playbook for mainframe. Chunk your legacy code. Vectorise it. Drop it into a standard RAG pipeline. Ask questions. Get answers.
This works for a Node microservice built in 2022. It is a spectacular way to misunderstand a z/OS system that has been running since 1988.
The problem is a fundamental assumption that does not hold on mainframe: that application logic lives inside the source files.
On a modern web application, the source code is largely self-contained. You can read a Python function and understand what it does, what it calls, and what it produces. The execution context is relatively shallow.
On a mainframe, a COBOL program is a fragment. Its meaning comes from everything around it: the JCL that invokes it, the datasets it reads and writes, the CICS definitions that route transactions to it, the scheduler that decides when it runs, and the DB2 objects it touches.
A vector database reads a COBOL program and sees a dataset input. It has absolutely no idea where that dataset came from. Semantic search cannot resolve batch dependencies. An LLM cannot know that a JCL step executed at 2:00 AM sorted a VSAM file and passed it forward.
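Resolving where a dataset came from is mechanical once you parse the JCL instead of embedding it. A minimal sketch, over hypothetical, heavily simplified JCL — real JCL has continuation lines, PROCs, symbolic parameters, and GDG relative names that this ignores:

```python
import re

# Hypothetical, simplified JCL with two steps passing a dataset forward.
JCL = """\
//NIGHTSRT EXEC PGM=SORT
//SORTIN   DD DSN=PROD.ACCT.MASTER,DISP=SHR
//SORTOUT  DD DSN=PROD.ACCT.SORTED,DISP=(NEW,CATLG)
//POSTACCT EXEC PGM=ACCTPOST
//INFILE   DD DSN=PROD.ACCT.SORTED,DISP=SHR
//REPORT   DD DSN=PROD.ACCT.REPORT,DISP=(NEW,CATLG)
"""

STEP_RE = re.compile(r"^//(\S+)\s+EXEC\s+PGM=(\S+)")
DD_RE = re.compile(r"^//\S+\s+DD\s+DSN=([A-Z0-9.]+),DISP=(\S+)")

def dataset_flows(jcl: str):
    """Map each dataset to the steps that write it and the steps that read it."""
    writers, readers, step = {}, {}, None
    for line in jcl.splitlines():
        if m := STEP_RE.match(line):
            step = f"{m.group(1)} (PGM={m.group(2)})"
        elif (m := DD_RE.match(line)) and step:
            dsn, disp = m.groups()
            # DISP=(NEW,...) creates the dataset; SHR/OLD consume an existing one
            (writers if disp.startswith("(NEW") else readers).setdefault(dsn, []).append(step)
    return writers, readers

writers, readers = dataset_flows(JCL)
print(writers["PROD.ACCT.SORTED"])  # ['NIGHTSRT (PGM=SORT)'] - the step that produced it
print(readers["PROD.ACCT.SORTED"])  # ['POSTACCT (PGM=ACCTPOST)'] - the step that consumed it
```

Even this toy version answers a question no chunk-and-embed pipeline can: the sorted file that ACCTPOST reads was produced by the SORT step immediately before it.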
When you ask a generic AI tool what happens when an account fails validation, it searches for COBOL error logic. It completely misses the CICS transaction definition that intercepts the abend and routes the error to a DB2 logging table.
The correct unit of analysis on mainframe is not a COBOL program. It is the execution graph – the full picture of what runs, in what order, with what inputs and outputs, under what conditions.
That graph spans JCL job streams, scheduler dependencies, CICS resource definitions, DB2 objects, and the dataset lineage that connects one job's output to another job's input.
This graph is not written down in any single place. It is distributed across dozens of different system components, some of which have not been touched in fifteen years and some of which change every release cycle.
Building this graph from static source analysis alone is not possible. The source files contain the logic. They do not contain the execution context that gives that logic meaning.
What actually works is starting from runtime evidence rather than source code.
The mainframe generates an extraordinary amount of runtime data. SMF records capture every program execution, every dataset access, every transaction, every job step – with timing, resource consumption, and outcome. CICS journals record transaction flows. Job scheduler logs record dependencies and execution sequences. Abend records capture exactly what was running when something went wrong.
This runtime evidence tells you what actually happened, not what the code says should happen. It tells you which programs execute in production and how often. It tells you which code paths are live and which have not run in three years. It tells you which CICS transactions are business-critical and which are internal utilities.
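The liveness question is simple arithmetic once the records exist. A sketch, assuming the SMF data has already been distilled into (program, last-observed-execution) pairs — the extraction itself is the mainframe-specific work:

```python
from datetime import datetime, timedelta

# Hypothetical records distilled from SMF job/step accounting data:
# program name -> last observed execution in production.
last_run = {
    "ACCTPOST": datetime(2024, 11, 3),
    "RATECALC": datetime(2024, 10, 28),
    "YRENDFIX": datetime(2021, 1, 4),   # has not run in years
}

def split_live_dormant(last_run, now, dormant_after=timedelta(days=3 * 365)):
    """Partition programs into live and dormant by last observed execution."""
    live = sorted(p for p, t in last_run.items() if now - t < dormant_after)
    dormant = sorted(p for p, t in last_run.items() if now - t >= dormant_after)
    return live, dormant

live, dormant = split_live_dormant(last_run, now=datetime(2024, 11, 10))
print(live)     # ['ACCTPOST', 'RATECALC']
print(dormant)  # ['YRENDFIX']
```

A static analyser would weigh all three programs equally; the runtime view says two of them matter and one is a candidate for retirement rather than migration.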
Once you have that picture, AI analysis has something real to work with. You can ask meaningful questions: which programs are involved in this transaction flow, what typically precedes this abend, which datasets does this job depend on. The answers are grounded in actual execution history rather than static code inference.
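Those questions become graph traversals once the runtime picture exists. A minimal sketch with a hand-built adjacency map standing in for a graph assembled from scheduler logs and dataset lineage — all job names here are illustrative:

```python
from collections import deque

# Illustrative edges: job -> the jobs it depends on, derived in a real
# system from scheduler logs and dataset producer/consumer lineage.
DEPENDS_ON = {
    "ACCTPOST": ["NIGHTSRT"],
    "NIGHTSRT": ["EXTRACT"],
    "EXTRACT": [],
    "REPORTGN": ["ACCTPOST"],
}

def upstream(job, edges):
    """Every job that must complete before `job` can run (transitive closure)."""
    seen, queue = set(), deque(edges.get(job, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(edges.get(dep, []))
    return sorted(seen)

print(upstream("REPORTGN", DEPENDS_ON))  # ['ACCTPOST', 'EXTRACT', 'NIGHTSRT']
```

"What does this job depend on" is a breadth-first walk, not a semantic search, and the answer is exact because the edges came from what actually ran.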
The COBOL source is the last thing you look at, not the first.
If your AI tool cannot parse a CICS routing table, it has no business touching your core banking system.
More specifically: before evaluating any AI tool for mainframe diagnostics, modernisation, or analysis, ask these questions. Can it parse JCL and trace a dataset from the step that writes it to the step that reads it? Can it read CICS resource definitions and routing tables? Can it ingest SMF records and scheduler logs? Can it tell you which code paths have actually executed in production and which have not? Can it connect an abend record to the transactions and programs that were running when it occurred?
A tool that cannot answer yes to most of these is applying a generic AI pattern to a domain-specific problem. It will produce answers that sound credible and are wrong in ways you will only discover in production.
The approach that works starts from the execution graph, not the source.
Build the runtime picture first – from SMF records, CICS journals, scheduler logs, and dataset lineage. Identify which programs are live, which are critical, and how they connect. Then bring AI analysis to bear on that grounded picture.
This is more work than chunking source files and running a RAG pipeline. It requires mainframe-specific data ingestion that most AI vendors have not built. It requires understanding z/OS architecture at a level that most AI engineers do not have.
But it produces results that are actually trustworthy on systems where the cost of a confident wrong answer is a production outage at a bank.
Generic AI tools fail on mainframe because they are built for a world where code is text. On mainframe, code is a physical execution graph spanning thirty years of IBM middleware. The tools need to match the domain.
Also in this series: Why Mainframe is Different – The Execution Graph Problem · Runtime Evidence as the Right Starting Point · The Hidden Risk in Every COBOL Migration Project