AI does not replace mainframe expertise. It amplifies it. Here is where it genuinely adds value in daily diagnostic work.
Most of the conversation about AI on mainframe focuses on modernisation – using AI to migrate COBOL to Java, to document legacy code, to accelerate the transition away from z/OS.
This misses the most immediate and most practical use case: using AI to help experienced mainframe professionals do their existing work better, faster, and with less cognitive load.
Diagnostics is where AI adds the most immediate value on mainframe. Not replacing expertise – amplifying it.
When a production job or transaction abends on z/OS, the diagnostic workflow follows a consistent pattern:

1. Identify the abend code and the failing program or module.
2. Locate the failing instruction and the data it referenced.
3. Form a hypothesis about the cause.
4. Trace the bad data or condition back to its source.
5. Decide on and apply a fix.
6. Verify the fix and prevent recurrence.
Steps 1 and 2 are mechanical – experienced professionals do them quickly, but they are rule-based and well-documented. Steps 3 through 6 require judgment, experience, and context. AI accelerates steps 1 and 2 dramatically and provides useful starting points for steps 3 and 4. Steps 5 and 6 still require human expertise.
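The mechanical part of the split can be sketched as a simple triage lookup: abend code in, failure type and first evidence to pull out. The abend codes are real; the playbook contents and the idea of encoding them as a table are illustrative assumptions, not a description of any product.

```python
# Triage sketch: map an abend code to the mechanical first steps
# (name the failure type, list the evidence to collect). The playbook
# entries here are hypothetical examples, not an exhaustive catalogue.
ABEND_PLAYBOOK = {
    "S0C7": ("data exception", ["dump: PSW + failing instruction", "input records"]),
    "S0C4": ("protection exception", ["dump: PSW + referenced storage address"]),
    "S322": ("time limit exceeded", ["SMF type 30 CPU history for the step"]),
    "SB37": ("out of space", ["dataset allocation + usage trend"]),
}

def triage(abend_code: str) -> dict:
    """Return the failure type and the evidence to collect first."""
    failure, evidence = ABEND_PLAYBOOK.get(
        abend_code, ("unknown", ["full dump review"])
    )
    return {"abend": abend_code, "failure": failure, "collect_first": evidence}
```

Steps 3 through 6 have no such table: that is exactly the boundary between the mechanical and the judgment-based parts of the workflow.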
S0C7, the data exception, is the most common abend in COBOL batch processing: a numeric field contains non-numeric data – typically spaces or low-values – when a numeric operation is attempted.
What AI can do: given the dump data and the program source, identify which field caused the exception, which COBOL statement was executing, and which input dataset the bad data likely came from. Pattern analysis across multiple S0C7 abends can identify whether this is a recurring problem from a specific data source.
What still requires human judgment: whether the bad data is a one-off input error, a bug in the program that writes the field, a data conversion problem from an upstream system, or a genuine data quality issue that needs process remediation.
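The "which field holds bad data" part of this is mechanical enough to sketch. In EBCDIC, zoned-decimal (COBOL DISPLAY numeric) bytes carry an F zone nibble, with C or D also allowed in the trailing sign byte; anything else – spaces (0x40), low-values (0x00) – raises a S0C7 when the field is used arithmetically. A minimal validator, where the field-layout format is a hypothetical stand-in for a compile-listing data map:

```python
def find_bad_zoned_fields(record: bytes, layout: dict) -> list:
    """Scan an EBCDIC record for zoned-decimal fields containing
    non-numeric bytes (e.g. spaces 0x40 or low-values 0x00).
    `layout` maps field name -> (offset, length); this shape is an
    illustrative assumption, not a real listing format."""
    bad = []
    for name, (off, length) in layout.items():
        data = record[off:off + length]
        ok = True
        for i, b in enumerate(data):
            zone, digit = b >> 4, b & 0x0F
            last = (i == length - 1)
            # zone nibble must be F, except the last byte may carry
            # a sign overpunch (C = positive, D = negative)
            if digit > 9 or (zone != 0xF and not (last and zone in (0xC, 0xD))):
                ok = False
                break
        if not ok:
            bad.append(name)
    return bad
```

Given a record where WS-QTY holds spaces, the function names that field as the S0C7 candidate; the judgment about where the spaces came from remains with the human.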
S0C4, the protection exception: a program attempted to access storage it was not authorised to access. Often caused by pointer errors, subscript overflows, or incorrect use of address manipulation.
What AI can do: identify the instruction that caused the exception, the storage address that was referenced, and the likely cause based on the code at that point. Compare against historical S0C4 patterns to identify whether this is a new failure or a known intermittent.
What still requires human judgment: whether the root cause is a code bug requiring a fix, a configuration or environment issue, a memory constraint that needs addressing at the system level, or a race condition that requires architectural review.
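The "new failure or known intermittent" comparison is essentially grouping by failure point. A sketch, assuming abend history has already been reduced to (module, offset) pairs – the record shape and the recurrence threshold are illustrative:

```python
from collections import Counter

def classify_s0c4(module: str, offset: int, history: list) -> str:
    """Compare a new S0C4 against a history of (module, offset) pairs.
    Two or more prior hits at the same failure point suggests a known
    intermittent; the threshold is an illustrative assumption."""
    seen = Counter((m, o) for m, o in history)
    return "known intermittent" if seen[(module, offset)] >= 2 else "new failure"
```

The label only narrows the search; deciding whether a known intermittent is tolerable or needs an architectural review is still the human's call.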
S322: a job step ran longer than its time limit. Could be a real performance problem, an infinite loop, or simply a time limit that is too tight for the current data volume.
What AI can do: compare this execution's CPU consumption against historical SMF type 30 data for the same step. Identify whether CPU time has been trending upward, whether data volume has increased, and whether there is a correlation with specific input characteristics.
What still requires human judgment: whether to extend the time limit, investigate a performance regression, or address the root cause in the code or the data.
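The historical comparison above can be sketched as an outlier-and-trend check over per-run CPU figures. The thresholds (z > 3 for an outlier, 20% growth for a trend) are illustrative, not tuned values, and the flat list of samples stands in for parsed SMF type 30 records:

```python
from statistics import mean, stdev

def cpu_anomaly(history_cpu_secs: list, current: float) -> dict:
    """Flag a step whose CPU time is unusually high versus its own
    history, and report whether the history itself trends upward.
    Assumes at least a few prior runs; thresholds are illustrative."""
    mu, sigma = mean(history_cpu_secs), stdev(history_cpu_secs)
    z = (current - mu) / sigma if sigma else 0.0
    # crude trend check: second half of the history vs the first half
    half = len(history_cpu_secs) // 2
    trending_up = mean(history_cpu_secs[half:]) > 1.2 * mean(history_cpu_secs[:half])
    return {"z_score": round(z, 2), "outlier": z > 3, "trending_up": trending_up}
```

A stable history with one wild run points at this execution's input; a steadily climbing history points at growth – two different human decisions, reached faster.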
SB37, SD37, or SE37: a dataset ran out of space. One of the most mechanical abends to diagnose – the solution is almost always to increase the space allocation or to clean up old data.
What AI can do: identify the dataset, show its current allocation and historical usage trend, recommend a new allocation based on growth patterns. This is highly automatable.
What still requires human judgment: whether to increase the space, archive old data, investigate why data volume is growing, or restructure the dataset.
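The allocation recommendation is a projection over the usage trend. A deliberately naive linear sketch – real sizing would draw on DCOLLECT/SMS data rather than a flat list of daily samples, and the 50% headroom and 90-day horizon are illustrative defaults:

```python
def recommend_allocation(daily_usage_mb: list,
                         headroom: float = 1.5,
                         horizon_days: int = 90) -> int:
    """Project dataset growth linearly from daily usage samples and
    recommend an allocation (MB) with headroom. Illustrative only."""
    n = len(daily_usage_mb)
    growth_per_day = (daily_usage_mb[-1] - daily_usage_mb[0]) / max(n - 1, 1)
    projected = daily_usage_mb[-1] + growth_per_day * horizon_days
    return int(projected * headroom)
```

The number is automatable; whether to accept it, archive instead, or ask why the data is growing is the judgment step.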
Performance tuning on z/OS is a domain where AI has significant potential because it is fundamentally a pattern recognition problem on large datasets.
RMF data analysis. RMF writes interval statistics on system performance; a busy mainframe generates gigabytes of RMF data per day. Finding the signal in this data – identifying when performance degraded, which resources were constrained, which workloads were affected – is tedious work that AI can accelerate.

Example: CPU utilisation spikes at 14:00 every day, correlating with a specific CICS transaction type. Cross-referencing with WLM service class assignments shows the transaction is in a service class with an insufficient service goal. AI can surface this correlation in minutes from RMF and CICS data.
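The correlation step in that example can be sketched as Pearson correlation between the CPU curve and per-transaction volume curves over the same intervals. The data shapes here are hypothetical stand-ins for parsed RMF and SMF 110 (CICS monitoring) records:

```python
def pearson(xs, ys):
    """Pearson correlation, implemented inline to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy)

def correlated_workloads(cpu_by_interval, tx_counts_by_name, threshold=0.9):
    """Return the transactions whose per-interval volume tracks the CPU
    curve. Input shapes and the 0.9 threshold are illustrative."""
    return sorted(name for name, counts in tx_counts_by_name.items()
                  if pearson(cpu_by_interval, counts) >= threshold)
```

A transaction whose volume spikes exactly when CPU does is a starting point, not a verdict – causation still has to be confirmed against the WLM configuration.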
WLM service definition review. WLM (Workload Manager) controls how z/OS allocates resources across workloads. AI can compare WLM service class assignments against SMF and RMF data to identify workloads that are consistently missing their service goals, workloads consuming more resources than their classification suggests, and potential mis-classifications.
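WLM's own health measure is the performance index – roughly achieved versus goal, where a PI above 1 means the goal is being missed. A sketch of the review over hypothetical per-class response-time samples; real figures would come from SMF type 72 (RMF workload activity) records, and the input shape here is an assumption:

```python
def missed_goals(service_classes: dict) -> dict:
    """Compute a WLM-style performance index (achieved / goal response
    time) per service class and flag classes missing their goal in more
    than half the samples. Input maps name -> (goal_ms, [achieved_ms, ...]);
    this shape and the 50% cutoff are illustrative assumptions."""
    flagged = {}
    for name, (goal_ms, achieved_ms_samples) in service_classes.items():
        pis = [a / goal_ms for a in achieved_ms_samples]
        miss_rate = sum(pi > 1.0 for pi in pis) / len(pis)
        if miss_rate > 0.5:
            flagged[name] = round(sum(pis) / len(pis), 2)  # average PI
    return flagged
```

A consistently missed goal might mean the goal is wrong, the classification is wrong, or the capacity is short – distinguishing those remains a human call.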
DB2 query optimisation. DB2 explains and access paths are structured data that AI can analyse. Comparing execution plans against historical performance data, identifying queries with degrading performance, suggesting index changes – these are tasks where AI pattern recognition adds value on top of the DB2 advisor tools that already exist.
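Spotting degrading queries reduces to a baseline-versus-recent comparison per statement. A sketch over hypothetical elapsed-time samples – real input would be DB2 accounting or dynamic statement cache statistics, and the 1.5x degradation factor is an illustrative assumption:

```python
def degrading_queries(elapsed_ms_by_stmt: dict, factor: float = 1.5) -> list:
    """Flag SQL statements whose recent average elapsed time has grown
    by `factor` versus their earlier baseline. Each value is a
    chronological list of elapsed-time samples; illustrative only."""
    flagged = []
    for stmt_id, samples in elapsed_ms_by_stmt.items():
        half = len(samples) // 2
        baseline = sum(samples[:half]) / half
        recent = sum(samples[half:]) / (len(samples) - half)
        if recent > factor * baseline:
            flagged.append(stmt_id)
    return sorted(flagged)
```

The flag says "this query got slower"; whether the fix is an index change, a RUNSTATS, or a rebind against a changed access path is where the DBA's judgment comes in.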
For production mainframe systems, AI diagnostic suggestions always require human review before action. This is not a limitation to be overcome – it is the correct architecture for systems where the cost of an incorrect fix is a production outage at a bank or a government system.
Context that AI lacks. The experienced sysprog who reviews an AI suggestion knows things about the specific environment that no AI tool has access to – recent changes that might be related, known issues with specific configurations, architectural decisions made years ago that affect how things behave.
Risk assessment. The human decides whether the suggested fix is safe to apply in the current context – whether a change window is available, whether the fix needs testing, whether there are downstream implications.
Accountability. In regulated industries, someone has to be accountable for production changes. AI can recommend. A human decides and is responsible for the outcome.
A CICS transaction abends with an S0C7 (surfaced by CICS as an ASRA) at 14:23. The transaction is PAYTX, the payment processing transaction.
Sysprog opens the dump, identifies the PSW, finds the failing instruction, traces it back through the compile listing to the DATA DIVISION field, opens the CICS journal to find the input that triggered the transaction, and starts investigating what data caused the abend.
Time to starting point: 20–40 minutes for an experienced professional.
Abend record ingested with dump data and CICS transaction record. AI identifies: failing program (COBPAY03), failing field (WS-ACCT-BAL), input transaction data. Cross-correlates against 90 days of CICS history: S0C7 in COBPAY03 has occurred 3 times previously, always with specific account number patterns.
Time to same starting point: 2 minutes.
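The cross-correlation that surfaced the account-number pattern can be sketched as prefix grouping over the failing records. The prefix length and recurrence threshold are illustrative assumptions, as are the account numbers:

```python
from collections import Counter

def recurring_pattern(failing_accounts: list, min_hits: int = 3) -> dict:
    """Group historical abend records by account-number prefix and
    report any prefix that recurs -- the correlation the sysprog then
    interprets. Prefix length and threshold are illustrative."""
    prefixes = Counter(acct[:4] for acct in failing_accounts)
    return {p: n for p, n in prefixes.items() if n >= min_hits}
```

The grouping surfaces "three of the four failures share a prefix"; it takes the sysprog to recognise that prefix as last month's migration batch.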
Sysprog recognises the account number pattern as accounts migrated from a legacy system last month. Root cause: legacy migration did not convert the balance field format correctly for accounts in a specific product category.
The AI did not solve the problem. It gave the experienced professional the right starting point to solve it quickly.
Also in this series: Runtime Evidence as the Right Starting Point · Why Generic AI Tools Fail on Mainframe · Why Mainframe is Different