Legacy modernization

Who modernizes legacy code when there are no experts? RAG-based AI draws knowledge directly from legacy code - and makes subject matter experts replaceable.

Legacy modernization is the process of converting old software systems, such as Cobol mainframes or early Java monoliths, into modern architectures. The biggest bottleneck is a lack of specialist knowledge about the legacy systems. Retrieval Augmented Generation (RAG) addresses this: the legacy code is turned into a knowledge graph and made directly queryable using AI.

Key Takeaways

  • Lack of Subject Matter Experts, not lack of technology, was the main reason a real-world legacy modernization project was three quarters behind schedule.
  • Retrieval Augmented Generation solves the SME bottleneck because the legacy code is transferred into a knowledge graph and developers can ask the code base specific questions via chat, without anyone having to retrain the model.
  • Microservices named after nouns (Customer Service, Product Service) create stronger coupling than verb-based services because business processes then typically touch all services at the same time.
  • Too many tokens in the prompt degrade the response quality because information in the middle of the context window is weighted less heavily by the model than content at the beginning and end.

When is software considered legacy?

Legacy software cannot be defined by a fixed age, but rather by its technology and how difficult it is to change. Mainframe systems in Cobol are clearly among them. But software that was created at the beginning of the millennium in early Java or .NET versions also falls into this category today.

Erik Doernenburg, who works at the consulting firm ThoughtWorks, sees a particularly large number of these systems in the insurance and banking industry. Much of what customers want to modernize has existed for a long time and should be changed, improved or transferred to the public cloud.

Two things keep teams from touching such systems. The first is a lack of tests. Nobody dares to make changes because nobody understands exactly how the system works anymore. The people who make adjustments today are rarely the ones who originally wrote the code. The knowledge has been passed from hand to hand over the years, often from service provider to service provider.

The second is a lack of understanding of the business domain. In the insurance sector in particular, takeovers and mergers result in systems where nobody knows what the policies that were sold 15 years ago looked like. To make matters worse, technical concerns and business logic are often mixed up in the code. If you want to understand the business logic, you constantly stumble across infrastructure code for the database or user interface.

Subject Matter Experts are the real bottleneck

In large modernization projects, the main problem is not the technology but access to people who can explain the legacy system. An analysis at a major German customer showed this clearly. The project was three quarters behind schedule, the costs were high, and the cause was the lack of Subject Matter Experts.

These experts explain to the teams writing new software how the old software works. This is exactly where the bottleneck arises. The problem cannot be solved by hiring or internal redistribution. These people are often no longer in the company or are so rare that you can’t get access to them in the time available.

Subject Matter Experts, people who can explain to the teams writing new software how the old software works: that was the bottleneck.

Erik Doernenburg

Why modern microservices are also becoming legacy again

The idea behind microservices was to make legacy in its current form superfluous. It should be possible to replace small, independently deployable units with more modern technology after five or ten years without touching the entire system. The service boundaries act as firewalls that allow parts to be recombined and discarded individually.

This works for many companies. Well-tailored services live out their niche existence on the fringes, run stably and do not block changes to code that is subject to greater change. Other companies end up with the “distributed monolith”, where the coupling between the services is so high that a change in one service affects three others. Sooner or later, such systems will become legacy again.

Erik mentions a simple rule of thumb that can be used to recognize this at an early stage. It concerns the names of the services.

| Naming | Probable consequence |
|---|---|
| Service is named after a noun (Customer Service, Product Service) | Higher probability of strong coupling, because business processes run across several services |
| Service is named after a verb (e.g. Order Capture) | Higher probability of isolated services that each map a complete business process |

If you want to change a business process and have to reach across customer, product and other services to do so, the coupling is too high. Services that encapsulate a complete process can be replaced later without disrupting others. The original intention behind microservices was never entity-relationship modeling, but to break down business functionality into pieces that are as autonomous as possible.

Modernize in slices instead of in one big chunk

The most important strategy when modernizing is not to do it in one big step, but in slices. A mainframe modernization that runs for two or three years, delivers no results in between and switches everything at once on day X is a high-risk gamble that hardly anyone takes.

Instead, teams look for individual pieces of functionality that can be examined and migrated in isolation. This requires enough people in the company who know the business functionality.

The longer a team works together in a stable lineup, the better its estimates become. It keeps the slice size more constant and can judge how much is left. If a slice takes around three to four weeks and it is clear how many are still to come, planning is possible, even if one slice takes longer. This creates trust among stakeholders on the business side.
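
The planning arithmetic behind this can be sketched in a few lines. This is only an illustration: the function name `forecast_weeks`, the slice durations and the number of remaining slices are made-up assumptions, not figures from the project described here.

```python
from statistics import mean


def forecast_weeks(completed_durations: list[int], remaining_slices: int) -> dict:
    """Project the remaining effort from the durations of finished slices."""
    return {
        # Expected case: remaining slices take as long as the average so far.
        "expected": mean(completed_durations) * remaining_slices,
        # Best and worst case: every remaining slice matches the fastest or slowest one.
        "best": min(completed_durations) * remaining_slices,
        "worst": max(completed_durations) * remaining_slices,
    }


# Hypothetical history: five finished slices, durations in weeks.
history = [3, 4, 3, 4, 6]
forecast = forecast_weeks(history, remaining_slices=10)
print(forecast)
```

The narrower the gap between the best and worst case, the more stable the slice size, and the more credible the forecast is to stakeholders.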

This approach avoids so-called melon reporting. Green on the outside, red on the inside: it is communicated to the top that everything is running smoothly, until three weeks before the end of a two-year program, half a year is suddenly missing.

Testing as a safety net for conversion

Legacy cannot be modernized safely without tests. The more test coverage a legacy system has, the better. In some cases, teams write tests first in order to have any safety net at all.

For modernizations, end-to-end testing is valuable for checking whether the system still works after every change. Another strategy is to run the old and new systems in parallel for a while, typically three to six months. This shows whether the new system delivers the same output as the old one for all inputs from daily operations.
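
A parallel run boils down to feeding the same recorded inputs to both systems and reporting every difference. The following is a minimal sketch under stated assumptions: `legacy_premium` and `new_premium` are hypothetical stand-ins for calls into the old and new system, and the new version contains a deliberate boundary bug so that the comparison has something to find.

```python
def legacy_premium(age: int, base: float) -> float:
    """Hypothetical stand-in for a call into the old system."""
    surcharge = 0.2 if age < 25 else 0.0
    return round(base * (1 + surcharge), 2)


def new_premium(age: int, base: float) -> float:
    """Hypothetical reimplementation with a deliberate boundary bug."""
    surcharge = 0.2 if age <= 25 else 0.0  # the legacy code uses age < 25
    return round(base * (1 + surcharge), 2)


def compare_runs(inputs, old, new):
    """Feed every recorded input to both systems and collect the differences."""
    mismatches = []
    for args in inputs:
        expected, actual = old(*args), new(*args)
        if expected != actual:
            mismatches.append({"input": args, "legacy": expected, "new": actual})
    return mismatches


# Hypothetical inputs recorded from daily operations: (age, base premium).
RECORDED_INPUTS = [(24, 100.0), (25, 100.0), (40, 250.0)]
mismatches = compare_runs(RECORDED_INPUTS, legacy_premium, new_premium)
for mismatch in mismatches:
    print(mismatch)
```

Only the input at the boundary (age 25) differs, which is the kind of edge case that a long parallel run with real traffic tends to surface.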

It is worth paying special attention to month-end and year-end closing. Erik remembers a customer in the banking sector where code had been tidied up and deleted. Trouble followed at the end of the month, because a process had been sending messages whose origin nobody knew but which were important to the receiving side.

The same discipline applies to the new code. If you make rapid progress at the beginning and accumulate technical debt, you will lose speed later on. The teams rely on test-driven development, use unit tests to write the code and add end-to-end testing for the run-through from start to finish. Edge cases belong in the fine-grained tests, with integration testing in between for performance reasons. This is the classic test pyramid.

How LLMs help to understand legacy code

Large Language Models bring the greatest benefit not when writing new code, but when understanding old code. When it comes to forward engineering, i.e. generating code, the productivity gains are modest. A Microsoft study cited a 55 percent increase in speed with Copilot, but this measured the pure act of programming, and the task was a web server in JavaScript, for which there are hundreds to thousands of examples on the Internet.

With specialized business logic, the picture changes. There is significantly less source code on the Internet for the business processes of an insurance company, and therefore less in the models. The productivity gain is correspondingly smaller. Nevertheless, hardly any developer who has used such a tool wants to give it up again, and measured against the personnel costs, the license costs are almost irrelevant.

The real leverage for legacy lies elsewhere. Erik sees it in a pattern called Retrieval Augmented Generation.

Retrieval Augmented Generation: Extracting knowledge from the legacy system

Retrieval Augmented Generation addresses the hallucination problem: the model does not have to generate the answer from what it learned in training, but is asked to find it in the supplied text. Instead of only asking “Where in the codebase does X happen?”, the prompt reads: “Answer the question using the following text.” And this text contains the documents from your own company.

Technically, this works via embeddings. For each text document, an embedding model calculates a numeric vector, and documents with similar content get vectors that point in similar directions. These vectors end up in a vector database. Before each request, the system calculates the vector of the user prompt, searches for the best-matching documents and adds them to the prompt. This is the augmentation. Nobody has to train their own model, and the company documents do not have to be in the model.
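
The retrieve-then-augment loop can be sketched with a deliberately simple stand-in. Note the assumptions: `embed` here is a bag-of-words counter, not a learned embedding model, the in-memory list `INDEX` plays the role of the vector database, and the documents are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means no overlap."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, index: list, k: int = 2) -> list:
    """Return the k documents whose vectors are closest to the question vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, documents: list) -> str:
    """Augmentation: put the retrieved documents in front of the user question."""
    context = "\n".join(f"- {d}" for d in documents)
    return f"Answer the question using the following text.\n{context}\n\nQuestion: {question}"


# Hypothetical company documents; the list plays the role of the vector database.
DOCUMENTS = [
    "Policy premiums are recalculated in the nightly batch job PREMCALC.",
    "Admin users may unlock accounts and reset passwords.",
    "The month-end process sends settlement messages to the clearing partner.",
]
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]

question = "Which job recalculates policy premiums?"
prompt = build_prompt(question, retrieve(question, INDEX, k=1))
print(prompt)
```

A real setup would swap `embed` for a trained embedding model and the list for a vector database, but the flow stays the same: embed the question, find the nearest documents, prepend them to the prompt.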

For legacy code, the teams have built an internal toolkit: not a finished product, but a set of scripts, instructions and a chat interface. It uses a knowledge graph that incorporates source code, existing documentation and the output of classic reverse engineering tools such as dependency analyses.

The toolkit works in two modes:

  • Generate briefing documents: In response to a prompt such as “Describe the capabilities of an admin user”, a few pages are generated explaining how an admin user differs from other users in the code. Subject Matter Experts spot-checked these documents for at least one client and found no major errors.
  • Dialog in the chat: Just as they would ask a human expert, developers put their question to the tool. From the prompt, the system finds suitable nodes in the knowledge graph and follows the edges, for example to functions that call other functions. In this way, it specifically collects the knowledge that should be included in the prompt.
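
The graph walk in the chat mode can be illustrated with a small breadth-first traversal. This is a sketch under assumptions, not the toolkit's implementation: the graph contents, the keyword-based `find_seeds` and the depth limit are all hypothetical.

```python
from collections import deque

# Hypothetical knowledge graph: node -> (description, outgoing "calls" edges).
GRAPH = {
    "AdminController.unlockAccount": (
        "Unlocks a locked user account.",
        ["AuthService.checkRole", "AccountRepo.save"],
    ),
    "AuthService.checkRole": ("Verifies that the caller has the ADMIN role.", ["RoleRepo.find"]),
    "AccountRepo.save": ("Persists the account record.", []),
    "RoleRepo.find": ("Loads the roles of a user.", []),
    "ReportJob.run": ("Creates the nightly report.", []),
}


def find_seeds(question: str) -> list:
    """Pick start nodes whose name contains a keyword from the question."""
    words = [w.strip("?.,").lower() for w in question.split()]
    keywords = [w for w in words if len(w) > 3]
    return [name for name in GRAPH if any(k in name.lower() for k in keywords)]


def collect_context(seeds: list, max_depth: int = 2) -> list:
    """Follow call edges breadth-first and gather node descriptions for the prompt."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    context = []
    while queue:
        node, depth = queue.popleft()
        description, callees = GRAPH[node]
        context.append(f"{node}: {description}")
        if depth < max_depth:
            for callee in callees:
                if callee not in seen:
                    seen.add(callee)
                    queue.append((callee, depth + 1))
    return context


seeds = find_seeds("What can an admin user unlock?")
for line in collect_context(seeds, max_depth=1):
    print(line)
```

The depth limit is the knob that keeps the collected context small: a larger value pulls in more of the call chain, a smaller one keeps the prompt focused on the immediate neighbourhood of the matching nodes.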

Small, precise prompts beat the big context

More context is not automatically better. Providing as much unstructured information as possible is not the best approach. It is often more effective to pack less, but more skillfully selected information into the augmentation.

This is the “lost in the middle” problem. If you include a lot of tokens, the model apparently weights information at the beginning and end more heavily than information in the middle. For this reason, some people have started to compress the prompt with a smaller model.

There is a more elegant way for source code. Instead of compressing by brute force, existing reverse engineering knowledge about dependencies can be used to enrich the prompt in a targeted manner. One particularly meaningful piece of information comes from the version history: which files are often checked in together. If three classes are always committed together, they belong together in terms of content, often more clearly than any static call graph analysis shows. Such signals significantly improve prompt augmentation because small, crisp prompts contain exactly the knowledge that the response needs.
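
The co-change signal is straightforward to compute once each commit is available as a set of file names. The sketch below assumes that input already exists (in a Git repository it could be derived from the output of `git log --name-only`); the file names, the function names and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit history: each entry is the set of files in one commit.
COMMITS = [
    {"Policy.java", "PolicyValidator.java", "PremiumCalculator.java"},
    {"Policy.java", "PolicyValidator.java", "PremiumCalculator.java"},
    {"Policy.java", "PolicyValidator.java"},
    {"ReportJob.java"},
    {"PremiumCalculator.java", "ReportJob.java"},
]


def co_change_counts(commits) -> Counter:
    """Count how often each pair of files appears in the same commit."""
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(files), 2):
            counts[pair] += 1
    return counts


def related_files(target: str, commits, min_count: int = 2) -> list:
    """List files committed together with `target` at least `min_count` times."""
    related = []
    for (a, b), n in co_change_counts(commits).items():
        if n >= min_count and target in (a, b):
            related.append((b if a == target else a, n))
    # Most frequent partners first, ties broken alphabetically.
    return sorted(related, key=lambda item: (-item[1], item[0]))


partners = related_files("Policy.java", COMMITS)
print(partners)
```

When a question touches `Policy.java`, the files returned here are natural candidates to add to the prompt, even if no static call edge connects them.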

The costs also speak in favor of this approach. Large input context windows, sometimes half a million tokens at Google, quickly cost real money once they are used frequently. Although these costs are small relative to personnel costs, a precise prompt is usually the better choice.
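
One simple way to keep a prompt small is to rank candidate snippets by relevance and stop at a token budget. The following sketch rests on several assumptions: the relevance scores are given, `estimate_tokens` counts whitespace-separated words rather than real model tokens, and the snippets and budget are invented.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate (whitespace-separated words); real tokenizers differ."""
    return len(text.split())


def select_snippets(scored_snippets: list, budget: int) -> list:
    """Greedily keep the highest-scored snippets that still fit the budget."""
    chosen, used = [], 0
    for _, text in sorted(scored_snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


# Hypothetical candidates as (relevance score, snippet text).
CANDIDATES = [
    (0.91, "PremiumCalculator applies a 20 percent surcharge for drivers under 25."),
    (0.40, "The logging framework is configured in log4j.xml and writes to /var/log."),
    (0.85, "Surcharge rules are loaded from table TARIFF_RULES at startup."),
]

selected = select_snippets(CANDIDATES, budget=20)
for snippet in selected:
    print(snippet)
```

With a budget of 20 estimated tokens, the two relevant snippets fit and the loosely related logging snippet is left out, which is the effect a precise prompt aims for.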
