The following is an excerpt from a discussion that was held at DevOps Enterprise Summit London 2017 by Amine Boudali, Senior Project Manager, Nordea and Jose Quaresma, DevOps Lead DK, Accenture about the experience and lessons learned when setting up the Core Banking Platform in a containerized environment. You can watch the video of the presentation here.
How Do You Fit a Core Banking System into a Few Containers?
We want to share with you our journey through core banking: how we’ve implemented the platform, some of the challenges we faced, and how we’ve gone about easing the pain of tackling them.
But first, a brief introduction to Nordea.
- Nordea is the largest financial services group in the Nordics.
- We have around 11 million customers, split between retail (private) customers and corporate customers.
- Nordea came together through a series of mergers and acquisitions, which left us with a relatively complex process landscape and an equally complex application landscape.
- In some of the Nordic countries, we can have four, five, or more core banking systems that are all more or less doing the same thing. This triggered an initiative towards simplification, which then became the largest transformation program in banking in Europe.
For this initiative, we had a focus on delivering incremental and frequent business value but also positioning Nordea to be the digital bank of the future.
But, How Do We Do That?
In order to achieve this, Nordea partnered with Temenos as a software provider as well as Jose Quaresma’s team at Accenture as a system integrator.
Since the beginning of the project almost two years ago, we maintained a big focus on the following guiding principles:
#1 — Automation
Automation was a very big goal for us. This covered environment provisioning, test automation, and our CICD pipeline.
#2 — Everything As Code
Then we also had this goal of having Everything As Code, and here it’s both for the environment configuration but also again for the delivery pipeline. You really want to be able to have everything in the repository to be able to replicate things and have better control on the changes being made.
#3 — Short Feedback Loops
This would enable the developers and testers to get quick feedback on whatever they were building and testing, which helps them learn fast and fail fast so that they can move forward with their development and testing.
B.C. — Before Containers
For our first goal, automation, while we weren’t doing CICD yet, we were on our way there. We had continuous integration, and we had automatic build and deployment to our development and test environments. But we really wanted to push that further up the environments so we could get the full advantage of the work we were doing there.
From the Everything As Code perspective, we did not have any configuration management in place. We were using a custom-made solution at Nordea, which meant that having Everything As Code was still something that we were striving towards.
Finally, from the perspective of our goal for Short Feedback Loops, we had daily builds and deployments, which was something that we were very happy with considering the complexity of the system, but there were some challenges that prevented us from taking this further.
For example, there were some intricacies of deployment that required us to restart the WebLogic servers whenever we deployed. So if we wanted to build and deploy every time there was a change, we would run the risk of having an environment that was being restarted more often than it was actually up! That’s not helpful for a short feedback loop.
As we continued on our journey, we had some big challenges to overcome.
- At the beginning of our program, there were a lot of proofs of concept that required several environments in a very short time. Because we needed to organize and orchestrate multiple units to provision an environment, it took a lot of time. We’re talking weeks or months to provision one full-fledged environment. Couple that with the complexity of the deployment and orchestrating the downtime the product required, and you can see why we had a lot of challenges when it came to frequent deployments.
- With fragmented environment configurations, there was no full control over the whole stack, which meant that provisioning by a given unit might lead to a different outcome depending on the time, the mood, or even the state of the person doing the work.
- Likewise, when you think about a core banking system, this isn’t a "system of engagement" type application; this is the core where everything ties together. You have upstream integrations coming from your channels. You also have downstream integrations into your accounting, accounting flow, data warehousing, etc. It’s really the heart of the body in that way. So it’s not just about how we stand that up, but also how we simulate and also mock these integrations.
- Finally, in shortening the feedback loop for developers and testers, we wanted this to be not only about developing and testing in our current state but also in the future.
Easing the Pain
So, where did we end up?
With long environment provisioning, we went from weeks and months to under one hour.
This was amazing for us. One hour.
Here we are not talking about provisioning just a database or just an application server. We are talking about a full-fledged environment, in under an hour, using this solution.
We’re pretty happy with that.
For the proof-of-concept challenges we had, we developed better life-cycle management. In the past, standing up an environment and decommissioning it required that you go back to these units to decommission it or reuse it, which meant that you would need to sequence your proof-of-concept.
With this system, we can spin up an environment, use it for the proof-of-concept without any dependencies, and decommission it when we want to.
Now onto the challenges we had with complex deployment and orchestrations.
Here, for the first time, we were able to use this product to do live deployments, which meant that we came down from one hour of downtime to zero downtime. Of course, this is not production-ready. We still have small things to iron out from a business perspective, but we are able to do live deployments.
For our fragmented environment configurations, we now have full infrastructure as code. We talk to the developers, the testers, and the teams that are developing on this platform, and they help us improve the environment: they can put in requests or merge requests, which we then review and take in. This is more to bring them into the world of how we provision the environments.
We shortened feedback loops. With the ability to provide end-to-end integration to the core banking system, we were able to do frequent deployments to the development environment and also to introduce concepts such as time travel. Time travel gives us the possibility to do the end of year reporting, the end of month reporting, or, for example, interest accumulation ahead of time. We don’t need to wait for that time to come. We are able to fast-forward the time from today and do that type of testing in order to ensure the quality of the product.
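To make the time travel idea concrete, here is a minimal sketch of a simulated business date that a test can fast-forward. Everything in it is a hypothetical illustration: the `SimulatedClock` and `Account` classes, the 3.65% rate, and the simple daily accrual rule are assumptions, not a description of how T24 or our environments implement it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass
class SimulatedClock:
    """Hypothetical test clock holding a business date that tests can move."""
    today: date

    def advance(self, days: int) -> None:
        self.today += timedelta(days=days)


@dataclass
class Account:
    """Hypothetical account with simple (non-compounding) daily accrual."""
    balance: Decimal
    annual_rate: Decimal
    accrued: Decimal = Decimal("0")

    def accrue_one_day(self) -> None:
        self.accrued += self.balance * self.annual_rate / Decimal("365")


def run_until(clock: SimulatedClock, account: Account, target: date) -> None:
    """Fast-forward day by day, running the daily accrual at each step."""
    while clock.today < target:
        account.accrue_one_day()
        clock.advance(1)


# Jump from an arbitrary "today" to year end without waiting for the calendar.
clock = SimulatedClock(today=date(2017, 6, 5))
account = Account(balance=Decimal("10000"), annual_rate=Decimal("0.0365"))
run_until(clock, account, date(2017, 12, 31))
print(clock.today, account.accrued.quantize(Decimal("0.01")))
```

The point of the sketch is that the end-of-year state is reached by moving a controllable date, so the same check can run on any day of the real calendar.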
Now You May Be Thinking, “Okay, This Looks Awesome, but How Did You Do It?”
Our answer, and we’re not by any means saying that it’s the answer, was to use the Red Hat OpenShift platform to start this transformation.
We had two teams of about 5 people: a few people were focused on installing the platform, and the others were focused on setting up the application on the new platform.
Here the migration was very much a lift and shift migration. We didn’t want to be thinking about which technology stack we should really be using in this platform, but more “let’s grab the technology stack that we have right now running, and see what it gives us.”
This took us around half a year, and what we ended up with was a setup where we have a core banking application project, in the OpenShift sense, that mostly consists of three containers:
- An Oracle database container, which is attached to a persistent volume to persist the data
- A WebLogic container, on which the Temenos core banking application runs
- An Oracle ServiceBus container that interacts with the T24 system (in this case, with the WebLogic container) to trigger the integrations and the services being used there
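The three-container layout above can be sketched as data, in the spirit of Everything As Code. This is only an illustration in Python, not an OpenShift manifest or API call; the class names, container names, and volume naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Container:
    name: str
    role: str
    persistent_volume: Optional[str] = None  # hypothetical: only the database keeps state


@dataclass(frozen=True)
class Project:
    name: str
    containers: Tuple[Container, ...]


def core_banking_project(name: str) -> Project:
    """Build one project from a shared template so every environment has the same shape."""
    return Project(
        name=name,
        containers=(
            Container("oracle-db", "Oracle database", persistent_volume=f"{name}-db-data"),
            Container("weblogic", "WebLogic running the Temenos core banking application"),
            Container("service-bus", "Oracle ServiceBus for integrations and services"),
        ),
    )


# Stamping out many identical environments becomes a loop over one definition.
projects = [core_banking_project(f"core-banking-{i:02d}") for i in range(1, 31)]
stateful = [c.name for c in projects[0].containers if c.persistent_volume]
print(len(projects), stateful)
```

Because every project comes from the same function, two environments can only differ where the definition says they may, here in the project name and the volume derived from it.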
Currently, we have around 30 of these projects running in our development and test environments. They are being used by both teams to test features that they want to play around with, for example for end of year reporting or for testing with the time travel features.
Now let’s take a look at what this looks like. Here is a picture to illustrate how the live deployment with OpenShift works:
Here we are combining the built-in features from OpenShift with the application. Here we have a deployment in progress and the one you see on the left is the container that is currently running. That’s where the traffic is being routed through. But then you have a container on the right that is being deployed with the updates, and that one is just starting.
What OpenShift is doing here for you is that while the container on the right is starting, the traffic keeps going through the one on the left, the old one. Then, once OpenShift sees that the new one is ready and deployed, it shifts the traffic to the new container and kills the old one.
But While It’s Our Answer, It’s Not the Final Answer.
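The traffic shift described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not OpenShift code: the `Pod` class, the tick-based readiness check, and the `live_deploy` function are hypothetical names used only to show the ordering of events.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    version: str
    startup_ticks: int  # hypothetical: ticks until the readiness check passes
    age: int = 0

    @property
    def ready(self) -> bool:
        return self.age >= self.startup_ticks

    def tick(self) -> None:
        self.age += 1


def live_deploy(old: Pod, new: Pod, max_ticks: int = 100) -> List[str]:
    """Return the version that served traffic at each tick of the deployment."""
    served: List[str] = []
    for _ in range(max_ticks):
        if new.ready:
            # The new container is ready: shift traffic, then the old one can be killed.
            served.append(new.version)
            return served
        # Until then, the old container keeps serving every request.
        served.append(old.version)
        new.tick()
    return served


served = live_deploy(Pod("v1", startup_ticks=0), Pod("v2", startup_ticks=3))
print(served)
```

In this model, some version serves traffic at every tick, which is what zero downtime means here. If the new container never becomes ready, the old one simply keeps serving.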
We do have new challenges that we are now starting to focus on.
First, there was a certain lack of awareness of the platform within the different teams. The teams are very busy working on the different features and the things that they have to release. Then we came in with this platform and all these new features. So we’ve shifted to thinking more about how we communicate with the teams and inform them about this new platform and the things that they can do with it.
Another very important paradigm shift is moving from "treating our servers as pets" to "treating them as cattle," to steal the famous analogy. If you’re doing manual changes in the servers, they might be gone in half an hour, or whenever you do the next deployment. It’s key to have that mindset shift.
Next, we are pretty aware that we do have some heavy containers running on the platform. While the system is not heavier than it was before when it was running in a more standard VMware kind of environment, it is still very heavy.
That means we can’t take full advantage of a containerized environment, where we could load environments very quickly. Instead, we have containers that are slow to load, quite heavy, and fairly big.
Looking forward, we will be thinking about how we manage our users, our customers, and also our people, and figuring out when we can say that a feature is ready for them to use.
Currently, the platform is in a Dev and Test environment, but we’re hoping to bring it to a production-ready state sometime this year.