The financial sector processes huge amounts of data every day, so the efficiency of large, compute-intensive operations on that data is critical. Speed, availability, and similar factors translate directly into money.
The subject of my thesis was provided by Morgan Stanley and is based on a project I participated in. In this project, we replaced a legacy system that performed risk calculations in batches with a web service that can perform calculations on demand and can also serve a GUI.
An old system may need to be replaced for several reasons. Requirements evolve constantly and performance issues can emerge. For banks, the changing legal environment and new regulations are also important factors. User experience plays an ever greater role in internal software as well, even though this trend lags somewhat behind external, customer-facing software. The new system was built because the volume of data to process had grown steadily over time, which proved difficult for the old system to handle. Most importantly, however, the old system was too inflexible: even small changes required very long processes to be run again.
Several new technologies were considered, but the final choice was the Greenplum distributed database, which solved both the storage and the computing capacity problems at once. With large computing capacity, data can be analyzed quickly and a high degree of flexibility can be achieved. Flexibility is essential in risk analytics because of continuously increasing regulatory scrutiny, and because of the need to find the balance between low risk and maximum profit. Greenplum scales well, so it also addresses future problems: the growing volume of data and the increasing number of analytics that have to be run on it. In an enterprise environment, integration must be considered too: the new system has to work with the existing infrastructure and with other legacy systems. Greenplum was a good candidate in this respect as well, and this integration was achieved over time.
The aim of my thesis is to describe the Greenplum database, to investigate the business background and the requirements, and, most importantly, to describe the individual components of the system, particularly the parts that I wrote.
The discussion of the design phase focuses mostly on the database schema and the top-level architecture of the system. After that, I compare different implementations of the risk analytics calculations, then describe the chosen implementation and its tests from a developer’s perspective, and finally explore opportunities for further improvement.