In the age of public cloud computing, renting only the required amount of (hardware) resources, instead of buying and maintaining one's own assets, is becoming increasingly popular, because it eliminates the unnecessary expenses generated by unutilized capacity. A further benefit is the opportunity to hand over infrastructure operation and management, which are only loosely related to our objectives, to the cloud service provider.
Cloud service providers widely offer solutions for running dynamically scalable, highly available, interactive applications in a highly managed (PaaS – Platform as a Service) environment, but similar features dedicated to the equally common batch processing workloads are only starting to spread.
The goal of my thesis was to develop a batch execution framework built on top of the above-mentioned scalable Azure services. Based on a declarative description, the framework can handle batches consisting of arbitrary compositions of .NET function calls, while optimizing data movement between the compute nodes as far as possible. I separated the optimization logic into a swappable module to allow further development, and focused on constructing the surrounding rule set that ensures its correct operation.
In this document, I first briefly introduce the Azure services used, along with two alternative solutions that are also available today as ready-to-use services; then, based on practical requirements, I define the core concepts and features of the system I created. Afterwards, I present the details of the architecture needed to realize those features, and explain some key points of the implementation. Finally, I give two simple examples to demonstrate the usage of the framework.