BigData infrastructure provisioning over orchestrated containers

Dr. Simon Csaba
Department of Telecommunications and Media Informatics

The topic of my thesis was the performance analysis of virtualized infrastructures that support BigData analytics over distributed systems. BigData analytics aims to process very large amounts of data in order to extract valuable information and support business decisions. In my work I designed a test application and deployed it in a high-performance, general-purpose BigData system based on Apache Spark to evaluate the system's characteristics. I reviewed both the platform virtualization methods used in clouds and lightweight virtualization methods, focusing on OpenStack cloud systems and Docker container technologies, and detailed the orchestration alternatives. I selected the Sahara component of the OpenStack cloud system and the native Kubernetes support for Spark, and deployed a test network in which both technologies could be evaluated. I tested the performance of these technologies using both a Pi approximation and a sentiment analysis application. Finally, I compared the results and evaluated the implementation alternatives of Apache Spark.
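
To illustrate the kind of workload a Pi approximation benchmark represents, the following is a minimal sketch in plain Python. It assumes a Monte Carlo estimator, which is a common choice for Spark benchmarks; the abstract does not state which method the thesis used. The function names `count_hits` and `estimate_pi`, the seeding scheme, and the partition count are hypothetical and chosen only for illustration. The split into independent partitions followed by a sum mirrors the map and reduce steps a distributed engine would run, but no Spark API is used here.

```python
import random


def count_hits(num_samples: int, seed: int) -> int:
    """Count random points in the unit square that fall inside the quarter circle.

    This plays the role of the per-partition 'map' step (hypothetical helper).
    """
    rng = random.Random(seed)  # separate, reproducible generator per partition
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples: int, num_partitions: int = 4) -> float:
    """Estimate Pi by summing the hit counts of independent partitions.

    The sum over partitions plays the role of the 'reduce' step.
    """
    per_partition = total_samples // num_partitions
    if per_partition == 0:
        raise ValueError("total_samples must be at least num_partitions")
    # Each partition gets its own seed so the partitions draw different points.
    hits = sum(count_hits(per_partition, seed=i) for i in range(num_partitions))
    # Area ratio of quarter circle to unit square is Pi / 4.
    return 4.0 * hits / (per_partition * num_partitions)


if __name__ == "__main__":
    print(estimate_pi(200_000, num_partitions=4))
```

Because every partition is independent, the work scales with the number of workers, which is what makes this kind of job a convenient, CPU-bound probe when comparing deployment alternatives.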

