With the continuing progress of digitalization, mobile technologies and the ``Internet of Things'', the amount of collected data has increased so much that traditional database technologies cannot process these large data sets in an acceptable time for most applications. The term Big Data covers the systems that can process such fast-changing, varied, complex and huge amounts of data.
Big Data components are mostly installed in data centers, utilizing all of their physical resources. However, with the spread of virtualization, services installed in virtual machines or containers have appeared, which means that applications can run in a virtual environment. In many cases, though, the components are still deployed on centralized infrastructure, e.g., in data centers. The continuous development of technologies that provide a virtualized computing platform, e.g., OpenStack, makes it possible to take advantage of geographically distributed infrastructure for Big Data applications, e.g., by processing data close to where it was generated.
In my thesis I briefly explain the main components of Big Data technologies. I present the current Hadoop ecosystem and describe the resource orchestration layer in some detail. I compare different resource orchestration algorithms. I identify three possible problems related to the network resources of a geographically distributed topology, which I then solve. I implement my solutions in existing resource orchestration algorithms and test their functionality. Finally, I present my results.