Nowadays one of the most significant questions for any company is how to handle and process data efficiently, and how to store it in a database that is fit for purpose. Satisfying these needs requires appropriate technologies as well as additional components and application servers, because raw data must be transformed, processed and stored, and later made available to different applications. Moreover, these components usually operate in a fully distributed environment in which many instances of the same application run.
In my thesis I will first review a system that is capable of storing and processing a huge amount of data. I will begin by describing the overall architecture of the system, and then briefly present its components together with the technologies they use. After that I will describe in detail the components I implemented, including the most significant code snippets, descriptions and figures. The unit testing of these components will also be presented.
The thesis also includes a digression on logging, a function that is indispensable in server-side applications. In that chapter I will present the effective collection, analysis and visualization of logs by describing and deploying the ELK stack.
Finally, I will give a detailed review of some technologies that are fundamental in the Big Data world, and then show an effective way of processing and validating Big Data using these technologies. At the end of my thesis, I will also outline future opportunities for developing this system further.