Designing and implementing an analytics framework for the copy processes of large-scale databases

Dr. Toka László
Department of Telecommunications and Media Informatics

Users of IT systems usually want information about the status and the estimated completion time of a given process, especially when they are working with non-interactive tasks that run for a relatively long time (on a human scale).

The purpose of my work was to create a runtime estimation system that can evaluate all required data quickly while relying on a model of acceptable precision. To this end, I followed CRISP-DM (Cross-Industry Standard Process for Data Mining), a data mining process model.

After the data collection phase, the dataset was classified and cleaned, and the model was then improved iteratively by testing multiple learning algorithms. Dataiku initially seemed to be the best tool for this job, but a few tests showed that the main objective could not be reached with it, so Google BigQuery, part of Google Cloud Platform, was chosen as the final solution.
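The abstract does not state which features or algorithms were compared, so the following is only a minimal Python sketch of the general idea of testing several learning algorithms and keeping the most precise one. It assumes a single hypothetical feature (copied data volume in GB), made-up historical runtimes, and two illustrative candidates (a mean-runtime baseline and an ordinary least-squares line). It does not reproduce the Dataiku or BigQuery models used in the thesis.

```python
from statistics import mean

# Hypothetical history of finished copy jobs: (data volume in GB, runtime in minutes).
# These numbers are made up for illustration only.
HISTORY = [(10, 12.0), (20, 21.5), (40, 41.0), (80, 83.0), (160, 158.0)]


def fit_mean(samples):
    """Baseline candidate: always predict the average runtime seen in training."""
    avg = mean(y for _, y in samples)
    return lambda volume: avg


def fit_linear(samples):
    """Ordinary least-squares fit of runtime = a + b * volume."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda volume: a + b * volume


def mean_absolute_error(model, samples):
    """Average absolute difference between predicted and actual runtime."""
    return mean(abs(model(x) - y) for x, y in samples)


# Train on the first jobs and hold out the last one for evaluation.
train, test = HISTORY[:4], HISTORY[4:]

candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}
errors = {name: mean_absolute_error(model, test) for name, model in candidates.items()}
best = min(errors, key=errors.get)  # keep the candidate with the lowest error

estimate = candidates[best](120)  # runtime estimate for a new 120 GB copy job
print(f"best model: {best}, estimate for 120 GB: {estimate:.1f} min")
```

In a real evaluation the held-out set would contain many jobs rather than one, and each iteration of the modeling phase would add or swap candidates in the `candidates` dictionary.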

Communication between the user interface and the Big Data tool is handled by Google Cloud Functions, a serverless platform; because of its advantages, serverless architecture is currently a very popular approach. This FaaS (Function as a Service) solution yielded a fast, cheap and easily scalable system in which server management could be avoided entirely, so all my resources could go into the challenging part itself: the code.
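To illustrate the request/response flow between the user interface and the estimation backend, here is a minimal Python sketch of an HTTP entry point in the shape of a Python Cloud Function: it receives a Flask-style `request` and returns a `(body, status, headers)` tuple. The payload field `volume_gb`, the function names and the placeholder `predict_runtime` formula are hypothetical; in the described system this step would query the model in BigQuery instead. `FakeRequest` is only a local stand-in so the handler can be exercised without deploying it.

```python
import json
import math

JSON_HEADERS = {"Content-Type": "application/json"}


def predict_runtime(volume_gb: float) -> float:
    """Hypothetical placeholder for the trained model.

    In the described architecture this would query the model in BigQuery;
    the linear formula below is purely illustrative.
    """
    return 2.0 + 1.0 * volume_gb


def estimate_runtime(request):
    """HTTP entry point; `request` is assumed to behave like flask.Request."""
    # silent=True makes flask's get_json return None instead of raising on bad JSON.
    payload = request.get_json(silent=True) or {}
    try:
        volume_gb = float(payload["volume_gb"])
    except (KeyError, TypeError, ValueError):
        body = {"error": "numeric field 'volume_gb' is required"}
        return json.dumps(body), 400, JSON_HEADERS
    if not math.isfinite(volume_gb) or volume_gb < 0:
        body = {"error": "'volume_gb' must be a finite, non-negative number"}
        return json.dumps(body), 400, JSON_HEADERS

    # The handler keeps no state between calls, so the platform can start
    # or stop instances freely as the load changes.
    minutes = predict_runtime(volume_gb)
    return json.dumps({"estimated_minutes": round(minutes, 1)}), 200, JSON_HEADERS


class FakeRequest:
    """Minimal stand-in for flask.Request, for local testing only."""

    def __init__(self, data):
        self._data = data

    def get_json(self, silent=False):
        return self._data
```

Because the function is stateless and only maps one request to one response, scaling and server management are left entirely to the FaaS platform, which is the property the paragraph above relies on.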
