Recommendation system in a distributed environment for streaming and batch data

Dr. Kővári Bence András
Department of Automation and Applied Informatics

The goal of my project was to show how a recommendation system can work in a distributed environment using streaming data.

The first part was to understand the basics of our framework. I had to read the related literature in order to understand the underlying concepts, such as stream processing, distributed computing and model-parallel machine learning. I became familiar with Apache Flink, since our framework was built on top of it. I also read the relevant literature on recommendation systems. It was rather difficult to understand state-of-the-art algorithms while still learning the fundamental definitions and models of the topic.

The implementation part of the project was more familiar to me. I started with the online matrix factorization algorithm. Then I implemented the double model load functionality, which made it possible to load one part of the model onto the worker nodes and the other part onto the server. Next we implemented the top-K generation algorithm based on [1] and the online evaluation with DCG. At this point we had everything needed to build the batch and online recommendation system and to evaluate the implemented algorithms. Based on our experiments, the implementation works as expected. The quality of the models is similar to that of other open-source tools [2], and the performance is also promising. We conclude that our proposed system provides an alternative for a distributed, real-time recommendation system.
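
As a rough illustration of the three building blocks named above (online matrix factorization, top-K generation and online DCG evaluation), the following is a minimal single-machine Python sketch. It is not the Flink-based implementation of the project: all function names, the learning rate and the regularization constant are assumptions made for the example, the top-K step is a brute-force scan rather than the pruned retrieval of [1], and the DCG variant assumes that each event has exactly one relevant item.

```python
import math


def sgd_update(p, q, rating, lr=0.05, reg=0.01):
    """One online SGD step for a single (user, item, rating) event.

    p and q are the user and item factor vectors; lr and reg are
    hypothetical hyperparameters chosen only for illustration.
    """
    err = rating - sum(pu * qi for pu, qi in zip(p, q))
    new_p = [pu + lr * (err * qi - reg * pu) for pu, qi in zip(p, q)]
    new_q = [qi + lr * (err * pu - reg * qi) for pu, qi in zip(p, q)]
    return new_p, new_q


def top_k(user_vec, item_vecs, k):
    """Brute-force top-K items by dot product (not the LEMP algorithm)."""
    scores = {
        item: sum(a * b for a, b in zip(user_vec, vec))
        for item, vec in item_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def dcg_at_k(ranked_items, target_item):
    """DCG for one event: 1 / log2(rank + 1) if the item was recommended, else 0."""
    if target_item not in ranked_items:
        return 0.0
    rank = ranked_items.index(target_item) + 1
    return 1.0 / math.log2(rank + 1)
```

In an online setting, one would typically first compute `dcg_at_k` on the top-K list generated for the incoming event and only then call `sgd_update` with that event, so that every interaction is used for evaluation before it is used for training.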

The project was presented at two conferences [2,3], and it is part of a Horizon 2020 EU project [4], which demonstrates its viability.

[1] Christina Teflioudi, Rainer Gemulla, and Olga Mykytiuk. LEMP: Fast retrieval of large entries in a matrix product. In Proc. of the 2015 ACM SIGMOD International Conference on Management of Data.

[2] Róbert Pálovics, Domokos Kelen, and András A. Benczúr. Tutorial on open source online learning recommenders. In Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys '17.

[3] Gabor Hermann and Daniel Berecz. Parameter server on Flink, an approach for model-parallel machine learning, Flink Forward, Berlin, 2017.

[4] H2020 Streamline,

