Distributed recommender systems on Hadoop

Prekopcsák Zoltán
Department of Telecommunications and Media Informatics

The amount of data surrounding us is growing rapidly. It has reached a scale that makes effective searching difficult. The purpose of recommender systems is to give personalized recommendations to users, which makes it easier to find something interesting in a large data set.

To make a recommender system scale to reasonably large data sets, a distributed solution is required. The Hadoop platform provides comprehensive support for this. The framework can run algorithms that are implemented using the MapReduce paradigm, and it takes care of much of the work involved in distributed execution.

The Mahout library contains implementations of distributed machine learning algorithms, including recommender systems, and relies on the services provided by Hadoop. However, these algorithms are still at an early stage, so there are several opportunities for further development.

During this work, I became familiar with the Hadoop and Mahout systems. The goal was to improve the precision of a distributed recommender system. Applying preprocessing steps to the input data seemed a promising direction for achieving this.

Mahout does not contain a distributed evaluator for recommender systems, so building an evaluation framework was necessary to assess the results. $k$-fold cross-validation is one of the most effective and accurate evaluation methods in use today.
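As an illustration of the evaluation idea, the following is a minimal single-machine sketch of $k$-fold cross-validation for rating prediction. It is not the distributed framework built in the thesis. The function names (`k_fold_splits`, `rmse`, `cross_validate`, `fit_global_mean`), the `(user, item, rating)` tuple layout, the choice of RMSE as the error measure, and the trivial global-mean predictor are all assumptions made for the example.

```python
import random
from math import sqrt


def k_fold_splits(ratings, k, seed=0):
    """Yield (train, test) pairs; every rating appears in exactly one test fold."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th rating starting at offset i.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


def rmse(predict, test):
    """Root mean squared error of predict(user, item) over the test ratings."""
    return sqrt(sum((predict(u, i) - r) ** 2 for u, i, r in test) / len(test))


def cross_validate(ratings, k, fit):
    """Average RMSE over k folds; fit(train) must return a predict(user, item) function."""
    scores = [rmse(fit(train), test) for train, test in k_fold_splits(ratings, k)]
    return sum(scores) / len(scores)


def fit_global_mean(train):
    """Hypothetical placeholder model: always predict the mean training rating."""
    mean = sum(r for _, _, r in train) / len(train)
    return lambda user, item: mean
```

A call such as `cross_validate(ratings, 5, fit_global_mean)` returns the error averaged over five folds. Any recommender that can be wrapped as a `fit` function could be substituted for the placeholder model.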

The study of Koren and Bell \cite{netflix} was a great help in choosing appropriate preprocessing steps. They used particular normalization steps to obtain better recommendation results.
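The abstract does not say which normalization steps were adopted. As one common example of this kind of preprocessing, the sketch below removes the global mean, a per-user offset, and a per-item offset from each rating, so that a recommender can be trained on the residuals. This specific scheme, and the names `fit_baseline`, `normalize`, and `denormalize`, are assumptions for illustration and should not be read as the method used in the thesis.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Estimate the global mean, user offsets, and item offsets from (user, item, rating) tuples."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # User offset: average deviation of the user's ratings from the global mean.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, _, r in ratings:
        user_sum[u] += r - mu
        user_cnt[u] += 1
    user_bias = {u: user_sum[u] / user_cnt[u] for u in user_sum}

    # Item offset: average of what remains after removing mean and user offset.
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        item_sum[i] += r - mu - user_bias[u]
        item_cnt[i] += 1
    item_bias = {i: item_sum[i] / item_cnt[i] for i in item_sum}

    return mu, user_bias, item_bias


def normalize(ratings, mu, user_bias, item_bias):
    """Replace each rating by its residual; unseen users or items get a zero offset."""
    return [
        (u, i, r - mu - user_bias.get(u, 0.0) - item_bias.get(i, 0.0))
        for u, i, r in ratings
    ]


def denormalize(user, item, residual, mu, user_bias, item_bias):
    """Map a predicted residual back to the original rating scale."""
    return residual + mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)
```

In this sketch the offsets would be fitted on the training ratings only, so that the test ratings of a cross-validation fold do not influence the normalization.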

