Social media has become part of everyday life. Facebook is a popular channel for businesses to reach their customers. However, publicity works both ways: negative voices and false news can spread like wildfire across the internet. Responding to just the most influential articles or posts can stop most harmful rumours. This calls for automatically selecting the posts that are likely to attract intense reactions.
The goal of this thesis was to estimate the number of comments a Facebook post will receive in the near future, based on its past history and the features of the user who created it. Prediction algorithms require a sufficient amount of training data to build an efficient and precise model. However, it is not easy to create a large and representative database of users’ activity: users are concerned about the privacy of their Facebook wall and are unwilling to grant access to it.
Using only data collected from the Facebook domain results in a model that is not precise enough. Transfer learning techniques are therefore needed to extract information from another dataset that comes from a slightly different but related domain. A large amount of data from Hungarian blog sites had already been collected, preprocessed and made publicly available. This data had been used to predict the number of comments a blog post is expected to receive. In this thesis, the dataset served as supplementary data for training the models.
This thesis gives an overview of transfer learning techniques and terminology, and compares the performance of the following algorithms: naïve transfer (using only the common attributes of the different datasets), linear regression assuming independent attributes, multitask regression, and kNN regression.
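To make the idea of naïve transfer concrete, the following is a minimal sketch, not the implementation used in the thesis. It pools a blog dataset with a Facebook dataset, keeps only the attributes the two share, and applies a plain kNN regression to the pooled data. The attribute names (`comments_last_24h`, `hours_online`, `trackbacks`, `page_likes`), the toy values and the helper functions are all hypothetical and chosen only for illustration.

```python
from math import dist

TARGET = "comments"  # hypothetical name of the value to predict


def common_attributes(source_rows, target_rows):
    """Return the sorted attribute names present in both datasets."""
    source_keys = set(source_rows[0]) - {TARGET}
    target_keys = set(target_rows[0]) - {TARGET}
    return sorted(source_keys & target_keys)


def knn_predict(train_rows, query, attributes, k=2):
    """Predict the mean comment count of the k nearest training rows."""
    point = [query[a] for a in attributes]
    nearest = sorted(
        train_rows,
        key=lambda row: dist(point, [row[a] for a in attributes]),
    )[:k]
    return sum(row[TARGET] for row in nearest) / len(nearest)


# Hypothetical toy data: the source domain (blogs) and the target domain
# (Facebook) share only some of their attributes.
blog_posts = [
    {"comments_last_24h": 0, "hours_online": 10, "trackbacks": 0, "comments": 1},
    {"comments_last_24h": 5, "hours_online": 12, "trackbacks": 2, "comments": 6},
    {"comments_last_24h": 20, "hours_online": 8, "trackbacks": 1, "comments": 25},
]
facebook_posts = [
    {"comments_last_24h": 4, "hours_online": 11, "page_likes": 900, "comments": 5},
    {"comments_last_24h": 18, "hours_online": 9, "page_likes": 15000, "comments": 20},
]

# Naive transfer: pool both datasets and ignore domain-specific attributes.
attributes = common_attributes(blog_posts, facebook_posts)
training_rows = blog_posts + facebook_posts

new_post = {"comments_last_24h": 6, "hours_online": 11, "page_likes": 1200}
prediction = knn_predict(training_rows, new_post, attributes, k=2)
print(attributes, prediction)
```

In this sketch the domain-specific attributes (`trackbacks`, `page_likes`) are simply discarded, which illustrates the information loss that makes the transfer naïve. A real setup would also scale the attributes before computing distances.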