Predictive Analytics on User Activity Data from a Location-Based Social Network

Gáspár Csaba
Department of Telecommunications and Media Informatics

Location-Based Social Networks (LBSNs) are now a prominent part of Web 2.0. Through such services, tens of millions of users share their current locations with their friends. An LBSN is therefore a unique, freely available source of behavioral, spatial and social data on an unprecedented scale.

This thesis performs predictive analytics on a data set from one of the most popular LBSNs, Foursquare, in order to create predictive models that forecast the number of so-called user “check-ins” at a business venue, given its geographical coordinates and category features. A model of this kind can support business decision-making by suggesting where to open a new venue so that it attracts as many check-ins as possible.

In our work, we reach this goal by following the standard phases of data mining. First, related scientific publications are studied. Then, we consider possible ways of gathering appropriate data, and examine the nature of the data set we have previously collected. By doing so, we gain insights into human activity patterns that are both interesting and useful for modeling. During data preparation, the set of venue attributes is expanded with the proposed so-called Environment Features. These are a collection of attributes that capture important pieces of information about the neighborhood of a given venue, and they help to improve the quality of the forecast. Predictive models are built using different learning algorithms, and are then evaluated according to a business-level performance measure that we recommend for these types of problems.
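As an illustration of what neighborhood-based attributes of this kind might look like, the following is a minimal sketch in Python. The abstract does not list the actual Environment Features, so the three attributes computed here (`nearby_count`, `same_category_count`, `nearby_mean_checkins`), the 0.5 km radius, and the dictionary layout of a venue are all assumptions made for the example, not the definitions used in the thesis.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def environment_features(venue, all_venues, radius_km=0.5):
    """Hypothetical neighborhood attributes for one venue.

    Each venue is assumed to be a dict with the keys
    'lat', 'lon', 'category' and 'checkins'.
    """
    # Venues within the radius, excluding the venue itself.
    nearby = [
        other for other in all_venues
        if other is not venue
        and haversine_km(venue["lat"], venue["lon"],
                         other["lat"], other["lon"]) <= radius_km
    ]
    same_category = [o for o in nearby if o["category"] == venue["category"]]
    mean_checkins = (
        sum(o["checkins"] for o in nearby) / len(nearby) if nearby else 0.0
    )
    return {
        "nearby_count": len(nearby),
        "same_category_count": len(same_category),
        "nearby_mean_checkins": mean_checkins,
    }


# Made-up example data: three venues close together and one far away.
venues = [
    {"lat": 47.4979, "lon": 19.0402, "category": "cafe", "checkins": 60},
    {"lat": 47.4985, "lon": 19.0410, "category": "cafe", "checkins": 120},
    {"lat": 47.4975, "lon": 19.0395, "category": "bar", "checkins": 80},
    {"lat": 47.6000, "lon": 19.2000, "category": "cafe", "checkins": 300},
]
features = environment_features(venues[0], venues)
```

Attributes like these would be appended to a venue's own coordinates and category features before model training. A real data set would also need a spatial index rather than the all-pairs scan used here.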

Our results show that, when predicting the user activity of a venue in an LBSN, taking the properties of nearby venues into consideration greatly improves the quality of the estimate. With the support of our models, we can correctly predict, in almost three quarters of all cases, which of two venues will be visited more.
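A pairwise measure of this kind can be sketched as follows. This is one plausible reading of "which of two venues will be visited more", not necessarily the exact measure defined in the thesis: the function name `pairwise_ordering_accuracy`, the decision to skip pairs with equal actual check-in counts, and the sample numbers are assumptions made for the example.

```python
from itertools import combinations


def pairwise_ordering_accuracy(actual, predicted):
    """Fraction of venue pairs ordered the same way by prediction and reality.

    Pairs whose actual check-in counts are equal are skipped, because
    neither venue is "more visited" in that case. Pairs with equal
    predictions count as incorrect.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(actual)), 2):
        if actual[i] == actual[j]:
            continue
        total += 1
        # Same sign of both differences means the pair is ordered correctly.
        if (actual[i] - actual[j]) * (predicted[i] - predicted[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")


# Made-up check-in counts for four venues.
actual_checkins = [10, 50, 30, 5]
predicted_checkins = [12, 40, 45, 1]
accuracy = pairwise_ordering_accuracy(actual_checkins, predicted_checkins)
```

In this toy example, five of the six venue pairs are ordered correctly, so the accuracy is 5/6. A measure of this form rewards a model for ranking candidate locations correctly even when the absolute check-in forecasts are imprecise.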

