The growing amount of data stored at companies and other institutions has accelerated
the development of technologies capable of solving business problems on massive
datasets. Data mining is widely used to gain business advantage by uncovering the hidden
business value and correlations in the data. Text mining is a specific area of data
mining that analyses textual data, which differs greatly from structured data. Text
mining is becoming more and more important due to the increasing popularity of social
sites.
In my thesis project I analysed content from the social site Facebook in order to
determine the attitude expressed in the text. The domain of my study is politics: I
detected the sentiment of users’ comments towards Hungarian politicians and political
parties using the publicly available content on the site. To gather the data needed to
build a model, I designed and developed data collection software. The crawler
recursively navigates through the data available on the site and stores it in an
analysable form. After the data collection phase I proposed and verified a method for
sampling the data. As a result of the sampling, the dataset is considerably smaller.
Because the quality of the dataset varies, I also applied data cleansing. This
filtering yielded a better dataset for tagging.
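As an illustration only, the sampling and cleansing steps could be sketched as below.
The abstract does not specify the sampling method or the cleansing rules, so uniform
random sampling, URL removal, a minimum comment length of three words, duplicate
removal, and the names clean_comment and sample_and_clean are all assumptions made
for this sketch.

```python
import random
import re


def clean_comment(text, min_words=3):
    """Strip URLs and extra whitespace; return None if too short to tag.

    The rules here (URL removal, min_words) are assumed for illustration.
    """
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text if len(text.split()) >= min_words else None


def sample_and_clean(comments, sample_size, seed=0):
    """Draw a uniform random sample, then clean it and drop duplicates."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = rng.sample(comments, min(sample_size, len(comments)))

    cleaned, seen = [], set()
    for comment in sample:
        text = clean_comment(comment)
        if text is not None and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Fixing the seed makes the reduced dataset reproducible, which helps when the same
sample has to be handed to annotators later.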
Tagging is almost always carried out by human annotators, usually paid ones.
This method does not scale well, frequently because of financial limitations. To
overcome the financial barrier, I devised a new crowdsourcing solution for annotating the
dataset: I designed and developed a web application that enables a wider audience to
participate in tagging. The annotated dataset provided the training and test
data required by the model.
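A minimal sketch of how crowdsourced tags might be merged into a single label per
comment is shown below. It assumes majority voting with a minimum number of votes and
a minimum agreement ratio; the abstract does not state how the tags were aggregated, so
the function name aggregate_labels, the thresholds, and the label values are
hypothetical.

```python
from collections import Counter


def aggregate_labels(votes, min_votes=3, min_agreement=0.6):
    """Merge crowd votes into one label per comment by majority vote.

    votes maps a comment id to the list of labels given by annotators.
    Comments with too few votes or too little agreement are left out,
    so only reasonably reliable labels reach the training and test data.
    """
    result = {}
    for comment_id, labels in votes.items():
        if len(labels) < min_votes:
            continue  # not enough annotators have seen this comment
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            result[comment_id] = label
    return result


votes = {
    "c1": ["negative", "negative", "positive"],
    "c2": ["positive", "negative"],
    "c3": ["positive", "negative", "neutral"],
}
labelled = aggregate_labels(votes)
```

With these example votes only "c1" receives a label, because "c2" has too few votes
and the annotators of "c3" do not agree.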