Automatic attitude analysis in web environment

Supervisor:
Nagy István
Department of Telecommunications and Media Informatics

The growing amount of data stored by companies and other institutions has accelerated the development of technologies capable of solving business problems on massive datasets. Data mining is widely used to gain business advantage by exploiting the hidden business value and correlations in the data. Text mining is a specific area of data mining that analyses textual data, which differs greatly from structured data. Text mining is becoming increasingly important due to the growing popularity of social sites.

In my thesis project I analysed the content of the social site Facebook in order to determine the attitude contained in the text. The domain of my study is political: I detected the sentiment of users' comments towards Hungarian politicians and political parties using the publicly available content on the social site. I designed and developed data collection software in order to be able to create a model. The crawler recursively navigates through the data available on the site and stores it in an analysable form. After the data collection phase I proposed and verified a method to sample the data. As a result of the sampling, the dataset is considerably smaller. As the quality of the dataset varies, I applied data cleansing. The filtering allowed me to provide a better dataset for tagging.
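
The recursive navigation described above can be illustrated with a minimal sketch. It does not reproduce the crawler of the thesis and calls no real Facebook interface: the `SITE` dictionary, the `fetch` callback and all node names are hypothetical stand-ins, assumed here only to show how a recursive traversal can store each item once even when the same item is reachable along several paths.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for the site: node id -> (text, ids of linked nodes).
SITE: Dict[str, Tuple[str, List[str]]] = {
    "page": ("Party page", ["post1", "post2"]),
    "post1": ("First post", ["c1", "c2"]),
    "post2": ("Second post", ["c2"]),        # c2 is reachable twice
    "c1": ("A comment", []),
    "c2": ("Another comment", ["post1"]),    # link back to post1 (a cycle)
}


def crawl(node_id: str,
          fetch: Callable[[str], Tuple[str, List[str]]],
          store: Dict[str, str],
          seen: Optional[Set[str]] = None) -> Dict[str, str]:
    """Recursively visit node_id and every node reachable from it."""
    if seen is None:
        seen = set()
    if node_id in seen:           # already visited: prevents endless loops
        return store
    seen.add(node_id)
    text, linked = fetch(node_id)
    store[node_id] = text         # keep the text for later analysis
    for child in linked:
        crawl(child, fetch, store, seen)
    return store


# Assumed usage: a dictionary lookup plays the role of the download step.
collected = crawl("page", SITE.__getitem__, {})
```

In a real crawler the `fetch` callback would perform the download, and a very deep structure would call for an explicit stack instead of recursion.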

Tagging is almost always carried out by human annotators, usually paid ones. This method does not scale well, frequently due to financial limitations. In order to overcome the financial barrier, I devised a new crowdsourcing solution to annotate the dataset, designing and developing a web application that enables a wider audience to participate in tagging. By means of the annotated dataset, I was able to provide the training and test datasets required by the model.
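
When several volunteers tag the same comment, their answers have to be combined into a single label. The abstract does not state how this was done, so the following sketch assumes a simple strict-majority vote. The function name, the `min_votes` threshold and the label names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_label(votes: List[str], min_votes: int = 3) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when there are fewer than min_votes votes or when no
    label is chosen by more than half of the annotators.
    """
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Assumed example votes collected by the tagging web application.
votes_by_comment: Dict[str, List[str]] = {
    "c1": ["negative", "negative", "positive"],
    "c2": ["positive", "neutral", "negative"],
    "c3": ["positive", "positive"],
}
labels = {cid: majority_label(v) for cid, v in votes_by_comment.items()}
```

Under this assumption only `c1` receives a label. `c2` has no majority and `c3` has too few votes, so both would be left out of the training and test datasets or sent back for more tagging.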
