Sentiment Analysis Using Supervised Learning Methods

Supervisor:
Recski Gábor András
Department of Automation and Applied Informatics

This Bachelor’s thesis discusses my work on a Sentiment Analysis task using Supervised Machine Learning. Automated text analysis of this kind has become a focus area of computational research. The objective was to examine different approaches to the problem of algorithmically annotating text with semantic information.

Chapter 1 gives an overview of the task and explains why it is worth examining.

Chapter 2 gives a brief introduction to Machine Learning and the fields of it that are connected to the problem. The topics of evaluation and training are presented here. It also provides insight into the properties and difficulties of Neural Networks.

Chapter 3 studies the learning system that motivated this thesis. It was proposed for a semantic analysis task and was one of the best performers. The chapter also analyses why this approach could compete with state-of-the-art results, covering the architecture of its Neural Network and its use of word embeddings.

Chapter 4 is devoted to the experiments and their setup. The effects of varying the regularization and the word embeddings employed were examined. The experiments showed that adding an extra word embedding to the model and transforming it into the space of the previous one can improve the benchmark scores on its own, even without optimized hyperparameters.

Chapter 5 goes on to discuss future work, such as improving the handling of words not present in a given word embedding.

The Sentiment Analysis task was to label tweets as positive, neutral or negative. This task was set at the SemEval workshop over several years.

I examined the top systems of 2016 and chose the INESC team's system for further study.

They approached this problem using Neural Networks and Word Embeddings.

Most of the study focuses on the exploitation of these Embeddings.

Several problems have to be solved regarding these representations, including incompleteness and a partially meaningless parameter space.

To address the first issue, a transformation between embeddings is used so that words can be represented using the union of the vocabularies of the embedding pair.

I experimented with transforming an embedding generated by the structured skip-gram model into another one based on the GloVe model.
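The abstract does not say what form the transformation between the two embeddings takes. The sketch below assumes a common choice, a linear map fitted by least squares on the words both embeddings share, with each embedding held as a plain dict from word to NumPy vector. The names `fit_linear_map` and `lookup` are hypothetical, and this is an illustration of the general idea rather than the thesis's implementation: a word missing from the target (GloVe-like) embedding falls back to its mapped source (skip-gram-like) vector.

```python
import numpy as np


def fit_linear_map(source: dict, target: dict) -> np.ndarray:
    """Fit W so that source[w] @ W approximates target[w] on the shared vocabulary.

    Assumption: the transformation is linear and fitted by ordinary least squares.
    """
    shared = sorted(set(source) & set(target))
    if not shared:
        raise ValueError("the two embeddings share no words")
    X = np.stack([source[w] for w in shared])  # shape (n_shared, d_source)
    Y = np.stack([target[w] for w in shared])  # shape (n_shared, d_target)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (d_source, d_target)
    return W


def lookup(word: str, source: dict, target: dict, W: np.ndarray):
    """Return a vector in the target space, using the mapped source vector as a fallback."""
    if word in target:
        return target[word]
    if word in source:
        return source[word] @ W  # project the source vector into the target space
    return None  # the word is missing from both embeddings


if __name__ == "__main__":
    # Toy data: the target vectors are an exact linear image of the source vectors.
    M = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
    src = {w: np.array(v, dtype=float)
           for w, v in {"good": [1, 0], "bad": [0, 1], "fine": [1, 1], "meh": [2, 1]}.items()}
    tgt = {w: src[w] @ M for w in ["good", "bad", "fine"]}  # "meh" is missing from tgt
    W = fit_linear_map(src, tgt)
    print(lookup("meh", src, tgt, W))
```

With real embeddings the fit is only approximate, so the mapped vectors for out-of-vocabulary words are estimates; the union of both vocabularies becomes usable, which is the point of the fallback.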

Additionally, by changing regularization procedures such as dropout, the performance measures on the test sets could be improved significantly.
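The abstract does not describe the dropout configuration that was changed, so the following is only a generic sketch of the standard "inverted dropout" technique, not the thesis's setup. The function name and signature are hypothetical. During training each unit is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, so the expected activation is unchanged and no rescaling is needed at test time.

```python
import numpy as np


def dropout(x: np.ndarray, rate: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout applied to an activation array `x`."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x  # at test time the activations pass through unchanged
    keep = rng.random(x.shape) >= rate  # each unit is kept with probability 1 - rate
    return x * keep / (1.0 - rate)  # rescale survivors to preserve the expected value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = np.ones(10)
    print(dropout(h, rate=0.5, training=True, rng=rng))
```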

The general conclusion was that adding an extra embedding as a fallback increased the model's performance on the test sets of recent years.
