This Bachelor’s thesis discusses my work on a Sentiment Analysis task using Supervised Machine Learning. Automated text analysis of this kind has become a focus area of computational research. The objective was to examine different approaches to the problem of algorithmically annotating text with semantic information.
Chapter 1 gives an overview of the task and of why it is worth examining.
Chapter 2 gives a brief introduction to Machine Learning and to the subfields connected to the problem. The topics of training and evaluation are presented here. It also provides insight into the properties and difficulties of Neural Networks.
Chapter 3 studies the learning system that motivated this thesis. It was proposed for the sentiment analysis task and was among the best performers. The chapter also analyzes why this approach could compete with state-of-the-art results, covering the architecture of its Neural Network and its use of word embeddings.
Chapter 4 is devoted to the experiments and their circumstances. Variations of the regularization and of the employed word embeddings were examined. The experiments showed that adding an extra word embedding to the model and transforming it into the space of the previous one can improve the benchmark results on its own, without further optimization.
Chapter 5 goes on to discuss future work, such as improving the handling of words not present in a given word embedding.
The sentiment analysis task was to label tweets as positive, neutral, or negative. The task was proposed at the SemEval workshop over several consecutive years.
I examined the top systems of 2016 and chose the INESC team's system for closer study.
They approached this problem using Neural Networks and Word Embeddings.
Most of the study focuses on the exploitation of these Embeddings.
Several problems have to be solved regarding these representations, including their incompleteness and a partially uninformative parameter space.
To address the first issue, a transformation between embeddings is used, so that the union of the words present in the embedding pair can be used to represent words.
I experimented with transforming an embedding generated by the structured skip-gram model into another based on the GloVe model.
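The idea of such a transformation can be sketched as follows: fit a linear map between the two embedding spaces on their shared vocabulary, then use mapped vectors as a fallback for words missing from the target embedding. This is a minimal illustration with randomly generated toy vectors, not the thesis code; the vocabularies, dimensions, and the `lookup` helper are all assumptions for the example.

```python
import numpy as np

# Hypothetical toy embeddings; in the thesis these would be the
# structured skip-gram and GloVe vector tables.
rng = np.random.default_rng(0)
skipgram = {w: rng.normal(size=4) for w in ["good", "bad", "happy", "sad", "great"]}
glove = {w: rng.normal(size=4) for w in ["good", "bad", "happy", "sad", "awful"]}

# Fit a linear map W from the skip-gram space to the GloVe space
# on the words shared by both vocabularies (ordinary least squares).
shared = sorted(set(skipgram) & set(glove))
X = np.stack([skipgram[w] for w in shared])  # source vectors
Y = np.stack([glove[w] for w in shared])     # target vectors
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def lookup(word):
    """Prefer the GloVe vector; fall back to the mapped skip-gram vector."""
    if word in glove:
        return glove[word]
    if word in skipgram:
        return skipgram[word] @ W  # transformed into the GloVe space
    return None                    # truly out-of-vocabulary

# "great" exists only in the skip-gram table, yet still gets a vector,
# so the model effectively covers the union of the two vocabularies.
```

Because the map is fit only on the shared words, its quality depends on how large and representative that overlap is.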
Additionally, by varying regularization procedures such as dropout, the performance measures on the test sets could be improved significantly.
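For context, dropout randomly zeroes hidden units during training and rescales the survivors so the expected activation is unchanged. The following is a generic sketch of inverted dropout, not the regularization code used in the experiments; the function name and rate are illustrative.

```python
import numpy as np

def dropout(activations, rate, rng, train=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training and scale the survivors by 1/(1 - rate)."""
    if not train or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(42)
h = np.ones(10)
out = dropout(h, rate=0.5, rng=rng)
# Surviving units are scaled to 2.0, the rest are zeroed;
# at test time the layer is the identity.
```

At inference time (`train=False`) the input passes through unchanged, which is why the inverted variant needs no rescaling at test time.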
The general conclusion was that adding an extra embedding as a fallback increased the model's performance on the datasets of recent years.