Stock movement prediction is a challenging problem. Thanks to the steadily growing popularity of social media, new approaches have emerged alongside traditional techniques that perform stock forecasting based on the sentiment information hidden in social media content.
In this thesis, we investigate whether social media content can serve as an indicator of future stock price movements. Specifically, we use sentiment extracted from Twitter messages to predict the direction of stock price movements on the New York Stock Exchange.
We present two main groups of methods employing state-of-the-art sentiment analysis techniques. The first group is based on sentiment classification, while the second is built on sentiment lexicons. We develop two versions of the sentiment classification based method: one uses simple words as features, while the other attempts to capture microblog features, namely the special language characteristics of the Twitter microblogging environment. For the sentiment lexicon based method, we use five popular sentiment lexicons to determine the mood of Twitter messages.
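To make the lexicon based idea concrete, the following is a minimal sketch of how a sentiment lexicon can assign a mood to a message: every word found in the lexicon contributes its polarity, and the sign of the sum decides the label. The tiny lexicon, the tokenizer and the function names are hypothetical and stand in for the five real lexicons used in the thesis, which are far larger and may use graded scores.

```python
import re

# Hypothetical toy lexicon (assumption): +1 for positive words, -1 for negative words.
TOY_LEXICON = {
    "gain": 1, "bullish": 1, "up": 1, "great": 1,
    "loss": -1, "bearish": -1, "down": -1, "crash": -1,
}


def tokenize(text):
    """Lower-case the message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every token that appears in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def lexicon_mood(text, lexicon=TOY_LEXICON):
    """Map the summed polarity to a positive / negative / neutral label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_mood("$AAPL looks bullish, big gain today"))  # positive
```

Summing unweighted polarities is the simplest possible aggregation; real lexicons often require normalisation by message length or handling of negation, which this sketch omits.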
To improve the reliability of the detected sentiment, we construct two techniques that filter for messages carrying more relevant sentiment information. The first defines various user relevance measures and keeps only messages from users considered relevant; the second filters out every non-English message.
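The two filters can be pictured as successive passes over the message stream. The sketch below assumes that each message record carries a user name and a language tag and that a relevance score has already been computed per user; the field names, the threshold and the scores are hypothetical, since the abstract does not specify which relevance measures or language detection method are used.

```python
# Hypothetical message records (assumption): field names chosen for illustration only.
messages = [
    {"user": "alice", "lang": "en", "text": "Bullish on $IBM"},
    {"user": "bob", "lang": "de", "text": "IBM steigt"},
    {"user": "spam1", "lang": "en", "text": "Buy now!!!"},
]

# Hypothetical per-user relevance scores, e.g. derived from some user relevance measure.
user_relevance = {"alice": 0.9, "bob": 0.7, "spam1": 0.1}


def relevant_users(scores, threshold):
    """Return the set of users whose relevance score reaches the threshold."""
    return {user for user, score in scores.items() if score >= threshold}


def filter_messages(msgs, scores, threshold=0.5, english_only=True):
    """Keep messages from relevant users and, optionally, only English ones."""
    keep = relevant_users(scores, threshold)
    result = []
    for msg in msgs:
        if msg["user"] not in keep:
            continue  # user relevance filter
        if english_only and msg["lang"] != "en":
            continue  # non-English filter
        result.append(msg)
    return result


print([m["user"] for m in filter_messages(messages, user_relevance)])  # ['alice']
```

Keeping the two filters independent (via the `english_only` flag) mirrors the abstract's treatment of them as separate techniques that can be evaluated on their own.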
During the evaluation phase, we make several interesting observations. First, the traditional word statistics based sentiment classification method clearly underperforms the other two method sets. Second, although support vector machines are considered good at traditional text categorisation, neural networks and Naive Bayes yield better results on the sentiment classification of Twitter messages. Third, despite their simplicity, microblog feature based methods are clearly competitive with the widely used sentiment lexicon based methods. Last but not least, in most cases the filtering techniques yield a modest but consistent improvement over the basic results.
Because classifiers built on different kinds of methods often disagree on the mood they assign to a message, we conclude that different methods take different sentiment viewpoints of messages and capture different sentiment properties. To cover as complete a sentiment view as possible, we develop a combined method using the best configurations of the three main kinds of methods. We find that the combination yields promising results.
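The abstract does not state how the three methods are combined. One simple way to merge their outputs is majority voting, sketched below under that assumption; the function name and the tie-breaking rule (falling back to "neutral") are hypothetical choices for illustration, not the thesis's method.

```python
from collections import Counter


def combine_votes(labels):
    """Majority vote over per-method mood labels (assumed combination scheme).

    Returns 'neutral' when there are no labels or when the top two labels tie.
    """
    if not labels:
        return "neutral"
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"  # no clear majority among the methods
    return ranked[0][0]


# Labels from, e.g., a word-feature classifier, a microblog-feature classifier and a lexicon.
print(combine_votes(["positive", "positive", "negative"]))  # positive
print(combine_votes(["positive", "negative", "neutral"]))   # neutral
```

Voting only uses the final labels; a weighted scheme that trusts the stronger methods more would be a natural alternative if per-method accuracies are known.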
Comparing our experiments to results published in research papers, we find that our methods might be nearly as good as traditional forecasting methods. Based on a simplified trading simulation, we conclude that social media, especially Twitter messages, might be a good indicator for predicting the daily direction of future stock price movements.
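A simplified trading simulation of the kind mentioned above can be sketched as follows: each day's net sentiment is turned into a direction prediction, the strategy goes long when an up move is predicted and short otherwise, and both direction accuracy and the summed daily returns are reported. The input format, the zero threshold and the use of uncompounded summed returns are assumptions made for this sketch; the thesis's actual simulation rules are not given in the abstract.

```python
def predict_direction(net_sentiment):
    """Predict +1 (up) for positive net sentiment, -1 (down) otherwise."""
    return 1 if net_sentiment > 0 else -1


def simulate(daily_sentiment, next_day_returns):
    """Trade one position per day based on the previous day's sentiment.

    Assumes the two lists are already aligned so that sentiment on day t is
    paired with the stock's return on day t + 1. A zero return counts as a
    down move. Returns (direction accuracy, sum of daily simple returns).
    """
    if not daily_sentiment or len(daily_sentiment) != len(next_day_returns):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    total_return = 0.0
    for sentiment, ret in zip(daily_sentiment, next_day_returns):
        position = predict_direction(sentiment)  # +1 = long, -1 = short
        total_return += position * ret           # uncompounded for simplicity
        actual = 1 if ret > 0 else -1
        hits += position == actual
    return hits / len(daily_sentiment), total_return


accuracy, pnl = simulate([3, -2, 1, -1], [0.01, -0.02, -0.005, 0.004])
print(accuracy, round(pnl, 4))  # 0.5 0.021
```

Summing simple returns ignores compounding and transaction costs, which keeps the comparison focused on whether the predicted direction is right.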