Financial markets can be viewed as complex dynamical systems that evolve continuously and interact with the news and with the economic and political environment. As a side effect, they generate an enormous amount of data. On the one hand, this offers an opportunity to improve traditional stock movement prediction techniques by incorporating new types of information into the analysis; on the other hand, it poses a challenge for the use of the latest Big Data technologies.
My thesis starts with a short introduction to the Big Data concept and continues with a review of the solutions Oracle offers for processing Big Data. The platform I used is Oracle Advanced Analytics, which extends the Oracle Database with Oracle R and thus makes it possible to run embedded R scripts on the database server.
After that, I explored the main steps of analyzing financial time series. I started by introducing common time series models (e.g. additive and multiplicative models), then moved on to time series decomposition and to various traditional, statistics-based forecasting models such as simple exponential smoothing and the Holt-Winters models. I also reviewed recent results on forecasting the direction of stock market movements, focusing on the use of sentiment analysis of financial news.
In the second part of the thesis, I designed an application that collects, stores, analyzes and visualizes financial data in real time using the R programming language. I applied dictionary-based sentiment analysis to financial news to find out whether the quantified market sentiment is a good indicator of the direction of future stock price movements. The Harvard General Inquirer and the Bing Liu opinion lexicon were applied and compared. The relationship between market sentiment and different market indicators was examined by looking at the correlation coefficients. During the evaluation phase, I found a notable correlation (0.34) between daily news sentiment scores and daily returns. This correlation increases to 0.42 when a weighted combination of the dictionaries is used.
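The dictionary-based scoring and the correlation step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the R code of the thesis. The two tiny word lists are made-up stand-ins for the Harvard General Inquirer and the Bing Liu opinion lexicon, and the scoring formula, the function names and the default weight of 0.5 are assumptions chosen for illustration only.

```python
import math
import re

# Toy word lists, made up for illustration; the real lexicons are far larger.
LEXICON_A = {"positive": {"gain", "growth", "strong"},
             "negative": {"loss", "weak", "risk"}}
LEXICON_B = {"positive": {"gain", "beat", "upgrade"},
             "negative": {"loss", "miss", "downgrade"}}


def sentiment_score(text, lexicon):
    """Return (positives - negatives) / matched words, or 0.0 if nothing matches."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in lexicon["positive"] for token in tokens)
    neg = sum(token in lexicon["negative"] for token in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def combined_score(text, weight_a=0.5):
    """Weighted average of the two lexicon scores; the weight is an assumption."""
    score_a = sentiment_score(text, LEXICON_A)
    score_b = sentiment_score(text, LEXICON_B)
    return weight_a * score_a + (1 - weight_a) * score_b


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given one aggregated sentiment score per trading day and the matching list of daily returns, `pearson(daily_scores, daily_returns)` yields the kind of correlation coefficient reported above. Varying `weight_a` is one simple way to explore a weighted combination of two dictionaries.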
I applied cross-correlation analysis and the Granger causality test with different lags between the time series to determine whether market sentiment is likely to influence changes in stock returns. I found that the impact of the news is incorporated into stock prices very quickly, with the highest correlation on the first day, and that the direction of the relationship is not clear-cut: after a few days, the returns are likely to lead the market sentiment.
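The lagged comparison behind the cross-correlation analysis can be sketched as follows. This is again an illustrative Python sketch rather than the R code of the thesis; the function names and the sign convention for the lag are assumptions. A positive lag pairs the sentiment on day t with the return on day t + lag, so a peak at a positive lag suggests that sentiment leads returns, while a peak at a negative lag suggests the opposite.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment[t] with returns[t + lag] for equal-length series.

    lag > 0: sentiment leads returns; lag < 0: returns lead sentiment.
    """
    if lag >= 0:
        xs, ys = sentiment[: len(sentiment) - lag], returns[lag:]
    else:
        xs, ys = sentiment[-lag:], returns[: len(returns) + lag]
    return pearson(xs, ys)


def cross_correlation(sentiment, returns, max_lag):
    """Map every lag in [-max_lag, max_lag] to its lagged correlation."""
    return {lag: lagged_correlation(sentiment, returns, lag)
            for lag in range(-max_lag, max_lag + 1)}
```

The lag with the largest absolute value in the result of `cross_correlation` indicates where the two series are most strongly aligned. The Granger causality test is not reproduced here; it additionally fits lagged regressions and tests whether adding the lags of one series improves the prediction of the other.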