# Data mining methods in stock price prediction

As a business information technology student specialising in analytical business intelligence, my goal was to choose a thesis topic that would deepen my knowledge of business intelligence and data mining. Since I am also interested in finance, mainly trading, I chose a topic that combines information technology and finance. In my thesis I describe a practical implementation of a trading algorithm.

After reviewing the related literature on trading and data mining, I chose the pairs trading strategy. Pairs trading is a two-step method: stock selection research, followed by the trading algorithm itself. The strategy opens two trades at the same moment, a long and a short position. When the spread between the two stocks widens beyond its usual range, the algorithm enters the trade: it sells short the outperforming stock and buys the underperforming one. The trade is profitable if the spread then reverts to its mean, which is what the mean-reversion property of a well-chosen pair leads one to expect.
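The entry rule described above can be sketched as a z-score test on the spread. This is a minimal illustration under stated assumptions, not the thesis's Quantopian code: the spread is assumed to be `price_a - hedge_ratio * price_b`, and the hedge ratio, the 20-bar window and the entry and exit thresholds of 2.0 and 0.5 standard deviations are hypothetical values chosen only for the example. `spread_zscore` and `pairs_signal` are illustrative names.

```python
import numpy as np


def spread_zscore(price_a, price_b, hedge_ratio, window=20):
    """Z-score of the latest spread value against the last `window` values.

    Assumption (illustrative): spread = price_a - hedge_ratio * price_b,
    and the rolling window includes the latest observation.
    """
    a = np.asarray(price_a, dtype=float)
    b = np.asarray(price_b, dtype=float)
    spread = a - hedge_ratio * b       # the quantity expected to mean-revert
    recent = spread[-window:]
    std = recent.std()
    if std == 0:
        return 0.0                     # flat spread: no signal
    return (spread[-1] - recent.mean()) / std


def pairs_signal(z, entry_z=2.0, exit_z=0.5):
    """Map the spread z-score to an action (hypothetical thresholds)."""
    if z > entry_z:
        # Spread unusually wide because A outperformed B:
        # sell the outperformer short, buy the underperformer.
        return "short A / long B"
    if z < -entry_z:
        # Spread unusually narrow because A underperformed B.
        return "long A / short B"
    if abs(z) < exit_z:
        return "close"                 # spread is back near its mean
    return "hold"                      # keep any open position


# Synthetic example: A jumps while B stays flat, so the spread widens.
price_b = [10.0] * 30
price_a = [20.0] * 29 + [25.0]
z = spread_zscore(price_a, price_b, hedge_ratio=2.0)
action = pairs_signal(z)
```

In this synthetic example the last price of A jumps, the z-score comes out at about 4.4, and the rule opens a short-A/long-B position; once the z-score falls back inside the exit band the position is closed.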

I built my algorithm on the Quantopian platform, a crowd-sourced hedge fund that offers both a research environment and a trading platform, which let me test the algorithm and examine how it would perform on the real stock market. Algorithms on the platform are written in Python combined with Quantopian's own API.

While implementing the pairs trading algorithm I followed the steps of the CRISP-DM data mining methodology, which I chose because it is one of the most widely used data mining methodologies today. The first step was to understand the business side of trading and to define the advantages of algorithmic trading over other forms of trading. The next steps were data understanding and data preparation, followed by modelling the algorithm.

The data preparation step contained the stock selection research, whose aim was to find the stock pairs with the highest potential return. I applied various mathematical tests and chose the top five pairs to build a diversified portfolio that reduces risk. The most important selection criterion was cointegration: the two stocks of a pair had to be cointegrated so that the spread between their prices tends to revert to its mean. The modelling step was the implementation of the pairs trading algorithm itself.
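The cointegration screen can be sketched with the two-step Engle-Granger test written directly in NumPy. This is an illustration under explicit assumptions rather than the thesis's selection code: it uses only this one test (the thesis used several), omits the lag terms of an augmented Dickey-Fuller regression, compares the statistic with the approximate 5% critical value of about -3.34 for two series with a constant, and ranks pairs by how negative the statistic is instead of by return. `engle_granger_stat`, `select_pairs` and the tickers `X`, `Y`, `Z` are hypothetical names, and the prices are synthetic.

```python
import itertools

import numpy as np

# Approximate 5% critical value of the Engle-Granger test for two series
# with a constant (MacKinnon's tables); an assumption of this sketch.
EG_CRIT_5PCT = -3.34


def engle_granger_stat(y, x):
    """Two-step Engle-Granger statistic (no lag terms) and hedge ratio."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step 1: OLS of y on x with an intercept gives the hedge ratio beta.
    design = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - alpha - beta * x
    # Step 2: Dickey-Fuller regression on the residuals,
    # diff(resid)_t = rho * resid_{t-1} + e_t; a strongly negative
    # t-statistic for rho suggests the spread is stationary.
    lagged = resid[:-1]
    diff = np.diff(resid)
    rho = (lagged @ diff) / (lagged @ lagged)
    err = diff - rho * lagged
    sigma2 = (err @ err) / (len(diff) - 1)
    se = np.sqrt(sigma2 / (lagged @ lagged))
    return rho / se, beta


def select_pairs(prices, top_n=5):
    """Keep pairs that pass the test, most cointegrated first."""
    results = []
    for a, b in itertools.combinations(sorted(prices), 2):
        stat, beta = engle_granger_stat(prices[a], prices[b])
        if stat < EG_CRIT_5PCT:
            results.append((stat, a, b, beta))
    results.sort()  # more negative statistic = stronger evidence
    return results[:top_n]


# Usage on synthetic prices (illustrative data, not the thesis data set).
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500)) + 100
prices = {
    "X": x,
    "Y": 2 * x + rng.normal(size=500),           # cointegrated with X by construction
    "Z": np.cumsum(rng.normal(size=500)) + 100,  # independent random walk
}
best = select_pairs(prices)
```

Each returned tuple holds the statistic, the two tickers and the hedge ratio of the first stock on the second, which is what a trading rule would need to build the spread. In practice a library routine such as `statsmodels.tsa.stattools.coint`, which also reports p-values, would normally replace the hand-written statistic.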

The results were satisfactory: the algorithm achieved a 25.8% return. As a further development I plan to refine a few variables, examine how they affect the results, and then move the code to live trading. In the thesis I also make recommendations on applying the pairs trading algorithm to a more diverse portfolio.