The growth of data traffic brings many benefits, but processing that data also poses many challenges. Collecting and efficiently using the data streams generated by large numbers of human communities and machine sensors is a demanding task for professionals.
The main objective of my thesis is to develop a web application capable of receiving, processing, analyzing, and visualizing data streams. The data streams come from a financial data provider (Alpha Vantage) and one of the largest social networks (Twitter). Using data mining methods, I look for connections between the two data sources. I intend to build a system with flexibly configurable parameters, so that custom analyses can be created with it. Once the term-weight list (containing the coefficient of each term) has been determined from the processed stock data and historical tweets, the system visualizes real-time tweets.
My objective is to create a solution that relates the frequency of words on Twitter to changes in the stock market. The system should be able to communicate with the servers, structure the received data (based on the configurable parameters), and transform the data stream into a data frame on which data mining models can be fitted. The results obtained with Ridge and Lasso regression may encourage further research.
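
To illustrate the idea of a term-weight list, the following is a minimal sketch and not the implementation used in the thesis. It assumes that tweets are grouped by day, that terms are split on whitespace, and that each day has one stock price change; the function names `term_frequency_rows` and `fit_ridge`, the sample tweets, and the price changes are all hypothetical. The ridge coefficients are found here with plain gradient descent so that the example needs no external library.

```python
from collections import Counter


def term_frequency_rows(tweets_by_day, vocabulary):
    """Build one row per day: how often each vocabulary term occurs that day."""
    rows = []
    for day in sorted(tweets_by_day):
        counts = Counter(
            word for tweet in tweets_by_day[day] for word in tweet.lower().split()
        )
        rows.append([counts[term] for term in vocabulary])
    return rows


def fit_ridge(X, y, alpha=1.0, learning_rate=0.01, steps=2000):
    """Minimise sum((Xw - y)^2) + alpha * sum(w^2) by gradient descent (no intercept)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(steps):
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) - target
            for row, target in zip(X, y)
        ]
        gradient = [
            2 * sum(r * row[j] for r, row in zip(residuals, X)) + 2 * alpha * w[j]
            for j in range(n_features)
        ]
        w = [wj - learning_rate * g for wj, g in zip(w, gradient)]
    return w


# Hypothetical toy data: three days of tweets and the matching daily price changes.
tweets_by_day = {
    "2020-01-01": ["stock rally today", "rally continues"],
    "2020-01-02": ["market crash fears"],
    "2020-01-03": ["rally after crash"],
}
price_change = [1.5, -2.0, 0.2]  # assumed values, ordered by day
vocabulary = ["rally", "crash"]

X = term_frequency_rows(tweets_by_day, vocabulary)
weights = fit_ridge(X, price_change, alpha=1.0)

# The term-weight list pairs each term with its fitted coefficient.
term_weights = dict(zip(vocabulary, weights))
print(term_weights)
```

In this toy setting the term "rally" receives a positive weight and "crash" a negative one. A Lasso fit would replace the squared penalty with the sum of absolute coefficient values, which tends to drive the weights of uninformative terms to exactly zero and usually calls for a different solver, such as coordinate descent.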