The technological waves of the 21st century have led to an exponential increase in the speed at which users generate data. The amount of data generated every day around the world has grown at a pace that outstrips even Moore's law. In today's data-driven world, the concept of Big Data is well known to every specialist in the field. Besides volume, velocity and variety are also defining characteristics of data; together these are known as the 3 Vs.
With the rise of Web 2.0 and social networks, users gained the ability to share their thoughts in various ways and through various channels, and thus to contribute to all 3 Vs. The volume of data has outgrown what human work can handle, so the processing needs to be automated. Business intelligence and data mining provide solutions for understanding the data a business has and turning it into a market advantage. Text mining has become not only a key to success but a necessity for companies, especially for those whose core concept is based on user-generated content; without it they fall out of the market competition.
My thesis highlights the significance of text mining of user-generated content, surveys the current state of this field of study, and lists the challenges text mining faces today. It also presents a framework model of how business intelligence projects are carried out according to best practice.
Besides introducing some ready-made standard software solutions, I also give an insight into analytical tools that enable users to create their own models for mining data. My thesis includes a review of the current market situation of these tools.
To demonstrate how sentiment analysis is done, I built models on movie reviews written by users of the Rotten Tomatoes movie database, using two of the analytical tools I introduce. I explain in detail how the models work, then evaluate them with different measures and compare the results.