As a result of large-scale developments in telecommunications technology, broadband internet access has become an essential part of everyday life. The sheer volume of data available online calls for automated processing methods, particularly in online communication. As user needs and habits keep changing, weekly or even daily news updates are no longer sufficient: the number of events and the rate at which they occur produce constantly shifting trends.
The aim of my thesis is to process large datasets using text mining techniques, by building a system that collects texts from news portals and then processes and categorizes them. I document each step of developing this software in detail, presenting the theoretical background and exploring possibilities for optimization. To demonstrate how the procedure works, I have chosen data sources that provide large, constantly changing sets of logically coherent texts and that are easily accessible with standard web technologies.