The explosive growth of the internet and of the data generated on it marks the beginning of a revolution that will affect every aspect of our lives. Data is growing faster than ever before, and according to statistics the amount of data could reach 44 zettabytes by 2020, which is 44 trillion gigabytes. In light of this, the spread of data mining will be significant too.
Nowadays, data-based predictions are receiving more and more attention in the field of sports. Their possible uses are very diverse: they can help in evaluating individual performance, in assembling a team, and in minimising players' health risks. Sports data, and data on basketball events in particular, has long been recorded with precise methods and equipment, and predicting the outcome of each match has always been a focus of attention. Consequently, the subject of my thesis is an established problem in the field of data mining.
In the first part of my thesis I study the whole process of data mining as the science of knowledge discovery. In the course of my research I pay attention to how the process is classified by its aim, and to its most frequently used elements. Furthermore, I describe the steps of data mining, the most common standards for the process, its main scope, and the tools I chose to apply in my task. As a result of my work I create a model that is capable of estimating the outcome of basketball matches. In the second part of my thesis I document the actual steps of my data mining work, from data acquisition to the result. Finally, I check the accuracy of the results against real-life sports betting data.