Nowadays we have to store and process more and more data. As a consequence, more attention is being paid to data mining, that is, extracting the relevant information from a large volume of data. This is why I chose anomaly detection: it helps to find outlying cases, which may indicate fraud. The essence of the method is to filter out cases that differ significantly from the typical elements of the data set. This method cannot achieve 100% accuracy, but with appropriate conditions it can reliably flag suspicious cases. As a final step, an expert is needed to decide whether a suspicious case is really fraud or not.
In my thesis, I first describe the different anomaly detection strategies, then I select and implement some of them and compare their results. For my work, I used installation data from Telia Carrier Hungary Ltd., but I modified the data so as not to cause any trouble for my company.
For data clustering, I used Jupyter Notebook with Python. This matters because Python plays a major role in the Big Data field, which is growing quickly, so Python is likely to be increasingly in demand as well.
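To make the filtering idea described above concrete, here is a minimal Python sketch. It assumes a simple z-score rule (distance from the mean measured in standard deviations), which is only one possible way to define "significantly different" and is not necessarily one of the strategies compared in the thesis. The function name `flag_suspects`, the threshold of 2.0, and the sample values are all hypothetical and chosen for illustration only; they are not taken from the real installation data.

```python
from statistics import mean, stdev


def flag_suspects(values, threshold=2.0):
    """Return the indices of values that lie more than `threshold`
    sample standard deviations away from the mean.

    Assumption: a plain z-score rule; the threshold is illustrative.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical measurements (made-up numbers, not real data).
durations = [30, 32, 29, 31, 33, 30, 28, 31, 120, 30]

# The flagged indices are only candidates; an expert still has to
# decide whether each one is really a fraud case.
suspects = flag_suspects(durations)
print(suspects)
```

The sketch only produces a list of candidates, which matches the workflow above: the method flags suspicious cases, and the final decision stays with a human expert. Note that with small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.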