Anomaly Detection on Data Streams

Supervisor:
Salánki Ágnes
Department of Measurement and Information Systems

The automatic detection of abnormal data points, called anomalies, is a priority in many fields. These rarely occurring observations are mostly caused by harmful events that can lead to serious financial or moral damage, such as financial fraud or attempted break-ins into computer infrastructure.

In these fields the temporal ordering of the data points matters, and the data must be evaluated in real time. For this reason, online anomaly detection over data streams is commonly used.

Detection algorithms make different assumptions about the nature of anomalies, and therefore use different methods to identify them. The goal of this thesis is to examine the algorithms typically used in the scientific literature.

In this thesis I describe several anomaly detection algorithms in detail. Of these, I implemented three in Python and integrated them into the Apache Storm stream processing framework: the distance-based Exact-Storm, and the clustering-based Korm and DenStream.
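Exact-Storm builds on the distance-based outlier definition: a point is an outlier if it has fewer than k neighbours within distance R among the points of the current sliding window. The sketch below is a simplified stand-in for illustration, not Exact-Storm itself: it recounts neighbours by brute force and classifies each point only once, on arrival, whereas Exact-Storm maintains neighbour information incrementally so that a point's status can be re-evaluated as the window slides. The function name and the parameters `window_size`, `radius` and `k` are illustrative assumptions.

```python
import math
from collections import deque


def distance_outliers(stream, window_size, radius, k):
    """Yield (point, is_outlier) for each arriving point.

    Simplified distance-based check (not the full Exact-Storm algorithm):
    a new point is flagged if fewer than k of the previous `window_size`
    points lie within `radius` of it.
    """
    window = deque(maxlen=window_size)  # the oldest point expires automatically
    for point in stream:
        # Count neighbours among the points currently in the window.
        neighbours = sum(1 for other in window if math.dist(point, other) <= radius)
        window.append(point)
        yield point, neighbours < k


data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (0.05, 0.05)]
flags = [f for _, f in distance_outliers(data, window_size=10, radius=0.5, k=2)]
print(flags)  # [True, True, False, False, True, False]
```

The first two points are flagged only because the window is still filling; a practical detector would suppress output during this warm-up phase.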

I also present the experimental environment I built. It uses Apache Storm for stream processing, a client-side visualization library (Bokeh) to display the results of the algorithms in real time, and a Python-based web server library (Web.py) to connect the two.
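The role of the intermediate web server can be illustrated with a minimal, purely hypothetical sketch: a thread-safe buffer into which the stream-processing side pushes classified points, and from which the plotting client fetches, as JSON, only the records it has not seen yet. The class and method names are invented for illustration, the polling scheme is an assumption, and the HTTP layer that Web.py would provide is omitted; this is not the interface used in the thesis.

```python
import json
import threading


class ResultBuffer:
    """Hypothetical hand-off point between the stream processor and the plot."""

    def __init__(self):
        self._records = []
        self._lock = threading.Lock()  # push and fetch may run on different threads

    def push(self, point, is_outlier):
        """Called by the stream-processing side for each classified point."""
        with self._lock:
            self._records.append({"point": list(point), "outlier": bool(is_outlier)})

    def fetch_since(self, index):
        """Return records from position `index` onward, plus the next index to poll."""
        with self._lock:
            new = self._records[index:]
            return json.dumps({"next": len(self._records), "records": new})


buf = ResultBuffer()
buf.push((1.0, 2.0), False)
buf.push((9.0, 9.0), True)
print(buf.fetch_since(1))
```

Passing the last seen index lets the client redraw incrementally instead of reloading the whole history on every poll.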

To visualize the results of the algorithms I used scatter plots and parallel coordinate plots.
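A parallel coordinate plot draws each d-dimensional point as a polyline across d vertical axes. The sketch below shows one common way to compute those polylines, assuming per-dimension min-max scaling to [0, 1] so that attributes with different units share one vertical scale; the function name is hypothetical and the thesis may scale its axes differently.

```python
def parallel_coordinates(points):
    """Map d-dimensional points to polylines for a parallel coordinate plot.

    Axis j is drawn at x = j, and each coordinate is min-max scaled to [0, 1]
    within its own dimension. Returns one list of (x, y) vertices per point.
    """
    dims = len(points[0])
    lows = [min(p[j] for p in points) for j in range(dims)]
    highs = [max(p[j] for p in points) for j in range(dims)]
    lines = []
    for p in points:
        line = []
        for j in range(dims):
            span = highs[j] - lows[j]
            # A constant dimension is placed at mid-height instead of dividing by zero.
            y = (p[j] - lows[j]) / span if span else 0.5
            line.append((j, y))
        lines.append(line)
    return lines


print(parallel_coordinates([(0, 10, 5), (10, 20, 5), (5, 15, 5)]))
```

The resulting x and y vertex lists could then be handed to a line glyph such as Bokeh's `multi_line`, with anomalous points drawn in a distinct colour.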

I generated several representative synthetic data sets and used them to compare the detection performance of the implemented algorithms. Finally, I examined how usable the algorithms are on a live financial data set.
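As an illustration of how such a comparison can be set up, the sketch below generates a labelled synthetic stream (a 2-D Gaussian cluster with anomalies placed on a ring around it) and scores a detector's flags with precision and recall. Both the data shape and the choice of precision and recall as the success measures are assumptions made here for illustration; the function names are hypothetical.

```python
import math
import random


def synthetic_stream(n_normal, n_anomalies, seed=0):
    """Hypothetical generator: Gaussian inliers plus anomalies on a distant ring,
    shuffled into one stream of (point, is_anomaly) pairs."""
    rng = random.Random(seed)
    normal = [((rng.gauss(0, 1), rng.gauss(0, 1)), False) for _ in range(n_normal)]
    anomalies = []
    for _ in range(n_anomalies):
        angle = rng.uniform(0, 2 * math.pi)
        r = rng.uniform(6, 10)  # well outside the bulk of the unit Gaussian
        anomalies.append(((r * math.cos(angle), r * math.sin(angle)), True))
    stream = normal + anomalies
    rng.shuffle(stream)
    return stream


def precision_recall(labels, flags):
    """Precision and recall of a detector's flags against ground-truth labels."""
    tp = sum(1 for y, f in zip(labels, flags) if y and f)
    fp = sum(1 for y, f in zip(labels, flags) if not y and f)
    fn = sum(1 for y, f in zip(labels, flags) if y and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall([True, False, True, False], [True, True, False, False]))  # (0.5, 0.5)
```

Fixing the seed makes the synthetic stream reproducible, so every algorithm can be evaluated on exactly the same input.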
