Financial institutions face continuous threats, because thousands of people all over the world attempt fraud against them. Machine learning gives computer scientists a new perspective on detecting and preventing fraudulent actions. Inserting a solid fraud detection module into the appropriate part of a bank's system can protect it against several threats. My goal was to create a module that can detect fraudulent bank transactions.
In my thesis I used publicly available banking data to create a fraud detection module based on machine learning techniques. I followed the CRISP-DM methodology throughout the project: first I studied existing solutions in the banking sector, then I performed exploratory data analysis on my datasets. Once I had selected a suitable machine learning model, I simulated the behaviour of a banking transaction network to validate my fraud detection module.
Of all the models tested, the Gradient Boosting Classifier performed best. During the selection process I tested the models first on the unbalanced datasets and then on resampled ones. The models performed much better when the training datasets were resampled, and among the resampling approaches the undersampled datasets proved to be the best training sets. I implemented the fraud detection system in Python, and using Spark Streaming gave me faster data processing and the opportunity to scale. In my solution I preferred the most popular, cutting-edge technologies from the data science world. By the end of the project I had achieved recall of 89% and 99% on the two chosen datasets. According to these results, most fraud cases can be prevented by using my module.
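To make the two central notions above concrete, the following is a minimal sketch of random undersampling and of the recall metric, written in plain Python. It is an illustration under stated assumptions rather than the code used in the thesis: the helper names `undersample` and `recall`, the toy data, and the convention that label 1 marks a fraudulent transaction are all hypothetical.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has
    as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        # rng.sample draws n_min distinct rows without replacement
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (assumed here: frauds) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy, made-up data: 5 fraudulent rows (label 1) among 100 transactions.
labels = [1] * 5 + [0] * 95
rows = [[float(i)] for i in range(100)]

balanced_rows, balanced_labels = undersample(rows, labels)
class_counts = Counter(balanced_labels)  # both classes now have 5 rows
```

Recall is the natural metric to report in this setting because it measures the fraction of fraudulent transactions that are caught, whereas plain accuracy can look high on an unbalanced dataset even when every fraud is missed.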