The business area addressed in this thesis is money lending, an activity carried out by specific financial institutions and regulated by financial law. When granting credit, the financial institution makes part of its liquid assets available to customers who qualify under a regulated credit rating, in return for interest. To maximize the share of loans that are repaid, financial institutions use credit scoring, a rating process in which a credit application is judged on the debtor's capacity to repay the credit.
Credit scoring is a long-established data mining problem in which demographic and social data about the customers, together with additional information related to the financial product, are used to predict the probability of default.
The data mining task of the thesis was to create a credit scoring model based on a real data set from a bank. Data describing the customers' behavior were used during modelling and implementation. Models with adequate performance can be built from these data; in a certain environment, they can predict the probability of default on loans from the behavior of the customers.
The data mining problem defined in this way was solved using two approaches: a traditional one and a complex one. In the traditional approach, a data set with one record per customer (1 customer - 1 record) was used to generate the models with decision trees, whereas in the complex approach, time series with several records per customer (1 customer - n records) were compared and analysed with the k-nearest neighbor algorithm.
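To make the complex approach more concrete, the following is a minimal sketch of k-nearest neighbor classification over per-customer time series. It is an illustration only, not the implementation used in the thesis: the Euclidean distance, the equal series length, the function names `euclidean` and `knn_predict`, and the balance figures are all assumptions introduced here.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length series (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return (majority label, share of defaults) among the k nearest series.

    train: list of (series, label) pairs; label 1 = default, 0 = repaid.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and len(train)")
    # Sort the labelled series by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    labels = [label for _, label in neighbours]
    majority = Counter(labels).most_common(1)[0][0]
    return majority, sum(labels) / k


# Hypothetical monthly account balances: one series (n records) per customer.
train = [
    ([500, 480, 510, 495], 0),
    ([520, 530, 500, 515], 0),
    ([450, 470, 460, 480], 0),
    ([300, 150, 20, -80], 1),
    ([250, 100, -50, -120], 1),
]

# Classify a new customer whose balance declines over the observed months.
label, default_share = knn_predict(train, [280, 120, 0, -100], k=3)
```

In this sketch the share of defaulting neighbors serves as a rough estimate of the probability of default; other distance measures or a weighted vote could be substituted without changing the overall structure.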
The modelling results indicate that most of the created models can be used to predict defaults on charge accounts. The decision tree models, by the nature of the approach, yielded individual rules with high probability, whereas in the complex approach, which uses time series based models, the specifics of the time series data can provide valuable information for decision-making processes.
Finally, the modelling results suggest that customer behavior data should be used when modelling defaults.