Data mining is one of the fastest-growing fields of information science. The rapid growth of raw data makes automated processing and relation discovery essential in large databases. An even more complicated area is the dynamic processing and interpretation of the practically limitless number of text-based documents on the World Wide Web. Analysing written text and translating it into a mathematical representation while keeping its relations intact is one of the most researched areas of computer science. Data mining and text mining work together to find these connections.
In my thesis I sought to answer the following question: is there any correlation between the data in a person’s private emails and that person’s reply time within a conversation? I used data mining and text mining methods to find the answer.
During my work, I developed a simple IMAP email client and used it to download the person’s emails for a second program that I also wrote. That program analysed the emails, computed statistics, and, through several preprocessing steps, produced a data set well suited to illustrating the original question.
With the help of the RapidMiner software, I tested and compared different data mining methods. While searching for the most efficient classification algorithm, I gained insight into the vast field of data and text mining.