Stack trace based similarity search of software error conditions

Supervisor:
Marton József Ernő
Department of Telecommunications and Media Informatics

Nowadays, more and more people are engaged in application development, which leads to more and more errors during both the coding phase and the runtime phase. Several runtime environments are designed to notify the user or the developer when an error occurs.

The Java Virtual Machine (JVM) is one such runtime environment. It helps to identify and locate an error by producing a stack trace. A stack trace contains several pieces of information about the problem, such as the exception type, the message attached to the exception, and the call stack.
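To make these three components concrete, the following minimal Python sketch splits a typical JVM stack trace into exception type, message, and call stack. The `parse_stack_trace` function and the `StackTrace` class are hypothetical illustrations, not code from the thesis. The regular expressions assume the usual layout printed by `Throwable.printStackTrace()` and handle only the outermost exception, stopping at the first `Caused by:` line.

```python
import re
from dataclasses import dataclass, field

# Header line, e.g. 'Exception in thread "main" java.lang.IllegalStateException: boom'
HEADER_RE = re.compile(r'^(?:Exception in thread "[^"]*" )?([\w$.]+)(?::\s*(.*))?$')
# Frame line, e.g. '\tat com.example.Service.run(Service.java:42)'
FRAME_RE = re.compile(r'^\s*at\s+([\w$./<>]+)\(([^)]*)\)\s*$')


@dataclass
class StackTrace:
    exception_type: str
    message: str
    frames: list = field(default_factory=list)  # topmost (most recent) frame first


def parse_stack_trace(text: str) -> StackTrace:
    """Hypothetical parser for the outermost exception of a JVM stack trace."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty stack trace")
    header = HEADER_RE.match(lines[0].strip())
    if header is None:
        raise ValueError("first line is not an exception header")
    trace = StackTrace(header.group(1), header.group(2) or "")
    for line in lines[1:]:
        frame = FRAME_RE.match(line)
        if frame:
            # Keep only the fully qualified method name, drop file and line number
            trace.frames.append(frame.group(1))
        elif line.lstrip().startswith("Caused by:"):
            break  # nested causes are ignored in this sketch
    return trace
```

Lines such as `... 5 more` match neither pattern and are skipped. Dropping the file name and line number keeps the frame representation stable across small code changes, which is usually what a similarity search wants.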

During my thesis work, I looked for a way to classify stack traces by their effect. This is an important topic because it could give developers a relatively easy way to find solutions to their problems: a search in a dedicated knowledge base would return more relevant hits than a general web search.

In my study, I examined three possible classification approaches. Two of them were based on unsupervised machine learning, and the third on rules I created. Both of the former used K-means clustering; they differed only in how the vectors for the clustering were constructed. The third approach grouped call stacks by their common prefixes.
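The prefix-based grouping can be illustrated with a short Python sketch. It assumes one possible reading of "prefix": the first `depth` frames of the call stack, counted from the topmost (most recent) frame. The function `group_by_prefix` and the `depth` parameter are hypothetical; the actual rule system of the thesis is not reproduced here.

```python
from collections import defaultdict


def group_by_prefix(call_stacks, depth=3):
    """Group call stacks that share the same first `depth` frames.

    `call_stacks` maps a trace identifier to a list of frame strings,
    topmost (most recent) frame first, as printed by the JVM.
    Returns a dict mapping each shared prefix (a tuple) to trace identifiers.
    """
    groups = defaultdict(list)
    for trace_id, frames in call_stacks.items():
        # Stacks shorter than `depth` use their whole call stack as the key
        key = tuple(frames[:depth])
        groups[key].append(trace_id)
    return dict(groups)
```

The choice of `depth` controls the trade-off: a small value merges traces that fail in the same place but reach it through different callers, while a large value splits traces that share the same root cause.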

Based on my findings, the unsupervised machine learning methods reached only 25% accuracy on the validation sample set, whereas the rule-based classification reached 75%. Examining the results, I found that the rule-based clustering performed better thanks to the rule system I developed during the stack trace similarity research: it identified similarities between stack traces more effectively than machine-learning clustering did.

All in all, the rule-based classification is the more effective choice for stack trace-based similarity search. It is only a proof of concept, however, and it has further improvement options that are worth considering.
