Storing and analyzing trace logs with Hadoop

Prekopcsák Zoltán
Department of Telecommunications and Media Informatics

Analyzing software errors in a production environment is a cumbersome task. Most of the time, attaching a debugger to the deployed, running application is not feasible: because debuggers block the operation of running software, users and other software components would be unable to use the services it publishes.

The best option for revealing the causes of software malfunction is to inspect log messages, which software developers use ubiquitously to monitor running applications. It is the programmer's responsibility to place log statements in the source code, so the information in the log messages reflects what the programmer considered relevant when writing the code. But what was important to the programmer then may not be helpful when tracking down a bug.

There are two main problems with log messages: the information they contain may not be satisfactory, and there may be too few of them to formulate a diagnosis when investigating a software error. Yet quality log messages are a necessity, since malfunctions most often stem from bad data that is generated on the production system but never present in well-defined test environments. Adequate log messages can make this data visible.

At the beginning of my thesis, I demonstrate that the amount and quality of log messages in a Java programming environment can be improved by generating log statements at application compile time. I introduce the software and technical solutions I use throughout the paper. Then I give a detailed description of how I designed a system suitable for transporting huge amounts of log data from their source to the Hadoop filesystem and storing them persistently; this system should also be able to run analytic software on the saved log messages. At the end of the paper, I test the deployed setup and measure its throughput, storage capacity, and analysis time.

Overall, the designed and deployed system can serve as the basis of software telemetry in production environments, giving insight into running software and making it easier to track down the causes of software malfunction.

