As a result of the rapid growth in computing power in recent years, more and more intellectually demanding tasks can be automated. It has become feasible to run many computationally expensive algorithms characteristic of the fields of artificial intelligence, machine learning and text mining. In my thesis, I apply such methods to the problem of medical coding.
The health care system collects statistical data about applied treatments and treated diseases in order to reimburse health care providers and to monitor public health (e.g. epidemics). In such statistics, diseases are represented using formal code systems. These systems assign codes in a non-trivial way, which makes the manual coding of diagnoses and medical reports a tedious, time-consuming and costly task. In my thesis, I examine the possibility of partially automating this process by creating a system that helps coding personnel find the correct code faster: for a given diagnosis, the system returns a list of the most relevant codes. In the future, it would be advantageous to introduce such a central service with a web interface in Hungary. I developed my system so that it can serve as the basis for such a service, and I created a proof-of-concept web interface for it.
After a general introduction, I review previous international work and existing solutions on the topic. Following a schematic description of the task, I present the concrete classification algorithms I used (vector space model, naïve Bayes, perceptron and support vector machine). I then investigate combining the results of multiple classifiers to create hybrid models. Next, I describe my implementation and present the classification framework I built, together with the proof-of-concept web interface. I test the system in detail on Hungarian and German sample sets coded according to the ICD (International Classification of Diseases) system, and examine the effect of sample preprocessing on the results. Finally, I evaluate the solution and discuss possibilities for further development.
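To illustrate the kind of ranking described above, here is a minimal sketch of a vector space classifier that returns the most relevant codes for a diagnosis, plus one simple way of mixing the scores of several classifiers into a hybrid ranking. It is not the thesis implementation. All names (`VectorSpaceCoder`, `normalize`, `mix`, `top_k`) are hypothetical. Several further choices are assumptions: whitespace tokenization, TF-IDF weighting with one merged document per code, and min-max normalized weighted-sum mixing. The training pairs and the second score map are made-up toy data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real preprocessing
    # (stemming, stop words, abbreviation expansion) would go here.
    return text.lower().split()


class VectorSpaceCoder:
    """Hypothetical vector space classifier: one TF-IDF vector per code."""

    def __init__(self, samples):
        # samples: iterable of (diagnosis_text, code) pairs.
        # All texts labelled with the same code are merged into one document.
        docs = {}
        for text, code in samples:
            docs.setdefault(code, Counter()).update(tokenize(text))
        n_codes = len(docs)
        # Document frequency: in how many code documents each term occurs.
        df = Counter()
        for tf in docs.values():
            df.update(tf.keys())
        # Smoothed IDF; query terms never seen in training get weight 0.
        self.idf = {t: math.log(n_codes / d) + 1.0 for t, d in df.items()}
        self.vectors = {code: self._weigh(tf) for code, tf in docs.items()}

    def _weigh(self, tf):
        # TF-IDF weighting followed by L2 normalization.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def scores(self, diagnosis):
        # Cosine similarity between the query and every code vector.
        q = self._weigh(Counter(tokenize(diagnosis)))
        return {
            code: sum(w * vec.get(t, 0.0) for t, w in q.items())
            for code, vec in self.vectors.items()
        }


def normalize(scores):
    # Min-max scaling so that scores from different classifiers are comparable.
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {code: 0.0 for code in scores}
    return {code: (s - lo) / (hi - lo) for code, s in scores.items()}


def mix(score_maps, weights):
    # Hybrid model (one simple scheme): weighted sum of per-classifier scores.
    combined = {}
    for scores, weight in zip(score_maps, weights):
        for code, s in scores.items():
            combined[code] = combined.get(code, 0.0) + weight * s
    return combined


def top_k(scores, k):
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [code for code, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy, made-up training pairs for illustration only.
    samples = [
        ("pneumonia unspecified", "J18.9"),
        ("community acquired pneumonia", "J18.9"),
        ("essential hypertension", "I10"),
        ("high blood pressure", "I10"),
        ("type 2 diabetes mellitus", "E11.9"),
    ]
    coder = VectorSpaceCoder(samples)
    vs_scores = normalize(coder.scores("acute pneumonia"))
    # Stand-in for the output of a second classifier (e.g. naive Bayes).
    other_scores = normalize({"J18.9": 0.2, "I10": 0.9, "E11.9": 0.1})
    print(top_k(mix([vs_scores, other_scores], [0.7, 0.3]), 3))
    # → ['J18.9', 'I10', 'E11.9']
```

The weighted sum is only one possible mixing strategy. Voting or rank-based fusion are common alternatives, and the weights would normally be tuned on held-out coded samples.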