Over the last decade it has become evident that common signature-based antivirus programs cannot keep up with continuously evolving malware techniques; as a result, they cannot recognize new variants of known malware families until their signature database is updated. Even increasingly complex heuristics help only to a limited extent. Solving the malware detection problem therefore requires new approaches.
In this work I apply artificial intelligence to this problem. Building on the scientific results of the last few years, I identified, examined, implemented, improved and tested several algorithms that can successfully classify malware samples into families.
First, I highlight the difference between static and dynamic analysis and give an overview of the most recent static analysis techniques, the metrics commonly used for classification and the best-known classification systems. This is followed by a careful review of the state of the art, which systematizes the results of the last 5-10 years and from which I selected the best algorithms based on certain considerations. I examined three of these and, after some improvements, implemented them. I carried out the development in C/C++, because these languages give easy access to the Windows API or UNIX functions necessary for binary analysis.
In the last section of this report, I tested my implementations on the same set of malware samples, which I obtained from the repository of CrySyS Lab. Based on the metrics introduced in this work, I chose the algorithm that produced the best results.