Data mining has recently gained a significant role as the amount of data that needs to be processed increases rapidly. Large datasets must be handled in many areas, including the banking sector, medical devices, the vehicular industry, and the marine industry. To properly process data that is generated quickly and dynamically, and to take the required actions should an unexpected event occur, filtering out anomalous behaviour plays a significant role.
Thus, one of the key pillars of data mining is anomaly detection. Data or behavioural patterns that deviate from the usual or the expected are called anomalies. Finding these deviations is not an easy task, since their presence and type depend strongly on the application field and the underlying data.
While working on my Thesis, I gained a deeper knowledge of the CAN (Controller Area Network) protocol, which is used in the vehicular industry as an on-board communication protocol. I then designed and implemented a program that is able to filter out the anomalies present in the messages carried over this protocol. My solution is based on the fact that some components in a vehicle are related to each other, meaning that a certain level of correlation can be observed between them. Should a malicious intruder compromise the system and modify the messages, they can destroy the relationships between the components. Therefore, values were considered anomalous when their correlation differed from what was expected based on the preliminary observations.
During the implementation, I also evaluated how effective my program is and how sensitive it is to modifications of the messages.
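The correlation-based idea can be illustrated with a minimal sketch. It is not the program developed in the Thesis: the signal names (`rpm`, `speed`), the synthetic data, the window size, and the tolerance are all assumptions chosen only for illustration, and the Pearson coefficient over non-overlapping windows is just one possible way to measure the relationship between two signals.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def windowed_correlations(xs, ys, window):
    """Correlation of each consecutive, non-overlapping window."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(0, len(xs) - window + 1, window)
    ]


def anomalous_windows(xs, ys, window, expected, tolerance):
    """Indices of windows whose correlation deviates from the expected value."""
    return [
        idx
        for idx, r in enumerate(windowed_correlations(xs, ys, window))
        if abs(r - expected) > tolerance
    ]


# Hypothetical, synthetic stand-ins for two related in-vehicle signals.
rng = random.Random(0)
rpm = [1000 + 20 * t + rng.gauss(0, 30) for t in range(200)]
speed = [0.03 * r + rng.gauss(0, 0.5) for r in rpm]

WINDOW = 50       # assumed window size (samples)
TOLERANCE = 0.2   # assumed allowed deviation from the expected correlation

# "Preliminary observation": learn the expected correlation from clean data.
baseline = windowed_correlations(rpm, speed, WINDOW)
expected = sum(baseline) / len(baseline)

# Simulated attack: overwrite the speed values of the third window with
# values that are unrelated to the engine speed.
tampered = list(speed)
for i in range(100, 150):
    tampered[i] = rng.uniform(0, 120)

clean_flags = anomalous_windows(rpm, speed, WINDOW, expected, TOLERANCE)
tampered_flags = anomalous_windows(rpm, tampered, WINDOW, expected, TOLERANCE)
print("clean:", clean_flags)
print("tampered:", tampered_flags)
```

In this sketch the sensitivity of the detector is governed by the window size and the tolerance: a smaller tolerance reacts to subtler modifications, but is also more likely to flag normal variation.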
In my Thesis, I first explain the types of anomalies and the general methods of anomaly detection, and then present my solution and results.