Drug-target interaction prediction using large-scale chemoinformatic graph databases

Dr. Antal Péter
Department of Measurement and Information Systems

This thesis presents several prediction methods and data structures. Similarity-based prediction from graph topological data can now be carried out efficiently. In this project, I implemented such methods and evaluated their results.

An article on graph topological prediction methods, published in 2017 [1], served as the basis of my study. The article presents the design of the implementations of these methods. The main goal of this thesis is to reimplement the methods and to reproduce and evaluate the results reported in the article. Most of the results were reproduced successfully, and in some cases further results, which were not included in the article, were obtained and are presented in the evaluation.

The predictions were performed on bioinformatics data in order to find drug-protein interactions. Depending on the method used, not all of the data had to be taken into account. The reason for this is explained in detail later in the thesis.

To gain further knowledge about semantic data structures, I also studied how these interactions could be predicted efficiently using such databases.

The implementation was written in the R programming language, with a focus on the reusability of the code. Based on the article mentioned above, four similarity-based graph topological prediction methods were implemented. The evaluation used various packages available for R, together with the bioinformatics software Cytoscape for graph visualisation. The results of the different methods were compared with each other as well as with results from other articles.

The evaluation shows that the reproduction of these graph topological prediction methods was successful, with reasonable results and short runtimes. However, the accuracy and effectiveness of these methods certainly fall far short of those of the complex, precise algorithms that use huge amounts of data. Such algorithms are easily affordable in the pharmaceutical industry, although they require extremely long runtimes and consequently advanced hardware. Therefore, these methods are not recommended for predicting drug-protein interactions in pharmaceutical companies. In other fields, where less data is available about the drugs, they could still be good alternatives for research.

