Error-tolerant post-processing of next-generation sequencing data with decision trees

Supervisor:
Marx Péter
Department of Measurement and Information Systems

The primary purpose of this study is to implement and modify the multifactor dimensionality reduction (MDR) algorithm so that it can read genotypic data directly from VCF files.

The first part of the thesis introduces the Sanger method and next-generation sequencing (NGS) technologies, along with their methods and platforms. It discusses the raw data produced by NGS, several standardized data formats that are compatible with all platforms, how each format is derived from the previous one in the sequence-to-variation workflow, and how these formats are used. The thesis then presents computational modeling with the multifactor dimensionality reduction algorithm and its implementation with cross-validation.
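As a rough illustration of multifactor dimensionality reduction with cross-validation, the sketch below pools each multilocus genotype combination into a high-risk or low-risk cell, picks the SNP combination with the lowest training error in each fold, and reports its error on the held-out fold. It is a minimal sketch and not the thesis implementation: the function names (`high_risk_cells`, `error_rate`, `mdr_cross_validate`), the data layout (one list of genotype codes per sample, labels coded 1 for case and 0 for control) and the simple interleaved fold assignment are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def high_risk_cells(genotypes, labels, snps, rows):
    """Return the genotype cells whose case:control ratio is at least
    the overall case:control ratio among the given rows."""
    cases, controls = Counter(), Counter()
    for i in rows:
        cell = tuple(genotypes[i][s] for s in snps)
        if labels[i] == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    # Cross-multiplied comparison avoids dividing by zero for empty classes.
    return {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_controls >= controls[cell] * n_cases
    }


def error_rate(genotypes, labels, snps, high_risk, rows):
    """Fraction of rows misclassified when high-risk cells predict 'case'."""
    wrong = 0
    for i in rows:
        cell = tuple(genotypes[i][s] for s in snps)
        predicted = 1 if cell in high_risk else 0
        wrong += predicted != labels[i]
    return wrong / len(rows)


def mdr_cross_validate(genotypes, labels, order=2, folds=5):
    """For each fold, select the best SNP combination of the given order on
    the training rows and evaluate it on the held-out rows.
    Returns a list of (snps, training_error, testing_error) tuples."""
    n_samples = len(labels)
    n_snps = len(genotypes[0])
    results = []
    for fold in range(folds):
        # Assumed fold assignment: sample i goes to fold i % folds.
        test = [i for i in range(n_samples) if i % folds == fold]
        train = [i for i in range(n_samples) if i % folds != fold]
        best = None
        for snps in combinations(range(n_snps), order):
            cells = high_risk_cells(genotypes, labels, snps, train)
            err = error_rate(genotypes, labels, snps, cells, train)
            if best is None or err < best[0]:
                best = (err, snps, cells)
        train_err, snps, cells = best
        test_err = error_rate(genotypes, labels, snps, cells, test)
        results.append((snps, train_err, test_err))
    return results
```

In practice the fold assignment would usually be randomized, and the model chosen most often across folds (its cross-validation consistency) would be reported alongside the average testing error.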

The programming part, the major objective of this study, comes next: it reads data from a VCF file based on a case-control study and identifies the best models using the aforementioned algorithm. The program's results show the best models, those with the lowest error rate, and compare the performance of the classification tests across models of different dimensions.
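Reading genotypes directly from a VCF file could look roughly like the following. This is a hedged sketch that uses only the Python standard library and is not the thesis code: the function name `parse_vcf_genotypes` and the encoding of each call as the number of non-reference alleles (with `None` for missing calls) are assumptions chosen for illustration.

```python
import re


def parse_vcf_genotypes(lines):
    """Parse VCF text lines and return (sample_names, calls), where calls maps
    (chrom, pos) to a list with one non-reference allele count per sample."""
    samples = []
    calls = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this record
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        counts = []
        for sample_field in fields[9:]:
            parts = sample_field.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            alleles = re.split(r"[/|]", gt)  # unphased "/" or phased "|"
            if "." in alleles:
                counts.append(None)  # missing call
            else:
                counts.append(sum(a != "0" for a in alleles))
        calls[(fields[0], int(fields[1]))] = counts
    return samples, calls
```

Case or control status is not stored in the VCF itself, so a separate mapping from sample name to status would be needed before the genotypes can be passed to a classifier.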

Finally, the concluding part evaluates the performance of the software and gives recommendations for its further development.
