Handling of missing genotype data

Dr. Hullám Gábor István
Department of Measurement and Information Systems

Genotyping is a complex and expensive process that is prone to errors when sequencing genetic data. Imputing the missing data by means of statistical inference and artificial intelligence is an inexpensive solution that requires no new measurements of the genome.

In my thesis I present the method used by IMPUTE version 2, a software tool that offers an effective way to impute unphased and missing genotypes. I examine the effectiveness of the program from several aspects. I imputed datasets with 12 different ratios of missing data and analyzed the resulting datasets with respect to the minor allele frequency of the SNPs. I compared these results with those produced by a self-made program whose imputation method is based on allele distributions.
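The evaluation described above can be illustrated with a minimal Python sketch. It is not the program written for the thesis: the genotype coding (0, 1 or 2 copies of the second allele), the function names, the example ratios and the "fill with the most common observed genotype" rule are all assumptions chosen for illustration, the last one standing in for an imputation method based on allele distributions.

```python
import random
from collections import Counter

MISSING = None  # assumed marker for a masked genotype


def minor_allele_frequency(genotypes):
    """MAF of one SNP; genotypes are B-allele counts (0, 1, 2) or MISSING."""
    observed = [g for g in genotypes if g is not MISSING]
    if not observed:
        return 0.0
    freq_b = sum(observed) / (2 * len(observed))
    return min(freq_b, 1.0 - freq_b)


def mask_genotypes(genotypes, missing_ratio, rng):
    """Replace a random fraction of the genotypes with MISSING."""
    n_mask = round(missing_ratio * len(genotypes))
    masked_idx = set(rng.sample(range(len(genotypes)), n_mask))
    return [MISSING if i in masked_idx else g for i, g in enumerate(genotypes)]


def impute_most_common(genotypes):
    """Hypothetical baseline: fill gaps with the most common observed genotype."""
    observed = [g for g in genotypes if g is not MISSING]
    fill = Counter(observed).most_common(1)[0][0] if observed else 0
    return [fill if g is MISSING else g for g in genotypes]


def concordance(truth, masked, imputed):
    """Fraction of masked positions where the imputed genotype equals the truth."""
    idx = [i for i, g in enumerate(masked) if g is MISSING]
    if not idx:
        return 1.0
    return sum(truth[i] == imputed[i] for i in idx) / len(idx)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [0, 0, 1, 0, 2, 0, 1, 0, 0, 0]  # toy data for one SNP
    for ratio in (0.1, 0.3, 0.5):  # example ratios, not the 12 used in the thesis
        masked = mask_genotypes(truth, ratio, rng)
        imputed = impute_most_common(masked)
        print(ratio, minor_allele_frequency(truth), concordance(truth, masked, imputed))
```

Grouping the concordance values by the minor allele frequency of each SNP would then show how accuracy depends on allele frequency for a given ratio of missing data.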

IMPUTE v2 was more effective than the self-made method in every imputation setup, irrespective of the ratio of missing data.

The evaluation of the measurements and the creation of usable file formats containing missing data were done with self-made software. One further program was written to run IMPUTE v2 automatically on a Condor system.
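As a rough illustration of writing input files that contain missing data, the sketch below formats one SNP as a line of a genotype file. The abstract does not name the file format, so the layout is an assumption: five leading columns (SNP ID, RS ID, position, allele A, allele B) followed by three genotype probabilities per individual, with a missing genotype written as "0 0 0". The function name and the genotype coding are hypothetical.

```python
def gen_line(snp_id, rs_id, position, allele_a, allele_b, genotypes):
    """Format one SNP as a genotype-file line.

    genotypes holds B-allele counts (0, 1, 2) or None for a missing genotype
    (assumed coding). Each individual becomes three probabilities: AA AB BB.
    """
    triples = {0: "1 0 0", 1: "0 1 0", 2: "0 0 1", None: "0 0 0"}
    fields = [snp_id, rs_id, str(position), allele_a, allele_b]
    fields += [triples[g] for g in genotypes]
    return " ".join(fields)


if __name__ == "__main__":
    # Three individuals; the second genotype is missing.
    print(gen_line("snp1", "rs123", 1000, "A", "G", [0, None, 2]))
```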


