Predicting the lifetime of hard disk drives by machine learning methods

Dr. Horváth Gábor
Department of Networked Systems and Services

This thesis analyzes the S.M.A.R.T. diagnostic data of hard disk drives and uses this data
to predict the remaining lifetime of the drives.

In the information age, the amount of stored information is growing rapidly, and we store
the majority of this data on traditional hard disk drives. As a result, the failure of hard
disk drives is a problem that is hard to eliminate for cloud service providers. Hard disk
drives have long been equipped with the so-called S.M.A.R.T. diagnostic technology, which
enables us to read various diagnostic attributes. If a large amount of this data is
available, we can predict failures to a certain extent by analyzing it.

The aim of this thesis is to find a solution to this problem. The cloud service provider
Backblaze collected and published five years of S.M.A.R.T. data, which I analyze to build
machine learning models that can predict the time to failure of the drives.

In the thesis I present the technology of machine learning and the structure of the data
used. I remove the uninformative variables and observations from the data, then perform
statistical analysis to choose a subset of the remaining data that can be used for machine
learning. I present how the data can be transformed to make it usable for machine learning,
and how the training, validation and test sets are created.
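
The cleaning and splitting steps described above can be illustrated with a minimal sketch. It is not the method used in the thesis: the helper names `drop_constant_columns` and `split_by_drive`, the split fractions, and the choice to assign whole drives to a single set are assumptions made for illustration. The column names follow the style of the published Backblaze files (`serial_number`, `smart_5_raw`), which is also assumed here.

```python
import random


def drop_constant_columns(rows):
    """Remove columns that hold the same value (for example, always missing) in every row."""
    if not rows:
        return []
    keep = [key for key in rows[0] if len({row[key] for row in rows}) > 1]
    return [{key: row[key] for key in keep} for row in rows]


def split_by_drive(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Assign whole drives to one set each, so no drive appears in two sets.

    Splitting by drive rather than by row is an assumption of this sketch.
    """
    serials = sorted({row["serial_number"] for row in rows})
    random.Random(seed).shuffle(serials)
    n_test = int(len(serials) * test_frac)
    n_val = int(len(serials) * val_frac)
    test_serials = set(serials[:n_test])
    val_serials = set(serials[n_test:n_test + n_val])

    parts = {"train": [], "validation": [], "test": []}
    for row in rows:
        serial = row["serial_number"]
        if serial in test_serials:
            parts["test"].append(row)
        elif serial in val_serials:
            parts["validation"].append(row)
        else:
            parts["train"].append(row)
    return parts
```

Keeping all observations of a drive in the same set is one way to avoid evaluating a model on a drive it has already seen during training.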

After preparing the data, I present the machine learning algorithms I use. For each
algorithm I create several models, which differ in the length of the time period known to
the model. I train these models and describe their results by evaluating them on the test
set. I then showcase a few problems and interesting facts encountered while working on the
thesis.
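
One possible reading of "the length of the time period known to the model" is the number of consecutive daily readings given to the model as input. The sketch below illustrates that reading only; the helper name `make_windows`, the single-attribute input, and the convention that the last reading is taken on the failure day are assumptions, not details taken from the thesis.

```python
def make_windows(daily_values, window_len):
    """Build (window, days_to_failure) training pairs for one failed drive.

    daily_values: readings of one attribute in date order; the last entry is
    assumed to be the reading taken on the day the drive failed.
    window_len: how many consecutive days the model is allowed to see.
    """
    samples = []
    last_day = len(daily_values) - 1
    for end in range(window_len - 1, last_day + 1):
        window = daily_values[end - window_len + 1:end + 1]
        # The target is the number of days between the window's last day and the failure.
        samples.append((window, last_day - end))
    return samples


# Hypothetical readings for one drive, used only to show the effect of the window length.
readings = [0, 0, 1, 3, 8]
short_history = make_windows(readings, window_len=1)
long_history = make_windows(readings, window_len=3)
```

A longer window gives the model more history per sample but yields fewer samples per drive, which is one reason to compare several window lengths.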

I conclude the thesis by summarizing and evaluating the project results, and I suggest a
few interesting opportunities for future work.

