Improving a decision tree based time series classifier and examining expansion possibilities

OData support
Supervisor:
Gáspár Csaba
Department of Telecommunications and Media Informatics

Most of the commonly used time-series classification algorithms are instance based algorithms. These methods have some advantages but also have at least that many disadvanteges compared to model based methods. Model based time-series classifiers on the other hand are only usable efficiently if we have some prior knowledge on the problem we want to solve or are only efficient in one field of application (e.g. speech recognition).

Teh ShiftTree algorithm is a model based time-series classifier that was developed by me. It has the accuracy of the instance based methods and has every advantage that a model based method can offer and even more like (a) interpretability, (b) generality, (c) prior knowledge can be built into the model. In this thesis I introduce the ShiftTree algorithm and examine how the algorithm can be improved and be made more efficient.

I expand the operator set of the algorithm and introduce a new operator family in order to make the algorithm more accurate. I examine how the training process can be altered in a way that the accuracy may exceed the accuracy of the multiple modeling mode while the running time and the model complexity of the algorithm stays low. I evaluate the modifications on 23 different datasets and I compare them to the results of the most common instance-based algorithms. The optimal method is also compared to the results of the KDD Time Series Challange 2007 using blind tests.

I identify the bottlenecks in the training process in order to optimize the running time of the algorithm. I examine the effect of the optimization on the running time and on the scalability of the algorithm.

I also describe the experiments I have done with assambled models. I introduce two types of the model assambling: one based on the common boosting algorithm and one based on cross validation. I evaluate these assembling or forest methods on 22 different data sets. The optimal method is also compared to the results of the KDD Time Series Challange 2007 using blind tests.

I modify the algorithm in order to make it capable of using labeling confidences and I introduce a method that makes the algorithm able to adapt to the property changes of the input data without having to rebuild the whole model (on-line learning). This is certainly an important feature when using the algorithm on real-life data.

I briefly examine how the principles of the algorithm can be expanded on other fields of semi-structured data mining. I also shortly describe how the ShiftTree models can be used in stream mining.

These improvements make the ShiftTree a very efficient algorithm family in the fileds of semi-structured data mining and esspecially in the filed of time-series classification.

Downloads

Please sign in to download the files of this thesis.