Topological Data Analysis

Dr. Kósa Zsuzsanna Mária
Department of Telecommunications and Media Informatics

To infer information from data, we often visualise it to discover patterns and rules. This is easy in low dimensions, but visualising high-dimensional data is not trivial. There are numerous methods for reducing the dimension of data (such as PCA, Isomap, and Laplacian Eigenmaps) that help to visualise and understand it. Topological Data Analysis pursues the same goal with a different approach. The methods above all assume certain properties of the data. For example, PCA assumes that the data lies close to a linear subspace, while Isomap and locally linear embedding try to find a non-linear map from the data into a Euclidean space that preserves its geometry (geodesic distances for Isomap, local neighbourhoods for locally linear embedding). Topological Data Analysis algorithms (such as Mapper or persistent homology) assume less structure. They transform the data into a low-dimensional representation, for example a simplicial complex, which captures the connectivity of the data. These simplicial complexes can usually be visualised on a two-dimensional plot, which also helps to understand the underlying structure of the data.
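To make "capturing the connectivity of the data" more concrete, here is a minimal sketch of 0-dimensional persistent homology, which tracks how the connected components of a point cloud merge as the scale grows in a Vietoris-Rips filtration. It is written in Python rather than R, and the function name `h0_persistence` and the convention that an edge enters the filtration at the distance between its endpoints are assumptions for illustration. It is not the code of this thesis or the API of the R TDA package, and it ignores higher-dimensional features such as loops.

```python
import itertools
import math


def h0_persistence(points):
    """Hypothetical helper: (birth, death) pairs of the 0-dimensional
    persistence diagram of the Vietoris-Rips filtration on `points`.

    Every point is born at scale 0; a component dies at the edge length
    at which it merges into another one. One component never dies, so
    its death is math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving to keep the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumed convention: an edge appears when the scale reaches its length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    diagram = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge, so one of them dies at this scale.
            parent[root_i] = root_j
            diagram.append((0.0, length))
    diagram.append((0.0, math.inf))
    return diagram


# Two tight pairs of points, far apart from each other.
print(h0_persistence([(0, 0), (0, 1), (5, 0), (5, 1)]))
# prints [(0.0, 1.0), (0.0, 1.0), (0.0, 5.0), (0.0, inf)]
```

The two short bars come from points merging inside each pair, while the long bar ending at 5 (together with the infinite one) signals two well-separated clusters. This long-versus-short contrast is what a persistence diagram makes visible on a two-dimensional plot.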

The goal of my thesis was twofold. First, I studied topological data analysis to understand its underlying theory and the methods and algorithms used to compute statistics about the data. Second, I wrote an extension to ggplot2, the popular R visualisation library, so that the diagrams computed with the R package TDA can be visualised in a more aesthetically pleasing way. This part also included applying topological algorithms to a real-life data set to see whether this method can yield new insight into the data.

The theoretical part of my thesis is structured along the mathematical theories that serve as a foundation for topological data analysis. On the practical side, I first showcase the plotting ecosystem of R and describe how I developed my extension. Next, I compare the plots produced by my extension with those of the TDA package. Then I analyse a real-life data set. Finally, I conclude the thesis with the observations and insights I gathered during my work.