Multidimensional data management and bitmap indexing

Supervisor:
Dr. Gajdos Sándor
Department of Telecommunications and Media Informatics

Processing huge volumes of data poses a considerable challenge for engineers and analysts alike. According to IBM, 2.5 exabytes of data are created every day, so much that the last two years account for 90% of all existing data. Gartner, a leading information technology research and advisory company, added "Big Data" to its hype cycle for emerging technologies in August 2011.

Analytic database systems often have to store incoming data streams in near real time while also answering queries within seconds. Quick and efficient indexing of data is therefore a must. In-memory computing helps achieve these goals by providing an expensive but fast storage layer.

In my thesis work, I focus on testing multidimensional in-memory analytic databases and creating index structures for swift data retrieval.

In the first half of the thesis work, I briefly introduce the concepts of dimensional modelling and OLAP systems. I show why synthetic test data is necessary and present a command-line tool for generating datasets of arbitrary size. The generated datasets are later used to review three OLAP systems: icCube, Palo and KÜRT Co.'s Colap study.

In the second half, I discuss the core problems of multidimensional indexing and bitmap indices. I survey the available state-of-the-art algorithms for compressing bitmap indices and implement on-the-fly versions of two of them. I compare the compressed and uncompressed representations with detailed measurements.

In conclusion, I summarize the experience gained during the research and implementation process and outline the future of extreme data management.

A glossary defines the terms and technologies used. The appendix contains the grammar for the data generation tool and the installation manual for the presented libraries and software.
