Generation and optimization of summary tables for ad-hoc data mart queries in Oracle 11g

OData support
Supervisor:
Dr. Martinek Péter
Department of Electronics Technology

State of the art data warehouses store very large amounts of data (typically several terabytes) in the form of numerous complex objects. To minimize query time, the data stored in data warehouses are typically made available to business users through the data mart layer having a special, query-optimized, star or snowflake structure.

One of the most difficult problems of constructing data warehouses is data mart optimization, because of the huge amount of data, the frequent use of aggregate operations (e.g. according to months, regions or product categories) and the short query response time expected by the end-users, which is generally no more than several minutes.

A very efficient but complex and controversial way of optimization is using summary tables. Summary tables store the results of common aggregate queries (e.g. daily reports), and their contents is refreshed on a regular basis (e.g. daily).

In advanced database management systems the SQL engine is able to automatically rewrite queries referencing specific base tables into queries above the summary tables.

As the size of a summary table is usually orders of magnitudes smaller than that of the base table, furthermore it may contain not only aggregates but also the results of complex joins, query rewrite can result a dramatic increase in speed.

Hence to increase efficiency a high price have to be paid, especially in the case of ad-hoc queries, which are typical in data marts, because there is no information about their exact nature in the system design phase. The main problem of summary tables generated using simple methods, is their large number, large overall size and large maintenance resource requirements, which may put an intolerable burden on the datamart system.

In my work I present basic concepts and basic techniques of summary tables and query rewrite by analyzing the advantages and pitfalls of this method.

Based on available documentation existing summary table management tools and methods of Oracle 11g will be reviewed. I examine both the advantages and the limitations of these tools.

The goal my work is to study existing optimization techniques and to create new solutions for minimizing the number and size of generated summary tables, while minimizing summary table refresh times and maximizing estimated query execution speed. The completed solution is tested on given groups of queries, and my results are compared to the results of the corresponding tools of Oracle.

Downloads

Please sign in to download the files of this thesis.