Efficient graph query techniques

OData support
Supervisor:
Szárnyas Gábor
Department of Measurement and Information Systems

In the last decade, numerous database management systems were developed under the umbrella of NoSQL techniques. One group of these systems is the family of graph databases, which allows users to store and query their data as graphs. This data model is often a better fit to represent strongly interlinked data sets than the traditional relational model, and its conciseness can lead to better performance. That said, relational databases have been developed and optimized for almost 50 years, and it is an open question whether efficient processing of graph data requires specialized databases at all.

In many use cases of graph databases -- financial fraud detection, recommendation engines -- the graph queries are complex, and their repeated, efficient executions are crucial for these use cases. Furthermore, graph databases are increasingly used for software model validation and source code analysis. These application scenarios could greatly benefit from an incremental graph query engine that allows users to register views and maintain their state upon on changes.

Comparing the performance of database systems requires standard benchmarks. For relational databases, this has been fulfilled by the benchmarks of the Transaction Processing Performance Council for more than two decades. Due to the relative immaturity of graph databases, there is only a limited number of benchmarks available for graph query workloads. To help establishing a standard benchmark, I joined the development of the LDBC (Linked Data Benchmark Council) Social Network Benchmark. I reworked and significantly improved existing implementations of the benchmark, and also implemented the queries in the SPARQL language for semantic databases. I performed a thorough evaluation and detailed analysis of database systems using various data models (relational, graph, and semantic).

Related to incremental graph query processing, I studied the timely and differential dataflow programming paradigms. I demonstrated their applicability by implementing a solution for the 2018 Transformation Tool Contest’s live challenge using the Naiad differential dataflow software library and compared its performance to other solutions.

Downloads

Please sign in to download the files of this thesis.