Multidimensional graph analytics

OData support
Marton József Ernő
Department of Telecommunications and Media Informatics

Graphs and graph algorithms are used in many areas of computer science. Applications of graph algorithms include routing in computer networks, scheduling processes in operating systems, and they also play a major role in logistics planning. Furthermore, they solve several problems that appear in other fields such as biology and social sciences.

Network science, which is a relatively recent area of research, targets the in-depth topological analysis of graphs. Several metrics are defined to characterize graphs and to provide a better understanding of the underlying systems represented by these graphs. Traditionally, network science studies exclusively homogeneous, untyped graphs and does not distinguish between different types of nodes and edges. There is significantly less research that involves multidimensional graphs (also known as multiplex graphs, heterogeneous networks). These graphs can contain different types of edges, which makes it possible to represent more complex systems of semantic connections and to conduct a deeper analysis of the given graph.

A class of metrics aim to capture the clusteredness of graphs, which necessitates counting the triangles in the graph. This is a resource-intensive operation and does not scale well even in case of homogeneous graphs, and introducing types brings along additional complexity. One possible solution to this problem is to use the adjacency matrix representation of the graphs and define metrics using matrix operations.

In my thesis, my goal is to implement a set of multiplex graph metrics and where needed, make the computation of some of them more efficient. For this, I implemented them using matrix operations on the adjacency matrices of these graphs and evaluated them on graph datasets published by investigative journalists (such as the Panama or Paradise papers). In the calculation of metrics describing the clusteredness in a graph, I achieved a characteristic improvement in scalability compared to the naive approach. For a graph with approx. 450 000 nodes, the computation of one of these metrics was approx. 15 times faster using the linear algebraic approach.


Please sign in to download the files of this thesis.