Nowadays most people are used to searching on and uploading to the Internet. But the amount of data produced by billions of people can be so enormous that we cannot process it on one computer. We call this challenge Big Data.
There are several computational models for processing these huge datasets. One of the most popular ones is the Pregel model developed by Google. Apache Flink is a Big Data technology that provides distributed computation. Flink Gelly is a graph-processing library based on the Pregel model.
If we want to implement such a distributed algorithm to analyse Big Data, we have to pay special attention to correctness and efficiency, as both non-locality and the sheer size of the data make verifying and debugging programs difficult.
In my thesis I examine a graph-like cryptographic data collection called ROSCO provided by BME CrySys Lab. ROSCO represents the relationships between signed software artifacts and code-signing certificates. If we search for directed cycles in the graph, we could find nodes that mutually sign each other in the cycle, indicating a potential security problem.
For this purpose, I implemented (a) local and (b) distributed solutions to detect strongly connected components. The local solution is based on Tarjan’s algorithm while the distributed one conforms to the Pregel model. Furthermore I verified the correctness of the implementations with the help of unit tests.
The solutions were applied to different subsets as well as the entirety of the ROSCO graph as input. Finally, I performed a cursory evaluation of the obtained results.