Analysing bottlenecks on distributed filesystems

Dr. Szatmári Zoltán
Department of Measurement and Information Systems

Data storage poses new challenges for system administrators, as the volume of unstructured data outgrows the capacity of individual disks. The scalability of traditional storage is no longer sufficient, so data has to be distributed across multiple storage servers.

Distributed file systems offer a possible solution. These software products present a cluster of several servers as consolidated storage, in which all data is available in a single namespace.

Distributed file systems are not easy to navigate: there are many different implementations, and comprehensive analyses of them are lacking. Users choose distributed file systems for their scalability, yet without in-depth benchmarks it is difficult to judge how well a system will perform at a larger scale.

My goal was to create an automated testing framework that performs resource-based measurements on distributed file systems. The framework runs the tests under different resource limits, so the results reveal not only potential bottlenecks but also how efficiently each file system uses a given resource.
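How the limits are enforced is not specified here. As a minimal sketch, assuming Linux hosts with cgroup v2 for CPU and disk limits and the `tc` token bucket filter (`tbf`) for network bandwidth, a limit specification could be translated into concrete settings as follows. The `ResourceLimit` class, the `limit_settings` function, the block device number `8:0` and the interface name `eth0` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceLimit:
    """One resource constraint for a test run (hypothetical structure)."""
    cpu_percent: Optional[int] = None         # share of one CPU core, e.g. 50
    disk_bytes_per_sec: Optional[int] = None  # read and write bandwidth cap
    net_mbit: Optional[int] = None            # outgoing network bandwidth cap


def limit_settings(limit: ResourceLimit, device: str = "8:0",
                   iface: str = "eth0") -> list[str]:
    """Translate a limit into cgroup v2 file contents and a tc command.

    Assumes Linux with cgroup v2 and the tbf queueing discipline; the
    device number and interface name are placeholders, not real values
    from the thesis setup.
    """
    settings = []
    if limit.cpu_percent is not None:
        period = 100_000  # cgroup v2 default period, in microseconds
        quota = period * limit.cpu_percent // 100
        settings.append(f"cpu.max: {quota} {period}")
    if limit.disk_bytes_per_sec is not None:
        bps = limit.disk_bytes_per_sec
        settings.append(f"io.max: {device} rbps={bps} wbps={bps}")
    if limit.net_mbit is not None:
        # Token bucket filter on the egress interface caps the send rate.
        settings.append(
            f"tc qdisc replace dev {iface} root tbf rate {limit.net_mbit}mbit "
            "burst 32kbit latency 400ms")
    return settings
```

Limiting exactly one field per run matches the one-resource-at-a-time approach; an empty `ResourceLimit()` yields no settings and serves as the unconstrained baseline.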

The framework is built on Ansible, a configuration management and automation tool. It maintains extensible sets of distributed file systems, resource constraints, replication settings and measurement methods, and the testing process runs every available measurement method on each file system under every configured setup and resource limit.
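Running every method under every setup and limit amounts to taking the Cartesian product of the configured dimensions. The following is a minimal sketch in plain Python, independent of Ansible; the `Run` type, the `test_matrix` function and the limit and method labels are illustrative assumptions, while the file system names and replication levels are those of the measurements in this thesis.

```python
import itertools
from typing import NamedTuple


class Run(NamedTuple):
    filesystem: str
    replicas: int
    limit: str
    method: str


# Hypothetical configuration; in an Ansible-based setup these lists could
# live in inventory or group variables.
FILESYSTEMS = ["MooseFS", "GlusterFS", "XtreemFS"]
REPLICATION = [1, 2, 3]
LIMITS = ["none", "cpu", "disk", "network"]  # "none" is the baseline
METHODS = ["seq-read", "seq-write", "small-files"]


def test_matrix(filesystems, replication, limits, methods):
    """Yield every combination, so each method runs under every setup."""
    for combo in itertools.product(filesystems, replication, limits, methods):
        yield Run(*combo)


if __name__ == "__main__":
    for run in test_matrix(FILESYSTEMS, REPLICATION, LIMITS, METHODS):
        print(run)
```

Extending the framework then means appending to one of the lists; for these example lists the product yields 3 × 3 × 4 × 3 = 108 runs.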

My measurements covered MooseFS, GlusterFS and XtreemFS with replication levels of one, two and three. Sequential read and write tests were run from one client and from two clients simultaneously. To measure metadata performance, I used a test that creates a large number of small files. Processing capacity, storage device performance and network bandwidth were also limited, one at a time.
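The small-file metadata test and the sequential write test could look roughly like the sketch below. It is an assumption, not the measurement tooling used in the thesis: the function names, file counts, file sizes and block sizes are hypothetical, and in a real run the target directory would be the distributed file system's mount point rather than a local temporary directory.

```python
import os
import tempfile
import time


def small_file_creation_rate(directory: str, count: int = 1000,
                             size: int = 1024) -> float:
    """Create `count` small files and return files created per second.

    With tiny payloads the cost is dominated by metadata operations
    (create, open, close) rather than data transfer.
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"f{i:06d}"), "wb") as f:
            f.write(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count / elapsed


def sequential_write_throughput(path: str, total_mib: int = 64,
                                block_kib: int = 1024) -> float:
    """Write `total_mib` MiB in fixed-size blocks and return MiB/s.

    fsync() forces the data out of the client's page cache, so the
    timing includes the write to the underlying storage.
    """
    block = os.urandom(block_kib * 1024)
    blocks = total_mib * 1024 // block_kib
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_mib / elapsed


if __name__ == "__main__":
    # A temporary directory stands in for a distributed file system mount.
    with tempfile.TemporaryDirectory() as d:
        print(f"{small_file_creation_rate(d):.0f} files/s")
        seq = os.path.join(d, "seq.bin")
        print(f"{sequential_write_throughput(seq):.1f} MiB/s")
```

Running the same functions from two client machines at once would correspond to the two-client case.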

Using this method, I was able to determine how sensitive each file system's performance is to specific resources. For example, the write performance of XtreemFS lagged behind that of the other two file systems, and limiting the resources revealed that its main bottleneck is processing capacity. Furthermore, with replication turned on, the network connection becomes another weak point.
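The reasoning behind such a finding can be expressed as a comparison against an unconstrained baseline: the resource whose limitation causes the largest relative drop in performance is the most likely bottleneck. The function below is a hypothetical sketch of that comparison; its name and the slowdown metric (fraction of baseline throughput lost) are assumptions, not the evaluation procedure used in the thesis.

```python
def bottleneck_ranking(baseline: float,
                       limited: dict[str, float]) -> list[tuple[str, float]]:
    """Rank resources by how much limiting them reduced throughput.

    `baseline` is the result measured without any limit; `limited` maps a
    resource name to the result measured with only that resource limited.
    The slowdown is the fraction of baseline performance lost, so 0.0 means
    no effect and values near 1.0 mean the limit almost stalled the system.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    slowdown = {res: 1 - value / baseline for res, value in limited.items()}
    # Largest slowdown first: the most sensitive resource leads the list.
    return sorted(slowdown.items(), key=lambda item: item[1], reverse=True)
```

The ranking is only meaningful if the individual limits are of comparable severity; an extremely tight network cap, for instance, would dominate the ranking for any file system.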

Based on these results, I concluded that resource-based testing can be a great aid in analysing distributed file systems, and that the software I developed is suitable for carrying out resource-based measurements.

