Distributed systems come up more and more often in discussions of storage. The field grows larger every day, yet the path is far from well trodden. Enterprises working with huge amounts of data around the globe have shown that the future of storage lies in this direction.
Planning and building a distributed infrastructure takes a great deal of hard work. Running performance and stability tests on every candidate software solution is not feasible; good decisions have to be made in a relatively short amount of time.
My thesis aims to guide system designers through the planning process of setting up a new cloud storage infrastructure. To that end, I collected the major, prevailing distributed file systems and compared them. The descriptions of the technologies covered in this work are based on extensive reading of their websites and documentation.
Rather than marketing slogans and the features promised in the documentation, engineers want to see numbers and performance measurements. It does not matter how pretty a piece of software's interface looks if it cannot get the job done. To compare the distributed file systems fairly, I measured each of them, in several configurations, with a predetermined set of performance tests.
The backbone of this work is a custom-built framework. With its help, the computers in the laboratory can be controlled and managed as one, from a single point. Among various other tasks and processes, the framework can deploy a cloud from a specified technology. That way I could concentrate on the actual performance tests themselves, once the framework had been implemented.
The framework is written in BASH and builds on existing, time-tested technologies used in enterprise environments. With a small investment it can be extended and customized to execute other tasks and solve different problems.
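To illustrate the kind of single-point control described above, here is a minimal sketch in BASH. The host names, the `run_on_all` helper, and the overridable `SSH_BIN` variable are all hypothetical; the actual framework's commands and structure may differ.

```shell
#!/usr/bin/env bash
# Minimal sketch: run the same command on every lab node from one control
# machine. Node names and helper names are hypothetical, not taken from the
# actual framework.
set -euo pipefail

NODES=("node01" "node02" "node03")   # hypothetical lab machines
SSH_BIN="${SSH_BIN:-ssh}"            # remote-exec command; overridable for dry runs

# Execute the given command on every node, printing a header per node.
run_on_all() {
    local cmd="$1" node
    for node in "${NODES[@]}"; do
        echo "== ${node} =="
        "${SSH_BIN}" "${node}" "${cmd}"
    done
}

# Usage on the control machine, e.g. to check free disk space everywhere:
#   run_on_all "df -h /"
```

Keeping the remote-execution command in a variable makes the script easy to test in a dry run (by substituting `echo` for `ssh`) and easy to extend, which reflects the customizability mentioned above.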