Reliability is one of the most important requirements of enterprise information systems, and Storage Area Networks (SANs) play an important role in keeping such systems highly available. SANs separate data storage from the hosts that use the stored data, so storage failures and server failures can be handled independently. To minimize the length and frequency of service outages, a broken server can be immediately replaced with another one, and critical data is stored on multiple redundant disk arrays. Multiple independent network routes between the servers and the disk arrays drastically reduce the effects of network device failures.
Due to their size and complexity, SANs are never maintained without problems. Hardware and software defects may occur on the hosts, disk arrays and network switches, causing service disruptions, data loss and connectivity issues. To meet service level requirements, operators must identify the root cause of each error and resolve it as soon as possible. Identifying error causes and understanding the effects of a repair procedure or technology change requires a high level of expertise, as well as knowledge of the network structure and of the dependency relations between the nodes. Access to reliable, up-to-date information on the structure and current state of the system is therefore essential for making correct decisions.
The primary goal of this thesis is to design and implement an application, Sanalyst, that supports error detection and issue resolution while providing a simple overview of a large enterprise infrastructure. It visualizes selected subsets of SANs, together with logged errors, in a simple graphical user interface. Users can attach text messages to the displayed devices, which supports communication between members of the operations team.