With the increasing popularity of information systems, the services they provide are expected to run 24 hours a day without a single glitch. Of course, these systems also need to be maintained, and even with proper maintenance, the chance of hardware failure cannot be eliminated. With the spread of virtualization technologies, the virtualized operating system no longer depends directly on the physical hardware, so a hardware breakdown does not have to affect it permanently. In the case of a system breakdown, the so-called guest machine can be restarted on another physical machine, reducing the downtime of the service.
This BSc thesis presents an automated service and the related framework, which is capable of detecting node breakdowns and increases the availability of the provided services by migrating guest machines between nodes.
When presenting the system, we will describe the architecture of the operating system and the network, taking care to avoid single points of failure. To achieve this, a special link aggregation implementation will be presented.
The cluster will manage a Xen virtualization hypervisor, whose guest machines will use DRBD block devices as storage. Each DRBD resource will be synchronized between two nodes.
The three-node cluster will be managed by the Pacemaker package. Communication between the nodes will be handled by Corosync, and the Resource Agents of the Open Cluster Framework will provide the connection between the guest machines of the Xen hypervisor and the local resource manager of Pacemaker.