Investigation of Site Reliability Engineering in Cloud Computing

Supervisor:
Tóth László István
Department of Networked Systems and Services

In recent years, server applications have kept evolving: they provide more and more services while also increasing their uptime. To keep up with the competition, IT companies need more and more people to achieve this. In this thesis I show why every enterprise environment needs to monitor its services in order to reach a higher service level agreement (SLA). I introduce several monitoring systems: Zabbix, Nagios, OpTier, Munin, SolarWinds and Pingdom.

The task I chose is website monitoring. The application verifies that the site is available and measures its response time. If the website is not available, it raises a specific alarm that helps to find the root cause of the problem. The aim is to become aware of an error sooner than the clients do, and ideally to fix it as well. This requires proactive monitoring, which focuses on prevention. I chose the free version of Nagios because it comes with several built-in plugins. We also wrote some system-specific checks in Python and bash, and created the end-to-end (E2E) checks in Pingdom.
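
A check of this kind (availability plus response time) could look roughly like the following Python sketch. It is an illustration only, not the code written for the thesis: the names classify, check and main, as well as the warning and critical thresholds of 1 and 3 seconds, are assumptions. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the "text | perfdata" output line follow the usual Nagios plugin conventions.

```python
"""Sketch of a Nagios-style website check (hypothetical, not the thesis code)."""
import time
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def classify(status, elapsed, warn, crit):
    """Map an HTTP status and a response time in seconds to a Nagios state."""
    if status is None or status >= 400:
        return CRITICAL  # no answer at all, or an HTTP error status
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def check(url, warn=1.0, crit=3.0, timeout=10.0):
    """Fetch url once and return (exit_code, one-line plugin output).

    The default thresholds are assumed values chosen for illustration.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error status
    except OSError:
        status = None  # DNS failure, refused connection, timeout, ...
    elapsed = time.monotonic() - start

    code = classify(status, elapsed, warn, crit)
    detail = "no response" if status is None else f"HTTP {status}"
    perfdata = f"time={elapsed:.3f}s;{warn};{crit}"
    return code, f"{LABELS[code]} - {detail} in {elapsed:.3f}s | {perfdata}"


def main(argv):
    """Run the check for the URL in argv[1]; return the plugin exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_site.py URL")
        return UNKNOWN
    code, message = check(argv[1])
    print(message)
    return code
```

To use it as a plugin, a wrapper script would call sys.exit(main(sys.argv)), so that Nagios can read the state from the process exit code and the response time from the performance data after the pipe character.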
