IT services and their management face increasing performance and security requirements. To meet these demands, more and more sophistication is expected from IT service and infrastructure management solutions. On the one hand, the size and complexity of IT systems are increasing; on the other hand, many conventional approaches (e.g. process-centric management best practices and supervisory configurations based solely on expertise) are becoming less and less usable. In order to design the Quality of Service (QoS) of a system, sufficiently accurate models of the relevant aspects of that system should be created.
- exposes the main tasks of infrastructure management and the ‘big data’ problems that arise from them. It illustrates the motivation of this essay with a practical example and also answers the question of why a new way of handling and processing these data is necessary
- briefly introduces the usage and architecture of the R programming language and summarizes its parallelization options, especially the solutions for using R in computing clouds
- presents the MapReduce architecture and one of its implementations, the Apache Hadoop framework. Finally, it introduces two important R packages – RHIPE and RHadoop – which enable R and Hadoop to work together
- explains the main topics of modeling and the steps of data processing; presents the most commonly used tools for modeling and automated data processing, including KNIME. It introduces a solution for integrating KNIME and R
- presents the techniques and difficulties of integrating the previously introduced Apache Hadoop, R and KNIME applications in a private environment and on Amazon EC2