Thanks to the exponentially growing volume of data produced nowadays, big data has become an increasingly hot topic, one whose meaning is known not only among professionals but is also treated as a key element elsewhere. As an example, consider the 2.5 billion content items shared on Facebook on a daily basis (status updates, wall posts, photos, videos, and comments). Analyzing and storing them is a great challenge. The scale becomes apparent when we consider that HDFS clusters of more than 100 petabytes are used for storage alone. In addition, people's eagerness to use more and more gadgets further increases the amount of data waiting to be processed and stored.
As a result, the demand for computing capacity keeps growing: more and more machines must be put into operation every day and then run economically.
Automation plays a key role in achieving speed, consistency, and repeatability regardless of the number of machines involved, whether the infrastructure comprises a few dozen servers or ten thousand.
This document assesses the modern configuration management tools currently available and then demonstrates, on one arbitrarily chosen tool, how they work in a highly distributed environment built with big data tools such as Hadoop, Storm, and Zookeeper.