In certain cases, a customer who implements a Big Data environment cannot afford to move completely to the cloud, either because of data security requirements or a lack of resources; in these cases the result is an on-premises cluster. If the cluster is sized for everyday use, jobs that require large resources cannot run efficiently on it, or they overshadow the primary workload. In such cases the most practical approach is to execute the rarely running, resource-intensive job in the cloud. If this happens automatically, on a stable architecture, we can save a considerable amount of money and human effort. A good example of this problem is adaptive streaming analytics, where recomputing the model requires a lot of resources. However, as no such solution was available in the field, this became the topic of my thesis.
First, I studied similar solutions. Next, I designed the architecture piece by piece. Once it was clear which tasks had to be performed, I looked for suitable software for each of them. I then examined the candidates to find out which one was best suited to its task while also working together with the other components. After drawing up the architecture and selecting the software, I built the on-premises cluster and a reference cluster in the cloud, and connected them with a script. Finally, I installed and configured the components.
To complete the task, I needed a workload that exhibits the characteristics of the problem. During my search I came across the meetup.com stream, which proved to be a good tool for demonstrating the solution in action. For this purpose, I wrote several programs that together form a workflow.
Using this workflow, I tested the architecture and drew my conclusions. After a successful test, I compared the architecture with other, better-known architectures in terms of performance and cost. Based on this comparison, I concluded that the architecture I designed outperforms the conventional ones in both efficiency and cost-effectiveness.