The computing power of the GPUs in desktop workstations is often considerable. Thanks to their architecture, these graphics cards can run parallel computations efficiently, and developing applications for them has never been easier since NVIDIA introduced CUDA to the public. There are 26 workstations in my department's student laboratory, each equipped with an NVIDIA GPU. Altogether these GPUs have 18 TFLOP/s of theoretical computing power, three times that of the University's supercomputer. If we could aggregate this computing power for research and educational purposes, we could solve problems from a wide variety of disciplines, e.g. physics simulations, medical imaging, protein database searches, and option pricing, in significantly less time. On the educational side, we could give students the opportunity to develop multi-GPU applications without having to change the current hardware configuration.
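To illustrate how approachable CUDA development has become, the following minimal vector-addition kernel shows the programming model in its simplest form (an illustrative sketch only, not part of the system described in this paper; the names `vecAdd`, `a`, `b`, `c` are arbitrary):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Managed (unified) memory keeps the host-side code short.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The same kernel scales transparently across the thousands of cores of a modern GPU, which is what makes aggregating many such devices attractive.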
During my work I learned the basics of general purpose GPU programming, computer clusters and GPU virtualisation technologies. I examined the laboratory's infrastructure from the perspective of building a GPGPU cluster and measured the performance of system components that could become bottlenecks. Based on an analysis of the use cases, the measurements and the currently available software components, I designed and implemented a possible solution for creating a cluster with GPU virtualisation.
This paper consists of three main parts. In the first part I present the development of general purpose GPU computing, the CUDA programming model and the current technologies for programming these devices. The second part covers computer clusters and GPU virtualisation; there I present the advantages and disadvantages of using GPU virtualisation in HPC clusters. In the last part I design a possible solution for organising the laboratory's computers into a GPGPU cluster. I discuss the special requirements arising from the different use cases and the academic environment, the security considerations, and last but not least the possibilities for further development of the implemented system.