Segmentation is one of the most important and most extensively researched fields of image processing, and it is also the most basic part of object recognition: it allows the objects in an image to be separated from one another. Because of the complexity of the task, present-day algorithms produce useful results either only under limited circumstances or only very slowly. Research is conducted in two main directions: on the one hand, new algorithms are sought; on the other hand, existing algorithms are accelerated. Following the latter direction, in this thesis I attempted to substantially reduce the processing time of a robust segmentation algorithm. I chose the mean shift segmentation algorithm because it is widely applicable and requires no a priori information about the environment; its disadvantage, however, is its slow execution. In line with current trends, I chose parallelisation as the means of speeding up the algorithm. Among parallel architectures, the graphics processing unit used for general-purpose computation seemed to be the best choice, because it is a relatively cheap device that has many more processing units than present-day central processing units, and this architecture offers great potential. In the first part of the thesis I introduce the fundamentals of parallel programming theory, the structure of graphics processing units, and the CUDA environment. In the second part I present the design process of the completed algorithm and, finally, the results achieved, comparing the serial and parallel algorithms. Testing showed that the parallel algorithm meets the requirements, since it is significantly faster than the serial algorithm, although there remain further opportunities to speed it up or improve it.