The past thirty years have seen rapid advances in the fields of randomized algorithms and average-case complexity analysis, to the point that they are now considered some of the most important sub-fields in computational complexity theory. Both topics have their roots in probability theory, and in both cases we focus on designing and analysing algorithms that achieve good performance on the average in practical scenarios.
We will touch upon these subjects in connection with the so-called closest pair problem, in which we are given a set of points in a metric space and need to find the pair of points with minimum distance between them. This problem is of great theoretical importance, being not only one the first problems studied in the field of computational geometry (by Shamos and Bentley), but also the very first problem for which a randomized algorithm was conceived (by Rabin).
In our Thesis -- after briefly reviewing the literature on the closest pair problem -- we focus our attention to one particular randomized algorithm given by Golin, Raman, Schwarz and Smid. For the theoretical analysis of this algorithm we survey the seemingly unrelated field of random geometric graphs and adapt results to show that on certain types of inputs, the algorithm by Golin et al.~performs a number of superfluous operations. We then augment the algorithm in two ways to remedy these shortcomings. Finally, we examine the thus improved algorithm using the approach of probabilistic analysis, where we calculate the expected time of the algorithm for random inputs, more specifically, under the assumption that the input points are uniformly distributed random variables.
In contrast to the expected running time of the original algorithm given by Golin et al., which is proportional to $(3^d+2)n$ for $n$~points and $d$~dimensions, that is, exponential in the number of dimensions, the probabilistic analysis for the two improved algorithms presented here yields an expected asymptotic complexity of $(3d+6)n$ and $(d+2)n$, linear in both the number of points and the number of dimensions. While on one hand this is an interesting theoretical result, in many cases the improved algorithms also provide a way to efficiently solve the closest pair problem in practice. Simulations show that this holds not only for inputs of uniform distribution, but basically for any distribution, provided that the points are chosen independently -- thus making our results applicable to a wide variety of practical problems.