Pattern matching algorithms

OData support
Dr. Dudás Ákos
Department of Automation and Applied Informatics

The simple definition of the exact string matching problem, that is, we would like to find every occurrence of a pattern in a given text. The performance of an exact string matching algorithm is crucial, since in many cases operating on large amounts of data is inevitable. These algorithms have many uses in a variety of fields like computational molecular biology and search engines.

The exact pattern matching generally have two types of implementation. First, when we match our pattern directly to the input text. The second type is when we can preprocess the original text to facilitate multiple consecutive queries. That is to say, we create a data structure from the original text. The most well-knows data structures are the suffix arrays and the suffix trees.

Parallel computing is a type of a computation where large problems can be divided into smaller ones, which can be solved at the same time. Frequency scaling was the dominant reason for improvements in computer performance in the past, but nowadays the parallelization of sequential programs is much more popular.

The question emerges: are the pattern matching algorithms suited for parallelization, under what conditions can we do it and how does it affect their running time.

In my thesis I present pattern matching algorithms based on suffix arrays, and I present their way of parallelization in different environments.


Please sign in to download the files of this thesis.