Pair matching and searching in text databases

Dr. Szűcs Gábor
Department of Telecommunications and Media Informatics

Data processing is becoming increasingly important for drawing conclusions, retrieving information, and even detecting redundancy. Processing structured data is now a simple task for computers, but processing unstructured text can still cause difficulties.

In my work I studied free-text data sets, and the task was divided into two parts. In the first part, I created a classification model for a data set of song lyrics; the model assigns artists to their lyrics, which were processed with text mining tools. The second part involved two data sets of text pairs: stock names with their abbreviations, and questions from the popular Quora website. In both cases the model had to decide whether the two texts in a pair match. I used various tools during the work, including RapidMiner Studio, RStudio, PyCharm, and Microsoft Excel. The first chapter presents the basics of data and text mining and the theory of classification, and the second covers the preparation of the data, followed by the model itself and, finally, a summary of my results.


