In this thesis, my aim was to present the relevant data mining methods and to specify and develop software that can find hidden connections in a database of folksong sheet music.
The chosen attribute is the rhythm of the songs, which requires well-defined rhythmic metrics. Using these metrics (more precisely, the vectors that represent the rhythms), we can determine rhythmic distances between songs. These distances form the basis of the clustering algorithms that search for hidden connections. Clustering partitions objects into groups so that elements of the same group are more similar to each other than to elements of other groups.
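The following is a minimal sketch of how rhythm vectors, rhythmic distances and clustering fit together. It assumes a hypothetical encoding in which each song's rhythm is a list of note durations in sixteenth-note units. The vector built by `rhythm_vector` (relative frequencies of duration values), the Euclidean distance and the single-linkage threshold grouping are illustrative choices. They are not the metrics or algorithms developed in this thesis.

```python
from itertools import combinations
from math import dist

# Hypothetical encoding: durations in sixteenth-note units, values above 8 are capped.
MAX_DURATION = 8


def rhythm_vector(durations):
    """Relative frequency of each duration value 1..MAX_DURATION (illustrative metric)."""
    if not durations:
        raise ValueError("a song needs at least one note")
    counts = [0] * MAX_DURATION
    for d in durations:
        counts[min(d, MAX_DURATION) - 1] += 1
    return [c / len(durations) for c in counts]


def rhythmic_distance(a, b):
    """Euclidean distance between the rhythm vectors of two songs."""
    return dist(rhythm_vector(a), rhythm_vector(b))


def cluster(songs, threshold):
    """Single-linkage grouping: songs closer than `threshold` are joined
    (directly or through a chain of close songs) into the same group."""
    parent = list(range(len(songs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(songs)), 2):
        if rhythmic_distance(songs[i], songs[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(songs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


songs = [
    [2, 2, 4, 2, 2, 4],
    [2, 2, 4, 2, 2, 4, 2, 2, 4],
    [1, 1, 1, 1, 4, 4, 8],
    [1, 1, 1, 1, 4, 4, 8, 8],
]
print(cluster(songs, threshold=0.3))  # [[0, 1], [2, 3]]
```

Because this vector only counts durations, two songs with the same proportions of note values get distance 0 even if their rhythms occur in a different order. A real rhythmic metric would likely also have to take the sequence of durations into account.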
Data mining tasks, including cluster analysis, depend strongly on the data they examine. No method works well for every dataset, so we have to study the specific properties of the field in question (music theory and systematization). This is also why we present several algorithms that can solve the problem.
Investigating the common properties of the elements that belong to the same group can reveal why those objects ended up in the same cluster. The clusters can be made easier to interpret with graphical methods based on the distances between the elements of a given cluster.
To be sure that we interpret the results properly, we have to compare them with the results of the earlier analysis based on melodic chains. The long-term aim of the project, to which I hope to contribute with this thesis, is to discover hidden relationships in the folk music heritage of the world. Reaching this aim requires studying the database from every aspect.
The database used in this thesis contains vocal and instrumental folksong collections from Szeklerland, a region of Transylvania. It comprises a few thousand specially encoded sheet music items. This is enough to test and examine the algorithms, and the database can easily be extended in the future.