Cluster analisys is basic datamining task. The essence of the procedure is partitioning a
set of objects according to the their attributes. A slightly altered method is capable of
putting one object into many subset. Also the membership of an element in a set can be a
real number between zero and one, such as in fuzzy logic. In many cases the core of cluster
analisys is the distance function, which assigns a number to a pair of elements according
to their diversity. These distance values can form a distance matrix.
Applications of cluster analisys can be various. The first field I examined was the
databases of supermarkets. The task was to identify customer behaviour patterns based
on baskets. Another application is clustering organic molecules. Pharmaceutical companies
and chemical research laboratories need to order molecules into clusters according to their
chemical attributes. The third field was the world of recommending systems. Costumers
can rate products and we can create recommending softwares using these ratings, which
can do recommendations and make optimized personal advertisements.
There are two important aspects of my method. The first is using asymmetric "distance"
values, so the similarity matrix of the objects can be asymmetric. The other contribution
is a modified matrix factoring method, which approximates the similarity matrix with a
smaller matrix using a gradient based search method. This contrasts the original matrix
factoring methods which use two matrices to decompose to. The result of my algorithm is a
matrix with two dimensions, one for the objects and one for the clusters. The i, j element of
the matrix is the membership of the ith element in the jth cluster. This overlapping, fuzzy
clustering is capable of identifying customer behaviour patterns. It can also be transformed
easily to discrete clusters thus solving more conventional cluster analisys tasks.