Methods for clustering categorical data

Hi,

One way of opening the data up to all the different types of clustering is to convert each categorical variable into a one-hot vector representation, where you add columns to your data, one for each option in each category. For example, a colour variable with the options red, green and blue becomes three 0/1 columns.

This can greatly expand the input space of the data, but once it is done you can use almost any type of clustering method.
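
As a rough illustration of the one-hot idea, here is a minimal sketch in plain Python (R has its own tools for this). The function name `one_hot_encode` and the toy `color`/`size` data are made up for the example, not taken from any package.

```python
def one_hot_encode(rows, columns):
    """Expand each categorical column into one 0/1 column per observed option."""
    # Collect the options seen in each column, sorted for a stable column order.
    options = {col: sorted({row[col] for row in rows}) for col in columns}
    # One output column per (variable, option) pair.
    header = [f"{col}={opt}" for col in columns for opt in options[col]]
    encoded = [
        [1 if row[col] == opt else 0 for col in columns for opt in options[col]]
        for row in rows
    ]
    return header, encoded


# Hypothetical toy data with two categorical variables.
rows = [
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "red", "size": "large"},
]
header, encoded = one_hot_encode(rows, ["color", "size"])
for line in [header] + encoded:
    print(line)
```

Two variables with two options each turn into four columns here, which shows how quickly the input space grows when a variable has many options.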

There are many clustering algorithms, but one of the most popular is k-means clustering, which is available in R (for example through the built-in kmeans function).
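
To give a feel for what k-means does, here is a minimal sketch in plain Python, assuming the rows have already been one-hot encoded. The function names and the toy points are hypothetical, and starting from the first k points is a simplification I chose to keep the result repeatable; real implementations normally use random or smarter starting centroids.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    # Simplification (assumption): use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical one-hot rows: three variables with two options each.
points = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
]
labels, centroids = k_means(points, k=2)
print(labels)
```

Each centroid coordinate ends up as the share of cluster members that have that option, which is one way to read the clusters afterwards.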

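Another popular method is hierarchical clustering, where the points are arranged in a hierarchy (usually drawn as a tree, or dendrogram), so you can see how closely each point is related to any other point. In R this is available through the built-in hclust function.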
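
To show how such a hierarchy gets built, here is a minimal bottom-up sketch in plain Python. The choice of Hamming distance and single linkage is my assumption for the example (other distances and linkage rules are common), and the function names and toy points are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(points):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the closest pair of members.
                d = min(
                    hamming(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Record which clusters merged and at what distance.
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Hypothetical one-hot rows: three variables with two options each.
points = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
]
merges = single_linkage(points)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The list of merges, read from first to last, is the hierarchy: points that merge at a small distance are closely related, and the last merge joins the most different groups.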

Check out this website:

Good luck
PJ