K-means clustering

K-means clustering with 4 clusters of sizes 545, 503, 521, 30

Cluster means:
           chlorides    alcohol    quality         pH
Cluster1  0.06080698 -0.6531504 -0.2763583 -0.7679607
Cluster2 -0.16645670 -0.3596382 -0.6644864  0.7597580
Cluster3 -0.24709688  1.0785218  0.9569538  0.1447128
Cluster4  5.97751308 -0.8348286 -0.4573673 -1.3005017

The table above gives the cluster means of the standardized variables. After standardization each variable has an overall mean of 0, so a negative value means the cluster is below the overall average for that variable and a positive value means it is above average.
That is how I read the numbers, but I'm not sure it is correct.
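
One way to check this reading is to recompute the cluster means by hand. The sketch below uses synthetic stand-in data (the column names and distribution parameters are made up for illustration, not taken from the wine data set) and assumes the standardization was done with scale(). It compares km$centers with the per-cluster column means of the scaled data:

```r
# Synthetic stand-in data (hypothetical values, NOT the wine data set)
set.seed(1)
df <- data.frame(
  chlorides = rnorm(200, mean = 0.09, sd = 0.05),
  alcohol   = rnorm(200, mean = 10.4, sd = 1.1),
  pH        = rnorm(200, mean = 3.3,  sd = 0.15)
)

# Standardize: every column ends up with mean 0 and standard deviation 1
df_scaled <- scale(df)
col_means <- colMeans(df_scaled)   # numerically zero for every column

km <- kmeans(df_scaled, centers = 4, nstart = 20)

# Per-cluster column means of the scaled data, computed by hand
manual <- aggregate(as.data.frame(df_scaled),
                    by = list(cluster = km$cluster), FUN = mean)

# TRUE if "Cluster means" are exactly these per-cluster averages
same <- all.equal(unname(as.matrix(manual[, -1])), unname(km$centers))
print(same)
```

If the comparison holds, each entry of "Cluster means" is the average z-score of that variable within that cluster, so its sign and size are relative to the overall average of 0.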

Google (or DuckDuckGo) is your friend. This is the first link that comes up when searching for "cluster analysis R":

https://www.statmethods.net/advstats/cluster.html

Please explain this to me. Are the numbers related to the mean values?

I suggest that you read the help page for the kmeans function (?kmeans) and probably the statistical references cited on that help page.

It's hard to understand what help you need analyzing Cluster means, given the way you've posed your question. And so it seems like the best way to help is to point you to resources on the internet you are capable of searching for yourself.

Right now we don't have much to go on. Could you be more specific about what you're asking?

I am working with the winequality-red data set:

df <- subset(winequality.red, select = c(chlorides, alcohol, quality, pH))

I run k-means clustering with k = 4:

km <- kmeans(df_wine_scale, centers = 4, nstart = 20)

K-means clustering with 4 clusters of sizes 545, 503, 521, 30

Cluster means:
           chlorides    alcohol    quality         pH
Cluster1  0.06080698 -0.6531504 -0.2763583 -0.7679607
Cluster2 -0.16645670 -0.3596382 -0.6644864  0.7597580
Cluster3 -0.24709688  1.0785218  0.9569538  0.1447128
Cluster4  5.97751308 -0.8348286 -0.4573673 -1.3005017

Specifically, I want to know how the numbers under "Cluster means" relate to the 4 variables and the 4 clusters, so that I can draw conclusions about the characteristics of each cluster. Thanks very much. Please help me.
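
Each row of "Cluster means" is one cluster and each column is one variable, with the values in standard-deviation units. Cluster4's chlorides value of 5.98, for example, puts that cluster of 30 wines roughly six standard deviations above the average chlorides level. To describe clusters in the original units, the centers can be converted back. The following is a minimal sketch on synthetic stand-in data (hypothetical values, not the wine data set), assuming the standardization was done with scale(), which stores the column means and standard deviations as attributes:

```r
# Synthetic stand-in data (hypothetical values, NOT the wine data set)
set.seed(42)
df <- data.frame(
  chlorides = rnorm(200, mean = 0.09, sd = 0.05),
  alcohol   = rnorm(200, mean = 10.4, sd = 1.1),
  pH        = rnorm(200, mean = 3.3,  sd = 0.15)
)

df_scaled <- scale(df)
km <- kmeans(df_scaled, centers = 4, nstart = 20)

# scale() keeps the column means and standard deviations it used
mu    <- attr(df_scaled, "scaled:center")
sigma <- attr(df_scaled, "scaled:scale")

# Undo the standardization column by column: original = z * sd + mean
centers_orig <- sweep(sweep(km$centers, 2, sigma, "*"), 2, mu, "+")
print(centers_orig)

# Cross-check: per-cluster means of the raw (unscaled) data
raw_means <- aggregate(df, by = list(cluster = km$cluster), FUN = mean)
```

The back-transformed centers should agree with the per-cluster means of the raw data, which makes statements such as "this cluster has the highest average alcohol" easier to phrase in real units.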

This is not an answer, but a related question. Why do you consider the quality variable during clustering?

The output variable of the wine-quality dataset is quality, and people usually drop that column so that the problem is unsupervised, and then perform the clustering.
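
A rough sketch of that workflow, using synthetic stand-in data (the values and the quality range are made up for illustration): cluster on the features only, then compare the clusters with quality afterwards.

```r
# Synthetic stand-in data (hypothetical values, NOT the wine data set)
set.seed(7)
n <- 300
wine <- data.frame(
  chlorides = rnorm(n, mean = 0.09, sd = 0.05),
  alcohol   = rnorm(n, mean = 10.4, sd = 1.1),
  pH        = rnorm(n, mean = 3.3,  sd = 0.15),
  quality   = sample(3:8, n, replace = TRUE)
)

# Leave the output variable out of the clustering
features <- subset(wine, select = -quality)
km <- kmeans(scale(features), centers = 4, nstart = 20)

# Afterwards, see how quality is distributed across the clusters
tab <- table(cluster = km$cluster, quality = wine$quality)
print(tab)

# Average quality per cluster
avg_quality <- tapply(wine$quality, km$cluster, mean)
print(avg_quality)
```

Because quality played no part in forming the clusters, any pattern in this table reflects the other variables rather than the clustering input itself.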

To interpret the results of kmeans, this will be useful.

Please be sure to take a look at our site's homework policy:
