Hierarchical cluster analysis help - dendrogram

I wrote some code to generate a dendrogram with the hclust function, as you can see in the image. Two things: first, I would like to know whether the code is correct as written; second, I would like help interpreting this dendrogram. Note that the locations of these points are close together. What does this dendrogram mean? For example, property 30 is isolated from the others. Why? That is the kind of interpretation I am looking for.

Points_properties <- structure(list(
  Latitude = c(-24.781624, -24.775017, -24.769196, -24.761741, -24.752019,
               -24.748008, -24.737312, -24.744718, -24.751996, -24.724589,
               -24.8004, -24.796899, -24.795041, -24.780501, -24.763376,
               -24.801715, -24.728005, -24.737845, -24.743485, -24.742601,
               -24.766422, -24.767525, -24.775631, -24.792703, -24.790994,
               -24.787275, -24.795902, -24.785587, -24.787558, -24.799524),
  Longitude = c(-49.937369, -49.950576, -49.927608, -49.92762, -49.920608,
                -49.927707, -49.922095, -49.915438, -49.910843, -49.899478,
                -49.901775, -49.89364, -49.925657, -49.893193, -49.94081,
                -49.911967, -49.893358, -49.903904, -49.906435, -49.927951,
                -49.939603, -49.941541, -49.94455, -49.929797, -49.92141,
                -49.915141, -49.91042, -49.904772, -49.894034, -49.86651),
  cluster = c("1", "1", "1", "1", "2", "2", "2", "2", "2", "2",
              "1", "1", "1", "1", "1", "1", "2", "2", "2", "2",
              "1", "1", "1", "1", "1", "1", "1", "1", "1", "1")
), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"))

coordinates <- subset(Points_properties, select = c("Latitude", "Longitude"))
plot(coordinates)

[Image: scatter plot of the property coordinates]

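library(geosphere)  # provides distm()

# distm() expects longitude first, then latitude, hence the column order 2:1.
# It returns a matrix of pairwise distances in metres.
d <- distm(coordinates[, 2:1])
d <- as.dist(d)
fit.average <- hclust(d, method = "average")
plot(fit.average, hang = -1, cex = .8, main = "")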

[Image: dendrogram produced by hclust with average linkage]

At the base of the dendrogram, each observation forms its own terminal node, known as a leaf of the tree. Moving up the structure, pairs of leaves merge to form the first branches. These unions (nodes) correspond to the most similar pairs of observations.

Branches can also merge with other branches or with leaves. The earlier a merge occurs (the closer it is to the base of the dendrogram), the greater the similarity, as with observations 21 and 22.

This means that, for any pair of observations, the point in the tree where the branches containing those observations merge can be identified. The height at which this occurs (vertical axis) indicates how similar/different the two observations are.
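The merge height for any pair can also be read off numerically rather than from the plot. The sketch below uses a small toy data set (hypothetical values, not the properties from the question) with Euclidean distances, and `cophenetic()` from base R's stats package to list, for each pair, the height at which their branches join.

```r
# Toy data (hypothetical, not from the question): four values on a line.
x <- matrix(c(1, 2, 6, 20), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), "value"))

# Average-linkage clustering on plain Euclidean distances.
fit <- hclust(dist(x), method = "average")

# Heights of the successive merges, lowest (most similar) first.
print(fit$height)

# Cophenetic distance: for every pair of observations, the height
# at which the branches containing them first merge.
coph <- as.matrix(cophenetic(fit))
print(coph)
```

With the objects from the question, the analogous call would be `as.matrix(cophenetic(fit.average))`, where row 30 holds the height at which property 30 joins each of the other properties.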

Point 30 is far from all the other points, so hclust places it in a group of its own. Check the distances between the points to confirm this.
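One way to check those distances is to look at how far each property is from its nearest neighbour. The sketch below copies the coordinates from the question and uses a hand-written haversine helper (`haversine_m`, a hypothetical base-R stand-in so that the example runs without geosphere); its values may differ slightly from what `distm()` returns, but the ranking of points should tell the same story.

```r
# Coordinates copied from the question (degrees).
lat <- c(-24.781624, -24.775017, -24.769196, -24.761741, -24.752019,
         -24.748008, -24.737312, -24.744718, -24.751996, -24.724589,
         -24.8004, -24.796899, -24.795041, -24.780501, -24.763376,
         -24.801715, -24.728005, -24.737845, -24.743485, -24.742601,
         -24.766422, -24.767525, -24.775631, -24.792703, -24.790994,
         -24.787275, -24.795902, -24.785587, -24.787558, -24.799524)
lon <- c(-49.937369, -49.950576, -49.927608, -49.92762, -49.920608,
         -49.927707, -49.922095, -49.915438, -49.910843, -49.899478,
         -49.901775, -49.89364, -49.925657, -49.893193, -49.94081,
         -49.911967, -49.893358, -49.903904, -49.906435, -49.927951,
         -49.939603, -49.941541, -49.94455, -49.929797, -49.92141,
         -49.915141, -49.91042, -49.904772, -49.894034, -49.86651)

# Hypothetical helper: haversine great-circle distance in metres on a
# sphere of radius r. An assumption standing in for geosphere::distm().
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

# Full pairwise distance matrix; the helper is vectorised over i and j.
n <- length(lat)
dmat <- outer(seq_len(n), seq_len(n),
              function(i, j) haversine_m(lat[i], lon[i], lat[j], lon[j]))

# Ignore the zero diagonal, then summarise each point's isolation.
diag(dmat) <- NA
nearest <- apply(dmat, 1, min, na.rm = TRUE)
mean_dist <- rowMeans(dmat, na.rm = TRUE)

isolation <- data.frame(point = seq_len(n),
                        nearest_m = round(nearest),
                        mean_m = round(mean_dist))

# Most isolated points first.
print(head(isolation[order(-isolation$nearest_m), ]))
```

A point whose nearest neighbour is much farther away than everyone else's will join the tree last, which is the pattern the dendrogram shows for property 30.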

  • Dendrograms, therefore, should be interpreted solely based on the vertical axis and not by the positions occupied by the observations on the horizontal axis.

[Image: the dendrogram annotated with a blue arrow and a red arrow]

  1. Cutting at a height of 6500 gives 2 groups: point 30 alone, and all the other observations together (blue arrow).
  2. Cutting at a height of 5000 gives 2 groups among the other observations, with point 30 still left out on its own (red arrow).

fit.average <- hclust(d, method = "average")
# Many other linkage methods exist, such as Ward's (`ward.D` or `ward.D2` in hclust); which one works best depends on the data.
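Reading groups off at a given height can also be done in code with `cutree()` from base R. The sketch below again uses toy data (hypothetical values, not the properties from the question), in which one observation sits far from the rest, much as property 30 does.

```r
# Toy data (hypothetical, not from the question): "d" is far from the rest.
x <- matrix(c(1, 2, 6, 20), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), "value"))
fit <- hclust(dist(x), method = "average")

# Merge heights here are 1, 4.5 and 17.
# Cutting at h = 10 undoes only the last merge: {a, b, c} and {d}.
groups_high <- cutree(fit, h = 10)
print(groups_high)

# Cutting lower, at h = 3, also separates "c": {a, b}, {c} and {d}.
groups_low <- cutree(fit, h = 3)
print(groups_low)

# Group sizes at each cut level.
print(table(groups_high))
print(table(groups_low))
```

With the objects from the question, the analogous calls would be `cutree(fit.average, h = 6500)` and `cutree(fit.average, h = 5000)`; `cutree(fit.average, k = 2)` asks for a fixed number of groups instead of a height.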

Excellent explanation, thank you so much @M_AcostaCH!
