What does the dendrogram on a heat map do? What value or information does it add? I read somewhere that it helps you check the logical relationship between variables, but how? If anyone knows, has resources on this, or could explain it, that would be great.
To see the sort of thing I’m referring to, see this link and the branches on the outskirts of the matrix:
Most heatmaps are clustered so that the most closely related rows and columns sit next to each other. There are many different clustering algorithms, but if you use the heatmap() function in R, the default is hclust, which performs hierarchical clustering. "Hierarchical" is the key word here, because the hierarchy is what the dendrogram draws.
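To make the link between hierarchical clustering and the dendrogram concrete, here is a minimal sketch in plain Python rather than R. It assumes toy one-dimensional data and complete linkage (the default method of hclust). The function name complete_linkage and the sample points are made up for illustration, and this is not the code heatmap() runs internally. Each recorded merge corresponds to one junction in the dendrogram, and its height is the distance at which the two branches join.

```python
from itertools import combinations

def complete_linkage(points):
    """Agglomerative clustering of 1-D points; returns merges in the order they occur."""
    def dist(a, b):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(abs(points[i] - points[j]) for i in a for j in b)

    # Each cluster is a tuple of point indices; start with one cluster per point.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters and record the height of the junction.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

points = [1.0, 1.5, 5.0, 5.2, 9.0]  # made-up toy data
merges = complete_linkage(points)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

In this toy data the last merge happens at a much larger distance than the first two, which is what a tall branch in a dendrogram represents.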
Dendrograms show how closely related the data points are: the lower the height at which two branches join, the more similar the items. Here is a short blog post that introduces the topic:
Details on the heatmap() function can be found in its documentation.
Note that it's the exact same data: the two heatmaps are identical in terms of what they show, but the first one looks nicer and makes it obvious that there are two groups of metrics. Car weight, engine size and horsepower are correlated with one another and anticorrelated with miles per gallon, etc. On the second heatmap this is not as obvious.
For the first point, the important part is the column ordering; the heatmap would not look much worse if we left the dendrogram out (although I feel its presence does signal to the reader that hierarchical clustering was used and that the nice ordering is not a coincidence).
For the second point, however, the dendrogram is important: sure, it looks like there are two groups, but how convincing is that? The distances in the dendrogram help us see how well the groups are separated, and they become even more important when the structure is more complex.
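As a rough illustration of reading both things off a dendrogram (the ordering and the separation), here is a small Python sketch. The tree, the variable names "a" to "e" and all heights are invented for the example and are not computed from any real dataset. The helper names leaves and cut are hypothetical too.

```python
# A dendrogram as nested tuples: (merge_height, left_subtree, right_subtree).
# Leaves are plain strings. Names and heights are made up for illustration.
tree = (2.0,
        (0.5, "a", (0.25, "b", "c")),
        (0.4, "d", "e"))

def leaves(node):
    """Left-to-right leaf order, i.e. the row/column order used by the heat map."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def cut(node, height):
    """Groups obtained by cutting every junction that lies above `height`."""
    if isinstance(node, str) or node[0] <= height:
        return [leaves(node)]
    _, left, right = node
    return cut(left, height) + cut(right, height)

order = leaves(tree)
groups = cut(tree, 1.0)

# Separation: height of the top junction relative to the tallest junction
# inside either of the two groups it joins.
top, left, right = tree
within = max(left[0], right[0])

print(order)
print(groups)
print(top / within)
```

The larger that ratio, the more convincing the two-group reading. If the top junction were only slightly higher than the junctions below it, the apparent groups would be much less clear-cut.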