Suppose I have some data like this:
group | element1 | element2 | element3 | element4 |
1 | 9 | 7 | 4 | 4 |
1 | 7 | 5 | 3 | 6 |
1 | 6 | 11 | 2 | 8 |
2 | 5 | 5 | 7 | 6 |
2 | 2 | 7 | 10 | 7 |
2 | 4 | 8 | 5 | 4 |
3 | 7 | 4 | 6 | 8 |
3 | 8 | 6 | 8 | 6 |
3 | 6 | 9 | 3 | 5 |
I want to draw a heatmap to see the relationships between the elements (and between the groups). The elements may be measured in different units, such as kg, mg, m, or s, so I have to scale the data before plotting. I also want to use the average of each element within each group; for example, element1 in group 1 averages to (9 + 7 + 6)/3 = 22/3.
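To make the two operations concrete, here is a minimal sketch for element1 only, with the values typed in by hand from the table above. It assumes that "scale" means what `scale()` does with its default arguments: center the column on its mean and divide by its sample standard deviation.

```r
# element1 values and their group labels, copied from the table above
element1 <- c(9, 7, 6, 5, 2, 4, 7, 8, 6)
group    <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# Per-group average: group 1 gives (9 + 7 + 6) / 3 = 22/3
group_means <- tapply(element1, group, mean)

# Scaling by hand: subtract the column mean, divide by the sample SD
z_manual <- (element1 - mean(element1)) / sd(element1)

# scale() with default arguments does the same centering and scaling
z_scale <- as.vector(scale(element1))
```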
I'm not sure whether I should [1] scale all the data and then average, or [2] average first and then scale.
I tried both methods. Here is my code:
ori_data <- read.csv("ori_data.csv")
### Method 1: scale first, then take the group means ###
# scale
m1_scaled_element <- scale(ori_data[,2:ncol(ori_data)])
m1_scaled_data <- data.frame(group = ori_data$group, m1_scaled_element)
# mean
m1_result <- aggregate(. ~ group, data = m1_scaled_data, mean)
m1 <- data.frame(m1_result[,2:ncol(m1_result)], row.names = m1_result$group)
colnames(m1) <- colnames(ori_data)[2:ncol(ori_data)]
### Method 2: take the group means first, then scale ###
# mean by group
m2_mean_element <- aggregate(. ~ group, data = ori_data, mean)
# then scale
m2_scaled_mean_element <- scale(m2_mean_element[,2:ncol(m2_mean_element)])
m2 <- data.frame(m2_scaled_mean_element, row.names = m2_mean_element$group)
colnames(m2) <- colnames(ori_data)[2:ncol(ori_data)]
# draw heatmap
library(pheatmap)
pheatmap(m1, fontsize = 15)
pheatmap(m2, fontsize = 15)
The result for [1]:
group | element1 | element2 | element3 | element4 |
1 | 0.6285394 | 0.3527668 | -0.8819171 | 0.0000000 |
2 | -1.0999439 | -0.1007905 | 0.7559289 | -0.2222222 |
3 | 0.4714045 | -0.2519763 | 0.1259882 | 0.2222222 |
And the result for [2]:
group | element1 | element2 | element3 | element4 |
1 | 0.6575959 | 1.1208971 | -1.0674900 | 0 |
2 | -1.1507929 | -0.3202563 | 0.9149914 | -1 |
3 | 0.4931970 | -0.8006408 | 0.1524986 | 1 |
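One way to see where the two tables differ is to write method 1 directly in terms of the group means. The sketch below is only an illustration: it types the data in by hand (assuming `ori_data.csv` holds exactly the nine rows shown at the top), and the names `group_means`, `m1_direct` and `ratio` are introduced here. Because scaling is a linear transformation, averaging the scaled rows should be the same as taking the group means and standardizing them with the mean and SD of all nine rows, whereas method 2 standardizes them with the mean and SD of the three group means.

```r
# Data typed in from the table at the top (assumed to match ori_data.csv)
ori_data <- data.frame(
  group    = rep(1:3, each = 3),
  element1 = c(9, 7, 6, 5, 2, 4, 7, 8, 6),
  element2 = c(7, 5, 11, 5, 7, 8, 4, 6, 9),
  element3 = c(4, 3, 2, 7, 10, 5, 6, 8, 3),
  element4 = c(4, 6, 8, 6, 7, 4, 8, 6, 5)
)
elements <- ori_data[, -1]

# Group means of the raw data (one row per group)
group_means <- as.matrix(aggregate(. ~ group, data = ori_data, mean)[, -1])

# Method 1: scale all nine rows, then average within each group
m1 <- as.matrix(aggregate(as.data.frame(scale(elements)),
                          by = list(group = ori_data$group), FUN = mean)[, -1])

# Method 2: average within each group, then scale the three group means
m2 <- scale(group_means)

# Method 1 written directly: center and divide the group means
# by the mean and SD computed over all nine rows
centered  <- sweep(group_means, 2, colMeans(elements))
m1_direct <- sweep(centered, 2, sapply(elements, sd), "/")

# Per-column ratio between the two results:
# SD of the three group means over SD of all nine rows
ratio <- apply(group_means, 2, sd) / sapply(elements, sd)

# With equally sized groups, both comparisons should hold
all.equal(m1, m1_direct, check.attributes = FALSE)
all.equal(m1, sweep(m2, 2, ratio, "*"), check.attributes = FALSE)
```

Under these assumptions the two methods differ only by one rescaling factor per column. The second comparison relies on the groups being the same size, so that the mean of the group means equals the overall mean; with unbalanced groups the centering would differ as well.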
The two heatmaps look different as well.
I feel that method 1 is better, but I don't know how to justify it. Could anyone explain which is appropriate, or point me to a reference?
Thank you for reading to the end!