Scale or average first

Yorks0n · October 25, 2019, 8:51am

Suppose I have some data like this:


group	element1	element2	element3	element4
1	9	7	4	4
1	7	5	3	6
1	6	11	2	8
2	5	5	7	6
2	2	7	10	7
2	4	8	5	4
3	7	4	6	8
3	8	6	8	6
3	6	9	3	5

I want to draw a heatmap to see the relationship between different elements (and different groups). But the elements may have different units, such as kg, mg, m, s and so on, so I have to scale the data before drawing. Also, I want to use the average of each element within a group, e.g. mean(c(9, 7, 6)) = 22/3 for element1 in group 1.

I'm not sure whether I should [1] scale all the data before averaging, or [2] average before scaling.

I tried these two methods, here is my code:

ori_data <- read.csv("ori_data.csv")
################                             ################
###                      scale first                     ####
################                             ################

# scale
m1_scaled_element <- scale(ori_data[,2:ncol(ori_data)])
# passing the matrix unnamed keeps its column names (element1, element2, ...)
m1_scaled_data <- data.frame(group = ori_data$group, m1_scaled_element)

# mean
m1_result <- aggregate(. ~ group, data = m1_scaled_data, mean)

m1 <- data.frame(m1_result[,2:ncol(m1_result)], row.names = m1_result$group)
colnames(m1) <- colnames(ori_data)[2:ncol(ori_data)]

################                             ################
###                    mean then scale                   ####
################                             ################

# mean by group
m2_mean_element <- aggregate(. ~ group, data = ori_data, mean)
# then scale
m2_scaled_mean_element <- scale(m2_mean_element[,2:ncol(m2_mean_element)])
# passing the matrix unnamed keeps its column names (element1, element2, ...)
m2 <- data.frame(m2_scaled_mean_element, row.names = m2_mean_element$group)
colnames(m2) <- colnames(ori_data)[2:ncol(ori_data)]
# draw heatmap
library(pheatmap)
pheatmap(m1, fontsize = 15)
pheatmap(m2, fontsize = 15)

The result for [1]:


group	element1	element2	element3	element4
1	0.6285394	0.3527668	-0.8819171	0.0000000
2	-1.0999439	-0.1007905	0.7559289	-0.2222222
3	0.4714045	-0.2519763	0.1259882	0.2222222

And the result for [2]:


group	element1	element2	element3	element4
1	0.6575959	1.1208971	-1.0674900	0
2	-1.1507929	-0.3202563	0.9149914	-1
3	0.4931970	-0.8006408	0.1524986	1

The two heatmaps also look different.
I feel that method 1 is better, but I don't know how to justify it. Could anyone explain it or point me to a reference?
Last but not least, thank you for reading to the end.

Thank You!

valeri · October 25, 2019, 10:02am

I would scale, then average. The reason is that you first put all your variables on "equal footing" and then average over the groups, which seems more logical to me.
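
To make the difference between the two orders concrete, here is a sketch for a single element. The notation is an assumption introduced for illustration, not something from the thread: \bar{x}_g is the mean of group g, \bar{x} the mean over all rows, G the number of groups, s_all the standard deviation over all rows, and s_means the standard deviation of the G group means. The first line relies on averaging being linear, and the last line assumes every group has the same number of rows, as in the example data.

```latex
\begin{aligned}
z^{(1)}_g &= \frac{\bar{x}_g - \bar{x}}{s_{\text{all}}}
  && \text{[1] scale, then average} \\
z^{(2)}_g &= \frac{\bar{x}_g - \frac{1}{G}\sum_{h=1}^{G}\bar{x}_h}{s_{\text{means}}}
  && \text{[2] average, then scale} \\
z^{(1)}_g &= \frac{s_{\text{means}}}{s_{\text{all}}}\, z^{(2)}_g
  && \text{when } \tfrac{1}{G}\textstyle\sum_{h}\bar{x}_h = \bar{x}
\end{aligned}
```

Under that assumption, the two methods rank the groups identically within each element and differ only by a per-element factor. Method [2] always stretches the group means to unit standard deviation, while method [1] keeps them relative to the spread of the raw observations.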

Yorks0n · October 26, 2019, 7:12am

Thank you. I thought that averaging before scaling might lose some information, such as the deviation of an element within a group. So scaling and then averaging seems better, but I don't know how to test which result is better.
Anyway, thank you for replying.
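
One rough way to see how much within-group spread each order keeps is to compare, per element, the standard deviation of the group means with the standard deviation of all observations. The sketch below rebuilds the example data inline from the table in the first post instead of reading ori_data.csv; the names elements, sd_ratio and rescaled_m2 are made up for illustration. It is not a formal test of which method is better, and the final check only holds because all groups have the same size.

```r
# Example data copied from the table in the original post
ori_data <- data.frame(
  group    = rep(1:3, each = 3),
  element1 = c(9, 7, 6, 5, 2, 4, 7, 8, 6),
  element2 = c(7, 5, 11, 5, 7, 8, 4, 6, 9),
  element3 = c(4, 3, 2, 7, 10, 5, 6, 8, 3),
  element4 = c(4, 6, 8, 6, 7, 4, 8, 6, 5)
)
elements <- setdiff(names(ori_data), "group")

# Method 1: scale every observation, then average within each group
scaled <- data.frame(group = ori_data$group, scale(ori_data[elements]))
m1 <- aggregate(. ~ group, data = scaled, FUN = mean)

# Method 2: average within each group, then scale the group means
means <- aggregate(. ~ group, data = ori_data, FUN = mean)
m2 <- data.frame(group = means$group, scale(means[elements]))

# SD of the group means relative to the SD of all observations, per element
sd_ratio <- sapply(elements, function(e) sd(means[[e]]) / sd(ori_data[[e]]))
print(round(sd_ratio, 3))

# With equal group sizes, method 1 equals method 2 multiplied by this ratio
rescaled_m2 <- sweep(as.matrix(m2[elements]), 2, sd_ratio, `*`)
print(all.equal(as.matrix(m1[elements]), rescaled_m2, check.attributes = FALSE))
```

For this data the ratio is about 0.96 for element1 but only about 0.22 for element4. In other words, the groups barely differ in element4 compared with the spread inside each group, yet method [2] shows them at -1, 0 and 1, while method [1] keeps them near zero.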
