# Scale or average first?

Suppose I have some data like this:

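| group | element1 | element2 | element3 | element4 |
|-------|----------|----------|----------|----------|
| 1     | 9        | 7        | 4        | 4        |
| 1     | 7        | 5        | 3        | 6        |
| 1     | 6        | 11       | 2        | 8        |
| 2     | 5        | 5        | 7        | 6        |
| 2     | 2        | 7        | 10       | 7        |
| 2     | 4        | 8        | 5        | 4        |
| 3     | 7        | 4        | 6        | 8        |
| 3     | 8        | 6        | 8        | 6        |
| 3     | 6        | 9        | 3        | 5        |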
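
For anyone who wants to run the code without `ori_data.csv`, the same table can be typed into R directly. This is only a convenience sketch: the column names are copied from the header of the table, and the data frame name `ori_data` matches the one used in the code.

```r
# Rebuild the example table in memory instead of reading ori_data.csv
ori_data <- data.frame(
  group    = rep(1:3, each = 3),            # three observations per group
  element1 = c(9, 7, 6, 5, 2, 4, 7, 8, 6),
  element2 = c(7, 5, 11, 5, 7, 8, 4, 6, 9),
  element3 = c(4, 3, 2, 7, 10, 5, 6, 8, 3),
  element4 = c(4, 6, 8, 6, 7, 4, 8, 6, 5)
)
str(ori_data)
```

Running `write.csv(ori_data, "ori_data.csv", row.names = FALSE)` afterwards writes a file that `read.csv("ori_data.csv")` reads back with the same column names.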

I want to draw a heatmap to see the relationship between different elements (and different groups). However, the elements may have different units, such as kg, mg, m, s and so on, so I have to scale the data before drawing. I also want to plot the average of each element within each group; for example, element1 in group 1 becomes the mean of 9, 7 and 6, which is 22/3.
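
As a quick check of the two operations involved: `scale()` turns each column into z-scores, (x - mean) / sd, using the usual n - 1 sample SD, and the group average is a plain mean. The sketch below uses the element1 column from the table. Note that in R the values have to be wrapped in `c()`: `mean(9, 7, 6)` passes 7 and 6 to the `trim` and `na.rm` arguments and returns 9, not 22/3.

```r
element1 <- c(9, 7, 6, 5, 2, 4, 7, 8, 6)   # element1 column from the table
group    <- rep(1:3, each = 3)

# scale() centres by the column mean and divides by the sample SD (n - 1 denominator)
z_manual <- (element1 - mean(element1)) / sd(element1)
z_scale  <- as.vector(scale(element1))
all.equal(z_manual, z_scale)

# Average of the raw values within each group; group 1 is mean(c(9, 7, 6)) = 22/3
group_means <- tapply(element1, group, mean)
group_means
```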

I'm not sure whether I should [1] scale all the data and then average, or [2] average first and then scale.

I tried both methods; here is my code:

```r
library(pheatmap)

ori_data <- read.csv("ori_data.csv")
element_cols <- 2:ncol(ori_data)   # every column except "group"

################                             ################
###                      scale first                     ####
################                             ################

# scale each element over all observations (z-scores)
m1_scaled_element <- scale(ori_data[, element_cols])
m1_scaled_data <- data.frame(group = ori_data$group, m1_scaled_element)

# then average the scaled values within each group
m1_result <- aggregate(. ~ group, data = m1_scaled_data, FUN = mean)

m1 <- data.frame(m1_result[, -1], row.names = m1_result$group)
colnames(m1) <- colnames(ori_data)[element_cols]

################                             ################
###                    mean then scale                   ####
################                             ################

# average the raw values within each group
m2_mean_element <- aggregate(. ~ group, data = ori_data, FUN = mean)
# then scale each element over the group means
m2_scaled_mean_element <- scale(m2_mean_element[, -1])
m2 <- data.frame(m2_scaled_mean_element, row.names = m2_mean_element$group)
colnames(m2) <- colnames(ori_data)[element_cols]

# draw the heatmaps (pheatmap's default scale = "none" plots the values as they are)
pheatmap(m1, fontsize = 15)
pheatmap(m2, fontsize = 15)
```

The result for [1]:

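| group | element1   | element2   | element3   | element4   |
|-------|------------|------------|------------|------------|
| 1     | 0.6285394  | 0.3527668  | -0.8819171 | 0.0000000  |
| 2     | -1.0999439 | -0.1007905 | 0.7559289  | -0.2222222 |
| 3     | 0.4714045  | -0.2519763 | 0.1259882  | 0.2222222  |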

And the result for [2]:

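| group | element1   | element2   | element3   | element4 |
|-------|------------|------------|------------|----------|
| 1     | 0.6575959  | 1.1208971  | -1.0674900 | 0        |
| 2     | -1.1507929 | -0.3202563 | 0.9149914  | -1       |
| 3     | 0.4931970  | -0.8006408 | 0.1524986  | 1        |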
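
One way to see where the two tables differ: because `scale()` is a linear transformation, both methods end up computing (group mean - grand mean) / s for each element. With equal group sizes the centre is the same in both, and the only difference is the divisor s. In [1] it is the SD of all 9 individual observations; in [2] it is the SD of just the 3 group means. The sketch below rebuilds the table inline (rather than reading the CSV) and reproduces both results from those two divisors; the helper names `gm`, `sd_obs` and `sd_means` are made up for illustration.

```r
ori_data <- data.frame(
  group    = rep(1:3, each = 3),
  element1 = c(9, 7, 6, 5, 2, 4, 7, 8, 6),
  element2 = c(7, 5, 11, 5, 7, 8, 4, 6, 9),
  element3 = c(4, 3, 2, 7, 10, 5, 6, 8, 3),
  element4 = c(4, 6, 8, 6, 7, 4, 8, 6, 5)
)
elements <- as.matrix(ori_data[, -1])

# Raw group means (3 x 4) and the grand mean of each element
gm         <- as.matrix(aggregate(. ~ group, data = ori_data, FUN = mean)[, -1])
grand_mean <- colMeans(elements)   # same as colMeans(gm) because the groups are equal in size

sd_obs   <- apply(elements, 2, sd)   # divisor behind method [1]
sd_means <- apply(gm, 2, sd)         # divisor behind method [2]

m1_check <- scale(gm, center = grand_mean, scale = sd_obs)
m2_check <- scale(gm, center = grand_mean, scale = sd_means)

# Method [1] done the question's way, for comparison
m1_scaled_data <- data.frame(group = ori_data$group, scale(elements))
m1_result <- aggregate(. ~ group, data = m1_scaled_data, FUN = mean)
all.equal(as.vector(as.matrix(m1_result[, -1])), as.vector(m1_check))
```

So each entry of [2] is the corresponding entry of [1] multiplied by sd_obs / sd_means. For element4 that factor is exactly 4.5 (SD 1.5 over all observations versus 1/3 over the group means), which is why its -0.222 and 0.222 in [1] become -1 and 1 in [2].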

The two heatmaps also look different.
I feel that method 1 is better, but I don't know how to explain why. Could anyone explain it or point me to a reference?

Thank you for reading to the end!

I would scale first and then average. Scaling first puts all your variables on an "equal footing", and averaging over the groups afterwards seems more logical to me.
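
The "equal footing" argument can be made concrete. Averaging first and then scaling three group means forces every column of the result to have mean 0 and SD 1, however small the real group differences are, so a barely varying element gets the full colour range of the heatmap. Scaling first keeps each element in units of its SD across individual observations, so the spread of a column still reflects how strongly the groups differ. A sketch with the example data built inline:

```r
ori_data <- data.frame(
  group    = rep(1:3, each = 3),
  element1 = c(9, 7, 6, 5, 2, 4, 7, 8, 6),
  element2 = c(7, 5, 11, 5, 7, 8, 4, 6, 9),
  element3 = c(4, 3, 2, 7, 10, 5, 6, 8, 3),
  element4 = c(4, 6, 8, 6, 7, 4, 8, 6, 5)
)
elements <- ori_data[, -1]

# [1] scale over all observations, then average within groups
m1_res <- aggregate(. ~ group, data = data.frame(group = ori_data$group, scale(elements)), FUN = mean)
m1 <- as.matrix(m1_res[, -1])

# [2] average within groups, then scale over the three group means
m2_res <- aggregate(. ~ group, data = ori_data, FUN = mean)
m2 <- scale(as.matrix(m2_res[, -1]))

# Spread of each heatmap column across the three groups
round(rbind(scale_then_mean = apply(m1, 2, sd),
            mean_then_scale = apply(m2, 2, sd)), 3)
```

The second row is 1 for every element by construction, while the first row comes out at roughly 0.96, 0.31, 0.83 and 0.22, so in [1] element2 and element4 show visibly weaker group differences than element1.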

Thank you. I thought that if I average before scaling, some information may be lost, such as the deviation of each element within a group. So scaling and then averaging seems better, but I don't know how to test which result is better.
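
On the point about lost information: a heatmap of group means drops the within-group spread under either method, but when scaling comes first the spread can be summarised in the same standardised units as the means and reported alongside them. A minimal sketch, assuming the within-group SD is the summary of interest; it only shows the spread that averaging hides and does not by itself decide which method is "better".

```r
ori_data <- data.frame(
  group    = rep(1:3, each = 3),
  element1 = c(9, 7, 6, 5, 2, 4, 7, 8, 6),
  element2 = c(7, 5, 11, 5, 7, 8, 4, 6, 9),
  element3 = c(4, 3, 2, 7, 10, 5, 6, 8, 3),
  element4 = c(4, 6, 8, 6, 7, 4, 8, 6, 5)
)

# z-scores computed over all observations (the scale-first step)
scaled <- data.frame(group = ori_data$group, scale(ori_data[, -1]))

# Within-group mean and SD of each element, both in units of the overall SD
within_mean <- aggregate(. ~ group, data = scaled, FUN = mean)
within_sd   <- aggregate(. ~ group, data = scaled, FUN = sd)

within_mean
within_sd
```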