Hi,
I would like to run a correlation analysis in R across all cell types in the column Cell.type. So far I have only used one cell type, Whole Blood (example dataset and R code are given below). Is it possible to use a for loop over all cell types and build a correlation matrix as the final output? Note: the row means must be calculated separately for each cell type, as done for Whole Blood in the example below; the same is needed for Neutrophils and CD8.
print(M8.3.sample.anno)
#> Samples Cell.type Subjects M8.3_EPSTI1 M8.3_HERC5 M8.3_HES4
#> 1 lib224 Whole Blood Type 1 Diabetes 5.058453 4.887020 1.7671376
#> 2 lib225 Whole Blood Type 1 Diabetes 4.450353 4.718768 1.2454535
#> 3 lib259 Whole Blood Sepsis 5.135682 3.956515 1.3199113
#> 4 lib265 Whole Blood Sepsis 1.880949 2.522847 0.1416930
#> 5 lib272 Whole Blood Sepsis 3.169424 1.957587 1.5035259
#> 6 lib308 Whole Blood ALS 5.247415 4.754137 2.2679888
#> 7 lib322 Whole Blood ALS 5.263406 4.661308 2.4815427
#> 8 lib328 Whole Blood Type 1 Diabetes 5.274372 5.289623 2.1011256
#> 9 lib335 Whole Blood Type 1 Diabetes 4.778972 4.913245 2.5630369
#> 10 lib355 Whole Blood ALS 4.768332 4.582032 1.5406692
#> 11 lib242 Neutrophils Type 1 Diabetes 5.253648 7.039871 1.3155759
#> 12 lib248 Neutrophils Type 1 Diabetes 3.877802 6.501353 2.2587263
#> 13 lib253 Neutrophils Sepsis 4.645075 4.384019 0.2715198
#> 14 lib260 Neutrophils Sepsis 2.322131 2.971893 0.0000000
#> 15 lib266 Neutrophils Sepsis 1.183076 2.266580 0.0000000
#> 16 lib302 Neutrophils ALS 4.421727 5.900319 2.3356959
#> 17 lib316 Neutrophils ALS 5.257992 6.495806 3.4147835
#> 18 lib323 Neutrophils Type 1 Diabetes 6.116180 7.677564 2.6296078
#> 19 lib329 Neutrophils Type 1 Diabetes 4.955377 6.370284 2.0181739
#> 20 lib349 Neutrophils ALS 5.275520 5.591712 0.9455944
#> 21 lib246 CD8 Type 1 Diabetes 3.889786 3.608076 0.9921601
#> 22 lib252 CD8 Type 1 Diabetes 3.534158 3.501803 0.9057364
#> 23 lib257 CD8 Sepsis 4.198376 3.794708 1.0898153
#> 24 lib264 CD8 Sepsis 3.187057 3.552989 0.2858425
#> 25 lib270 CD8 Sepsis 3.849569 3.864689 0.2660382
#> 26 lib306 CD8 ALS 3.854546 3.361594 1.7794049
#> 27 lib320 CD8 ALS 4.451968 3.259045 1.1280926
#> 28 lib327 CD8 Type 1 Diabetes 4.306357 3.709678 0.8845468
#> 29 lib333 CD8 Type 1 Diabetes 3.527393 2.824222 0.7876389
#> 30 lib353 CD8 ALS 3.995013 3.441016 0.9899005
library(dplyr)
## Filter by one cell type
M8.3.sample.anno.WB <- dplyr::filter(M8.3.sample.anno, Cell.type == "Whole Blood")
# M8.3.sample.anno.Neu <- dplyr::filter(M8.3.sample.anno, Cell.type == "Neutrophils")
# M8.3.sample.anno.CD8 <- dplyr::filter(M8.3.sample.anno, Cell.type == "CD8")
# Calculate row means for selected columns, specific to cell type and add as a new column named 'M8.3_Avg'
M8.3.sample.anno.WB <- M8.3.sample.anno.WB %>%
  mutate(M8.3_Avg = rowMeans(select(., starts_with("M8.3_")), na.rm = TRUE))
# M8.3.sample.anno.Neu <- M8.3.sample.anno.Neu %>%
# mutate(M8.3_Avg = rowMeans(select(., starts_with("M8.3_")), na.rm = TRUE))
# M8.3.sample.anno.CD8 <- M8.3.sample.anno.CD8 %>%
# mutate(M8.3_Avg = rowMeans(select(., starts_with("M8.3_")), na.rm = TRUE))
## relocate average column
M8.3.sample.anno.WB <- relocate(M8.3.sample.anno.WB, M8.3_Avg)
# List of gene columns, replace with actual gene column names if they are different
gene_columns.M8.3 <- colnames(M8.3.sample.anno.WB)[grepl("^M8\\.3_", colnames(M8.3.sample.anno.WB))]
# Initialize a vector to store correlation coefficients
correlations <- c()
# Calculate correlation for each gene
for (gene in gene_columns.M8.3) {
  # Skip the "M8.3_Avg" column itself
  if (gene != "M8.3_Avg") {
    # Calculate Spearman correlation between the row mean and this gene
    correlation <- cor(M8.3.sample.anno.WB$M8.3_Avg, M8.3.sample.anno.WB[[gene]], use = "everything", method = "spearman")
    # Add to the correlations vector, named by the gene
    correlations[gene] <- correlation
  }
}
# Create a dataframe to store the results
M8.3_cor <- data.frame(`Whole Blood` = correlations)
print(M8.3_cor)
#> Whole.Blood
#> M8.3_EPSTI1 0.8666667
#> M8.3_HERC5 0.8060606
#> M8.3_HES4 0.8424242
Expected output:
            Whole.Blood Neutrophils      CD8
M8.3_EPSTI1   0.8666667   0.7575758 0.793939
M8.3_HERC5    0.8060606   0.9030303 0.212121
M8.3_HES4     0.8424242   0.8753840 0.745455
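A minimal sketch of how a loop over cell types might produce this layout. It assumes that every gene column shares the M8.3_ prefix and that the row mean is taken over those gene columns within each cell type, as in the Whole Blood code above. The helper name cor_by_cell_type and its arguments are hypothetical and do not come from any package; only base R is used.

```r
# Hypothetical helper (an illustration, not an established function):
# for each cell type, correlate every gene column with the per-sample
# row mean of the gene columns, using Spearman correlation.
cor_by_cell_type <- function(df, group_col = "Cell.type", prefix = "M8.3_") {
  # Gene columns: names starting with the literal prefix, excluding any
  # previously added average column
  gene_cols <- names(df)[startsWith(names(df), prefix)]
  gene_cols <- setdiff(gene_cols, paste0(prefix, "Avg"))
  cell_types <- unique(as.character(df[[group_col]]))

  # Pre-allocate a genes x cell-types matrix
  res <- matrix(NA_real_,
                nrow = length(gene_cols), ncol = length(cell_types),
                dimnames = list(gene_cols, cell_types))

  for (ct in cell_types) {
    # Keep only the samples of this cell type and only the gene columns
    sub <- df[df[[group_col]] == ct, gene_cols, drop = FALSE]
    # Row mean across the gene columns for each sample
    avg <- rowMeans(sub, na.rm = TRUE)
    for (g in gene_cols) {
      res[g, ct] <- cor(avg, sub[[g]], use = "everything", method = "spearman")
    }
  }

  out <- as.data.frame(res)
  # Make column names syntactic, e.g. "Whole Blood" becomes "Whole.Blood"
  names(out) <- make.names(names(out))
  out
}
```

Calling cor_by_cell_type(M8.3.sample.anno) should then return a data frame with one row per gene and one column per cell type, in the order the cell types first appear in the data.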
Thank you,
Toufiq