Well, the exact computations depend a lot on what you need in the result. Taking your initial post, I assume you want the group means and the ANOVA p-value (not the post-hoc hsd).
In base R, you can get the group means for one comparison with:
aggregate(Metabolite1 ~ sleepFactor, data = data_df, FUN = mean)
#> sleepFactor Metabolite1
#> 1 A 2.5
#> 2 B 4.0
#> 3 C 3.5
And you can extract the means by taking the second column:
aggregate(Metabolite1 ~ sleepFactor, data = data_df, FUN = mean)[,2]
#> [1] 2.5 4.0 3.5
So you could create such a loop:
# initialize the results data.frame
nb_sleepFactor <- 2
nb_tests <- (ncol(data_df) - nb_sleepFactor)
Res.table <- data.frame(Group1_mean = double(nb_tests),
Group2_mean = double(nb_tests),
Group3_mean = double(nb_tests),
PValue = double(nb_tests))
# run the loop
for(i in 1:nrow(Res.table)){
formula <- as.formula(paste0(colnames(data_df)[nb_sleepFactor+i], "~ sleepFactor"))
message("i=",i,", testing column ", colnames(data_df)[nb_sleepFactor+i])
Res.table[i,1:3] <- aggregate(formula,
data = data_df,
FUN = mean)[,2]
model <- aov(formula, data = data_df)
Res.table[i,4] <- summary(model)[[1]]$`Pr(>F)`[1]
}
# visualize the results
Res.table
#> Group1_mean Group2_mean Group3_mean PValue
#> 1 2.5 4.0 3.5 0.7806568
#> 2 2.5 2.5 4.5 0.3617436
#> 3 2.5 5.0 4.5 0.3535534
#> 4 2.5 4.5 5.0 0.3535534
#> 5 2.5 3.0 3.5 0.8184876
Technically this approach is not "the best", but it is quite extensible: if you can figure out how to compute a value, you can just add it to your results data.frame.
Another way to aproach the problem is using the tidyverse and a more functional approach.
library(tidyverse)
Let's start by making the metabolite a single column.
data_df |>
pivot_longer(cols = -starts_with("sleepFactor"),
names_to = "metabolite",
values_to = "val_metabolite")
#> # A tibble: 30 × 4
#> sleepFactor sleepFactor2 metabolite val_metabolite
#> <chr> <chr> <chr> <dbl>
#> 1 A A Metabolite1 1
#> 2 A A Metabolite2 1
#> 3 A A Metabolite3 1
#> 4 A A Metabolite4 1
#> 5 A A Metabolite5 1
#> 6 B C Metabolite1 2
#> 7 B C Metabolite2 3
#> 8 B C Metabolite3 4
#> 9 B C Metabolite4 5
#> 10 B C Metabolite5 2
#> # ℹ 20 more rows
#> # ℹ Use `print(n = ...)` to see more rows
We could also do that with the sleepFactors if it makes sense for subsequent analysis:
data_df |>
pivot_longer(cols = -starts_with("sleepFactor"),
names_to = "metabolite",
values_to = "val_metabolite") |>
pivot_longer(cols = starts_with("sleepFactor"),
names_to = "sleep_factor_type",
values_to = "val_sleep_factor")
At this point, we can already group by both sleepFactor
and metabolite
, to get the mean of each metabolite in each sleepFactor
group:
data_df |>
pivot_longer(cols = -starts_with("sleepFactor"),
names_to = "metabolite",
values_to = "val_metabolite") |>
group_by(sleepFactor, metabolite) |>
summarize(mean = mean(val_metabolite),
.groups = "drop")
#> # A tibble: 15 × 3
#> sleepFactor metabolite mean
#> <chr> <chr> <dbl>
#> 1 A Metabolite1 2.5
#> 2 A Metabolite2 2.5
#> 3 A Metabolite3 2.5
#> 4 A Metabolite4 2.5
#> 5 A Metabolite5 2.5
#> 6 B Metabolite1 4
#> 7 B Metabolite2 2.5
#> 8 B Metabolite3 5
#> 9 B Metabolite4 4.5
#> 10 B Metabolite5 3
#> 11 C Metabolite1 3.5
#> 12 C Metabolite2 4.5
#> 13 C Metabolite3 4.5
#> 14 C Metabolite4 5
#> 15 C Metabolite5 3.5
This has advantages, you can use split()
to make a list of data.frames, then use map()
to call a function on each data.frame. And as a bonus, if your function returns a data.frame (so you have a new list of data.frames), you can assemble them in a single data.frame with dplyr::bind_rows()
.
data_long <- data_df |>
pivot_longer(cols = -starts_with("sleepFactor"),
names_to = "metabolite",
values_to = "val_metabolite")
single_model <- function(df){
model <- aov(val_metabolite ~ sleepFactor, data = df)
broom::tidy(model)
}
split(data_long, f = data_long$metabolite) |>
map(single_model) |>
bind_rows()
#> # A tibble: 10 × 6
#> term df sumsq meansq statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 sleepFactor 2 2.33 1.17 0.269 0.781
#> 2 Residuals 3 13 4.33 NA NA
#> 3 sleepFactor 2 5.33 2.67 1.45 0.362
#> 4 Residuals 3 5.50 1.83 NA NA
#> 5 sleepFactor 2 7.00 3.50 1.50 0.354
#> 6 Residuals 3 7.00 2.33 NA NA
#> 7 sleepFactor 2 7 3.5 1.5 0.354
#> 8 Residuals 3 7 2.33 NA NA
#> 9 sleepFactor 2 1.00 0.500 0.214 0.818
#> 10 Residuals 3 7.00 2.33 NA NA
Created on 2023-08-15 with reprex v2.0.2
This approach may seem confusing at first, and it's totally fine if you prefer to stick with the for
loop, but it's also very powerful: if you do a lot of statistics like this it will simplify your life.