Hello, I'm analyzing data from an experiment I conducted and need help with code that runs multiple t-tests. In short, I have code that works, but I like the tidyverse approach to programming, so I'm wondering how to do the same thing more elegantly with the purrr package.
Basically, my participants solved three math problems under two conditions ("norm" and "sf"), then answered a five-item difficulty questionnaire after each one ("mental", "temporal", "effort", "performance", and "frustration"). I'm interested in running a paired t-test comparing the two conditions for each of the five items, for each of the three questions, so 15 tests in total. Here is my working code:
# Load packages
library(tidyverse)
library(broom)
set.seed(123)
# Create test tibble
my_df <- tibble(
  participant = rep(1:10, each = 6),
  question = factor(rep(1:3, times = 20)),
  condition = factor(rep(c("norm", "sf"), each = 3, times = 10)),
  mental = sample(1:100, 60, replace = TRUE),
  temporal = sample(1:100, 60, replace = TRUE),
  effort = sample(1:100, 60, replace = TRUE),
  performance = sample(1:100, 60, replace = TRUE),
  frustration = sample(1:100, 60, replace = TRUE),
)
# My multiple t-test function
multiple_t <- function(df, vars) {
  result_list <- vector("list", length(vars) * 3)
  for (i in vars) {
    for (j in 1:3) {
      index <- 3 * (which(vars == i) - 1) + j
      names(result_list)[[index]] <- paste(i, "_", j, sep = "")
      result_list[[index]] <- df |>
        filter(question == j) |>
        t.test(as.formula(paste(i, " ~ condition")), data = _,
               paired = TRUE) |>
        tidy()
    }
  }
  result_df <- list_rbind(result_list, names_to = "test")
  return(result_df)
}
# Testing the function on the data frame
multiple_t(my_df, c("mental", "temporal", "effort", "performance", "frustration"))
As I said, this code produces the desired result, but I'm wondering if there's a more concise, elegant way to do the same thing (ideally in a tidyverse paradigm). I thought map() might be the way to do it, but even though I've gone through a few tutorials on it (including Jenny Bryan's), I don't feel like I understand it very well. Here's my attempt using purrr:
my_df |>
  select(mental:frustration) |>
  map(\(x) tidy(t.test(x ~ my_df$condition, paired = TRUE))) |>
  list_rbind(names_to = "id")
The problem is that this code runs each t-test on all the data in the selected column, rather than splitting it by question number. Does anyone have suggestions? Also, can anyone recommend a learning resource that covers this type of thing? (I've already taken several DataCamp courses and read most of R4DS.) Thank you.
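One possible direction for the question above, offered as a sketch rather than a definitive answer: reshape the data so that each row holds one participant's "norm" and "sf" scores for a single item and question, nest by item and question, and map the t-test over the nested data frames. The column names item, score, data, and test are introduced here for illustration only. The sketch also calls t.test() with two vectors instead of a formula, on the assumption that recent R versions (4.4.0 onward, as far as I know) no longer accept paired = TRUE in the formula interface.

```r
library(tidyverse)
library(broom)

set.seed(123)

# Same simulated layout as in the post:
# 10 participants x 3 questions x 2 conditions = 60 rows
my_df <- tibble(
  participant = rep(1:10, each = 6),
  question = factor(rep(1:3, times = 20)),
  condition = factor(rep(c("norm", "sf"), each = 3, times = 10)),
  mental = sample(1:100, 60, replace = TRUE),
  temporal = sample(1:100, 60, replace = TRUE),
  effort = sample(1:100, 60, replace = TRUE),
  performance = sample(1:100, 60, replace = TRUE),
  frustration = sample(1:100, 60, replace = TRUE)
)

results <- my_df |>
  # Long format: one row per participant x question x condition x item
  pivot_longer(mental:frustration, names_to = "item", values_to = "score") |>
  # Put the two conditions side by side so each row is one matched pair
  pivot_wider(names_from = condition, values_from = score) |>
  # One nested data frame per question x item combination (15 in total)
  nest(data = c(participant, norm, sf)) |>
  # Run a paired t-test (norm minus sf) on each nested data frame
  mutate(test = map(data, \(d) tidy(t.test(d$norm, d$sf, paired = TRUE)))) |>
  select(-data) |>
  # Expand the one-row tidy() results into regular columns
  unnest(test)

results
```

Because pivot_wider() matches the two conditions by participant, the pairing here does not rely on the rows being in a particular order. The result has one row per question and item, with the usual tidy() columns alongside.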