Iteratively Conducting T-tests in the Tidyverse

ryan.britt · August 7, 2023, 10:02pm

Hello, I'm analyzing data for an experiment that I conducted, and need some help with some code to conduct multiple t-tests. In short, I have code that works, but I enjoy the tidyverse approach to programming, so I'm wondering how I can do the same thing more elegantly using the purrr package.

Basically, my participants solved three math problems under two conditions ("norm" and "sf"), then answered a five-item difficulty questionnaire after each one ("mental", "temporal", "effort", "performance", and "frustration"). I want to run a paired t-test for each of the five items on each of the three questions, so 15 tests in total. Here is my working code:

# Load tidyverse
library(tidyverse)
library(broom)

set.seed(123)

# Create test tibble
my_df <- tibble(
  participant = rep(1:10, each = 6),
  question = factor(rep(1:3, times = 20)),
  condition = factor(rep(c("norm", "sf"), each = 3, times = 10)),
  mental = sample(1:100, 60, replace = TRUE),
  temporal = sample(1:100, 60, replace = TRUE),
  effort = sample(1:100, 60, replace = TRUE),
  performance = sample(1:100, 60, replace = TRUE),
  frustration = sample(1:100, 60, replace = TRUE),
)

# My multiple t-test function
multiple_t <- function(df, vars) {
  result_list <- vector("list", length(vars) * 3)
  for (i in vars) {
    for (j in 1:3) {
      index <- 3 * (which(vars == i) - 1) + j
      names(result_list)[[index]] <- paste(i, "_", j, sep = "")
      result_list[[index]] <- df |> 
        filter(question == j) |> 
        t.test(as.formula(paste(i, " ~ condition")), data = _,
               paired = TRUE) |> 
        tidy()
    }
  }
  result_df <- list_rbind(result_list, names_to = "test")
  return(result_df)
}

# Testing the function on the data frame
multiple_t(my_df, c("mental", "temporal", "effort", "performance", "frustration"))

As I said, this code produces the desired result, but I'm wondering if there's a more concise, elegant way to do the same thing (in a tidyverse paradigm, ideally). I thought perhaps map() is the way to do it, but even though I've gone through a few tutorials on it (including Jenny Bryan's), I don't feel like I understand it very well. Here's my attempt using purrr:

my_df |> 
  select(mental:frustration) |> 
  map(\(x) tidy(t.test(x ~ my_df$condition, paired = TRUE))) |> 
  list_rbind(names_to = "id")

The problem is that this code runs each t-test on all the data in the selected columns rather than splitting it by question number. Does anyone have suggestions? Also, can anyone recommend a learning resource that covers this type of thing? (I've already taken several DataCamp courses and read most of R4DS.) Thank you.

gueyenono · August 8, 2023, 2:11am

Hi @ryan.britt, and welcome to Posit Community.

If you are new to the tidyverse, what I would say is keep at it until it makes more and more sense to you. It is indeed a very powerful approach when wrangling data.

In order to achieve your desired result, you need to nest your data frame (tibble) first before estimating your test statistic with purrr::map(). Here is the code and the final output (don't hesitate to let me know if you have questions).

set.seed(123)

# Create test tibble
my_df <- tibble::tibble(
  participant = rep(1:10, each = 6),
  question = factor(rep(1:3, times = 20)),
  condition = factor(rep(c("norm", "sf"), each = 3, times = 10)),
  mental = sample(1:100, 60, replace = TRUE),
  temporal = sample(1:100, 60, replace = TRUE),
  effort = sample(1:100, 60, replace = TRUE),
  performance = sample(1:100, 60, replace = TRUE),
  frustration = sample(1:100, 60, replace = TRUE),
)

nested_df <- my_df |>
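  # Reshape to long form: one row per participant x question x condition x item
  tidyr::pivot_longer(
    cols = c(mental, temporal, effort, performance, frustration),
    names_to = "item",
    values_to = "score"
  ) |>
  # One nested data frame per question x item combination (15 groups)
  dplyr::group_nest(question, item) |>
  dplyr::mutate(
    # Observations are paired by row order within each condition
    t_test = purrr::map(.x = data, .f = \(x) {
      t.test(x$score ~ x$condition, paired = TRUE) |>
        broom::tidy()
    })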
  ) |>
  dplyr::select(-data) |>
  tidyr::unnest(cols = t_test)

nested_df

# A tibble: 15 × 10
   question item        estimate statistic p.value parameter conf.low conf.high method        alternative
   <fct>    <chr>          <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>         <chr>      
 1 1        effort          -4.2   -0.443   0.668          9   -25.7     17.3   Paired t-test two.sided  
 2 1        frustration     -8.6   -0.458   0.658          9   -51.1     33.9   Paired t-test two.sided  
 3 1        mental           9.2    0.662   0.524          9   -22.2     40.6   Paired t-test two.sided  
 4 1        performance      4.8    0.393   0.704          9   -22.9     32.5   Paired t-test two.sided  
 5 1        temporal         0.3    0.0232  0.982          9   -28.9     29.5   Paired t-test two.sided  
 6 2        effort         -10     -1.06    0.319          9   -31.4     11.4   Paired t-test two.sided  
 7 2        frustration     -2.1   -0.193   0.851          9   -26.7     22.5   Paired t-test two.sided  
 8 2        mental          27.2    2.08    0.0668         9    -2.33    56.7   Paired t-test two.sided  
 9 2        performance     -6.6   -0.472   0.648          9   -38.2     25.0   Paired t-test two.sided  
10 2        temporal       -10     -0.578   0.578          9   -49.2     29.2   Paired t-test two.sided  
11 3        effort          -9.7   -1.08    0.307          9   -30.0     10.6   Paired t-test two.sided  
12 3        frustration     15.8    1.20    0.260          9   -13.9     45.5   Paired t-test two.sided  
13 3        mental         -26     -2.31    0.0459         9   -51.4     -0.592 Paired t-test two.sided  
14 3        performance     -9.7   -0.652   0.531          9   -43.3     23.9   Paired t-test two.sided  
15 3        temporal        12.7    0.993   0.347          9   -16.2     41.6   Paired t-test two.sided
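
A note for anyone running this on a newer R version: as far as I can tell, R 4.4.0 and later reject `paired = TRUE` in the formula interface of t.test(), so the call above may stop with an error there. Below is a minimal sketch of one workaround, assuming the same test tibble: spread the two conditions into their own columns so each row is one participant's pair, then pass the two columns to t.test() directly. The name `paired_results` is hypothetical and only used for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(123)

# Same test tibble as above
my_df <- tibble::tibble(
  participant = rep(1:10, each = 6),
  question = factor(rep(1:3, times = 20)),
  condition = factor(rep(c("norm", "sf"), each = 3, times = 10)),
  mental = sample(1:100, 60, replace = TRUE),
  temporal = sample(1:100, 60, replace = TRUE),
  effort = sample(1:100, 60, replace = TRUE),
  performance = sample(1:100, 60, replace = TRUE),
  frustration = sample(1:100, 60, replace = TRUE)
)

# paired_results is a hypothetical name, not part of the original answer
paired_results <- my_df |>
  # Long form: one row per participant x question x condition x item
  pivot_longer(cols = mental:frustration, names_to = "item", values_to = "score") |>
  # Wide on condition: "norm" and "sf" become columns, one row per participant
  pivot_wider(names_from = condition, values_from = score) |>
  group_nest(question, item) |>
  mutate(
    # Default (non-formula) interface: pairs are matched row by row,
    # and each row is a single participant
    t_test = map(data, \(d) tidy(t.test(d$norm, d$sf, paired = TRUE)))
  ) |>
  select(-data) |>
  unnest(cols = t_test)

paired_results
```

Because each row holds one participant's "norm" and "sf" scores, the pairing no longer depends on how the rows happen to be sorted.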

This old video from Hadley Wickham will help you understand the power of nested data frames and list-columns: "Managing many models with R" (YouTube).
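
To give a flavour of the list-column pattern for models, here is a rough sketch that fits one small linear model per question and collects the coefficients. It is only meant to show the nest, map, unnest steps: `toy_df` and `model_summaries` are hypothetical names, the data are random, and a plain lm() ignores the repeated-measures structure, so this is not a substitute for the paired tests above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(123)

# toy_df is a hypothetical, cut-down version of the test tibble
toy_df <- tibble::tibble(
  question = factor(rep(1:3, times = 20)),
  condition = factor(rep(c("norm", "sf"), each = 3, times = 10)),
  mental = sample(1:100, 60, replace = TRUE)
)

model_summaries <- toy_df |>
  # One nested data frame per question
  group_nest(question) |>
  mutate(
    # List-column of fitted models, one per question
    model = map(data, \(d) lm(mental ~ condition, data = d)),
    # List-column of tidy coefficient tables
    coefs = map(model, tidy)
  ) |>
  select(question, coefs) |>
  unnest(cols = coefs)

model_summaries
```

The models stay available in the `model` list-column until it is dropped, so the same nested tibble could feed further map() calls, for example one that builds a plot per question.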

ryan.britt · August 9, 2023, 6:29pm

Thank you sooo much! I had heard of nesting but never really understood how list-columns are integral to the purrr approach. I noticed in that video that the same approach can be used to generate multiple models and plots at once, which is another thing I'll need to do in the future. I will definitely spend more time looking into that topic. I appreciate your help!

gueyenono · August 9, 2023, 7:01pm

You're very welcome. I'm glad I could help.
