How to use quosures within t_test from tidymodels?

Jen_Ren · June 5, 2024, 1:07pm

Hi, first question here so please let me know if I am missing anything.

I want to create a function that passes in parameters to tidymodels' t_test function. I believe quosures need to be used, but I'm relatively new to using them. In general, I haven't had an issue with other tidyverse functions, but I get an error when I try to pass !!var into t_test.

Am I misunderstanding how quosures should be used? I see there is a newer convention of using double curly braces ({{ }}), but it yields the same error.

Below is my reprex:

perform_t_test <- function(data, group_col, response_col) {
  group_col <- enquo(group_col)
  response_col <- enquo(response_col)
  
  data %>%
    t_test(formula = !!response_col ~ !!group_col)
}

perform_t_test(tbl, y, x)

The error I get is:

Error in t_test():
! The response variable !!response_col cannot be found in this dataframe.
Backtrace:

global perform_t_test(tbl, y, x)

infer::t_test(., formula = !!response_col ~ !!group_col)

mduvekot · June 5, 2024, 1:51pm

(revised for clarity)


infer::gss |>
  tidyr::drop_na(college) |>
  infer::t_test(formula = hours ~ college,
         order = c("degree", "no degree"),
         alternative = "two-sided")


perform_t_test <- function(data, response_var, explanatory_var, order = NULL, alternative = "two-sided") {
  # Pass `alternative` by name so it is not matched positionally to another argument
  infer::t_test(data, formula = rlang::expr(!!rlang::sym(response_var) ~ !!rlang::sym(explanatory_var)),
                order = order, alternative = alternative)
}


infer::gss |> 
  tidyr::drop_na(college) |> 
  perform_t_test("hours", "college", order = c("degree", "no degree"))

dromano · June 5, 2024, 2:32pm

Hi @Jen_Ren ,

This isn't quite a reprex, so I thought I'd describe some steps you could take to make it a reprex:

1. Reload RStudio*, and without doing anything else, follow the steps below.
2. Immediately open a new source file.
3. In that file, write all the code you need to reproduce the error, including library() calls and the creation of data (like the tbl object you used).
4. Use select-all to select the complete contents of the file.
5. Select 'Reprex selection' from the 'Addins' menu, which places the reprex output on your clipboard.
6. Immediately use Ctrl-V here to paste that output into a code block.

*This is not necessary once you're familiar with the process; until then, it's easy to make mistakes without it.

Jen_Ren · June 5, 2024, 2:45pm

Hi David, thanks for the detailed walkthrough! Here is another attempt using the default gss dataset as an example. Essentially I would like to create a function that wraps around infer::t_test where I can bind multiple outputs row-wise, but I'm starting with the simplest case of a single call to infer::t_test.

I have also tried mduvekot's suggestion to no avail -- I'm happy to post another reprex if helpful but below is the reprex using my original example:

library(infer)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

infer::t_test(gss, hours ~ college)
#> Warning: The statistic is based on a difference or ratio; by default, for
#> difference-based statistics, the explanatory variable is subtracted in the
#> order "no degree" - "degree", or divided in the order "no degree" / "degree"
#> for ratio-based statistics. To specify this order yourself, supply `order =
#> c("no degree", "degree")`.
#> # A tibble: 1 × 7
#>   statistic  t_df p_value alternative estimate lower_ci upper_ci
#>       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
#> 1     -1.12  366.   0.264 two.sided      -1.54    -4.24     1.16

perform_t_test <- function(group_col, response_col) {
  infer::t_test(gss, !!response_col ~ !!group_col)
}

perform_t_test(college, hours)
#> Error in `infer::t_test()`:
#> ! The response variable `! and !response_col` cannot be found in this
#>   dataframe.

Created on 2024-06-05 with reprex v2.1.0

dromano · June 5, 2024, 2:51pm

Perfect — thank you, @Jen_Ren

joels · June 5, 2024, 3:14pm

One approach would be to convert each argument to a character value and then construct the formula to feed to t_test:

library(infer)
library(dplyr)
library(tidyr)

perform_t_test <- function(group_col, response_col) {
  group_col = as_label(enquo(group_col))
  response_col = as_label(enquo(response_col))
  form = as.formula(paste(response_col, "~", group_col))
  
  infer::t_test(gss, formula=form)
}

perform_t_test(college, hours)
#> Warning: The statistic is based on a difference or ratio; by default, for
#> difference-based statistics, the explanatory variable is subtracted in the
#> order "no degree" - "degree", or divided in the order "no degree" / "degree"
#> for ratio-based statistics. To specify this order yourself, supply `order =
#> c("no degree", "degree")`.
#> # A tibble: 1 × 7
#>   statistic  t_df p_value alternative estimate lower_ci upper_ci
#>       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
#> 1     -1.12  366.   0.264 two.sided      -1.54    -4.24     1.16
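
A possible variant of the same idea, for cases where the column names are already available as strings: base R's `reformulate()` builds the formula directly, so no `paste()` or `as.formula()` step is needed. This is only a sketch. `make_formula` is a hypothetical helper name, not something infer provides, and the `t_test()` call is guarded so it only runs if infer is installed.

```r
# Hypothetical helper (not part of infer): build `response ~ group`
# from column names supplied as strings.
make_formula <- function(response_col, group_col) {
  stats::reformulate(termlabels = group_col, response = response_col)
}

form <- make_formula("hours", "college")

# Only run the test itself if infer is installed.
if (requireNamespace("infer", quietly = TRUE)) {
  result <- infer::t_test(
    infer::gss,
    formula = form,
    order = c("degree", "no degree")
  )
  print(result)
}
```

Since each call returns a one-row tibble, results for several columns could presumably be collected with `lapply()` over the column names and combined with `dplyr::bind_rows()`. Column names that are not syntactically valid R names would need backticks before being passed to `reformulate()`.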

Jen_Ren · June 5, 2024, 3:47pm

Thank you so much! Very helpful workaround. This worked for me.

dromano · June 6, 2024, 3:17am

And here's an approach that uses the !! operator:

library(tidyverse)
library(infer)
library(rlang)

perform_t_test <- function(data, group_col, response_col) {
  d <- enexpr(data)
  g <- enexpr(group_col)
  r <- enexpr(response_col)
  t <-
    expr(
      t_test(!!d, !!r ~ !!g)
    )
  eval_tidy(t, data)
}

perform_t_test(gss, college, hours)
#> # A tibble: 1 × 7
#>   statistic  t_df p_value alternative estimate lower_ci upper_ci
#>       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>    <dbl>
#> 1     -1.12  366.   0.264 two.sided      -1.54    -4.24     1.16

Created on 2024-06-05 with reprex v2.0.2
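
Some background on why the original `!!response_col ~ !!group_col` fails while wrapping it in `expr()` works: `!!` only gets its unquoting meaning inside functions that support rlang's quasiquotation, such as `expr()`. A bare formula is created by `~`, which quotes both sides as written, so `!!` stays in the formula as two ordinary negation calls. The sketch below shows this using base R only, with `bquote()` and `.()` standing in as the base analogue of `expr()` and `!!`. The variable names are just for illustration.

```r
# Symbols standing in for the column names captured inside a wrapper.
response_col <- quote(hours)
group_col <- quote(college)

# `~` quotes both sides, so `!!` survives as two literal negation calls
# and the formula refers to the wrapper's argument names.
plain <- !!response_col ~ !!group_col
print(all.vars(plain))
#> [1] "response_col" "group_col"

# bquote() substitutes whatever is wrapped in .() first; evaluating the
# resulting call then creates the intended formula.
built <- eval(bquote(.(response_col) ~ .(group_col)))
print(all.vars(built))
#> [1] "hours"   "college"
```

This matches the error message in the reprex, where the response variable is reported as the unevaluated `!!response_col` expression rather than as `hours`.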
