Calculate differences between multiple pairs of variables

jarle · December 20, 2023, 9:53am

I have pairs of variables with the following naming convention:
variablename1, variablename1_t2,
variablename2, variablename2_t2,
variablenameX, variablenameX_t2, and so on.

For exploration, I would like to calculate the difference within each variable pair (variablenameX_t2 - variablenameX) and then run crosstable() with a by variable, using its automatic effect sizes and test statistics.

Any ideas? For example, how can I loop (map) over the variables and mutate the results into new variables (variablenameX_diff)?

Or can it be done "on the fly" with crosstable?

library(crosstable)
library(dplyr)  # provides %>%

# comparison_variable_of_interest stands for the grouping column
crosstable(df, ends_with("_diff"),
           by = comparison_variable_of_interest,
           unique_numeric = 5,
           percent_digits = 2,
           percent_pattern = "{n} ({p_col})",
           total = "both",
           showNA = "ifany",
           effect = TRUE, test = TRUE) %>%
  as_flextable(keep_id = TRUE, compact = TRUE, header_show_n = 1)
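For the _t2 naming convention itself, the pairs can be discovered from the column names, so nothing has to be listed by hand. This is a minimal sketch under stated assumptions, not a tested answer to the original data: the data frame df and its group, score1 and score2 columns are made up, and it assumes every variablenameX_t2 column has a matching variablenameX column.

```r
# HYPOTHETICAL example data following the variablenameX / variablenameX_t2 convention
df <- data.frame(
  group     = c("a", "a", "b", "b"),
  score1    = c(10, 12, 9, 11),
  score1_t2 = c(13, 12, 10, 15),
  score2    = c(5, 6, 7, 8),
  score2_t2 = c(4, 8, 7, 10)
)

# every column ending in "_t2" marks a pair; stripping the suffix gives the baseline name
# (ASSUMPTION: each baseline column exists)
t2_vars   <- grep("_t2$", names(df), value = TRUE)  # "score1_t2" "score2_t2"
base_vars <- sub("_t2$", "", t2_vars)                # "score1" "score2"

# data frame arithmetic is element-wise and column-by-column,
# so all pairs are differenced (t2 - baseline) in one step
df[paste0(base_vars, "_diff")] <- df[t2_vars] - df[base_vars]
```

The new *_diff columns can then be picked up by crosstable() through ends_with("_diff"), with group as the by variable.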

nirgrahamuk · December 20, 2023, 10:43am

There is a default approach to follow in all such cases:
a) write a function that can handle one of your cases;
b) use it in a map to solve all of your cases.

Here is a worked example.
In general, functions that process variables and compose new ones from them involve a fair amount of metaprogramming and tidyeval (i.e. rlang concerns), so we will need the walrus operator := and !!sym() calls. rlang already applies glue syntax implicitly to the left-hand side of dynamic assignments, but I find loading glue a great help in constructing the right-hand side of the assignments.

library(tidyverse)
library(glue)

# what variable prefixes to run through
my_pair_var_src_vec <- c("Petal", "Sepal")

# code that handles a single pair
transmute(
  iris,
  Petal_diff := Petal.Length - Petal.Width)


# a function that achieves the same
do_a_pair <- function(data, src) {
  transmute(
    data,
    "{src}_diff" := !!sym(glue("{src}.Length")) - !!sym(glue("{src}.Width"))
  )
}

# test the function
do_a_pair(iris, src = my_pair_var_src_vec[1])

# finally, apply it to every prefix and bind the results to iris
bind_cols(
  iris,
  map_dfc(my_pair_var_src_vec, \(x){
    do_a_pair(iris, x)
  })
)
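The same idea can be sketched without glue, assuming dplyr 1.0 or later: the .data pronoun looks a column up by a string on the right-hand side, the "{src}_diff" := name still uses dplyr's built-in glue-style naming, and base Reduce() folds mutate() over the prefixes so no separate bind_cols() step is needed. The name add_pair_diff is hypothetical.

```r
library(dplyr)

# HYPOTHETICAL helper: adds "<src>_diff" = <src>.Length - <src>.Width to data;
# .data[[...]] takes a column name as a string, so no !!sym(glue()) is needed
add_pair_diff <- function(data, src) {
  mutate(
    data,
    "{src}_diff" := .data[[paste0(src, ".Length")]] - .data[[paste0(src, ".Width")]]
  )
}

# Reduce() starts from iris and calls add_pair_diff() once per prefix,
# passing the growing data frame along each time
result <- Reduce(add_pair_diff, c("Petal", "Sepal"), iris)
```

Using mutate() rather than transmute() keeps the original columns, which is why the final bind_cols() from the example above is not needed here.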

jarle · December 21, 2023, 9:57am

Thanks!
The following works, but I'd like to select variables based on a naming convention or pattern, e.g. a wildcard or a phrase at the beginning or end of the variable names.

I'd also like to specify more parameters in the vector, i.e. the prefixes and suffixes as well
(I call ".Width" and ".Length" suffixes, and "Petal" and "Sepal" variable names.)

Suggestions?

library(tidyverse)
library(glue)

# code that handles a single pair
transmute(
  iris,
  Sepal_diff := Sepal.Width - Sepal.Length)

# what variables to run through. Specify pattern and postfix also?
my_var_parameters_src_vec <- c("Petal", "Sepal")

# a function that achieves the same
do_variable <- function(data, src) {
  transmute(
    data,
    "{src}_diff" := !!sym(glue("{src}.Width")) - !!sym(glue("{src}.Length"))
  )
}

# test the function
do_variable(iris, src = my_var_parameters_src_vec[1])

# finally, apply it to every prefix and bind the results to iris
bind_cols(
  iris,
  map_dfc(my_var_parameters_src_vec, \(x){
    do_variable(iris, x)
  })
)
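Selecting by a phrase at the end of the names can be sketched with a tidyselect helper, assuming every pair consists of a <prefix>.Width and a <prefix>.Length column: ends_with() finds the columns, and stripping the suffix recovers the prefixes that drive the function. paste0() stands in for glue() here only to keep the example's dependencies to dplyr; this is an illustration, not the only way to do it.

```r
library(dplyr)

# select the columns by the phrase at the end of their names, then strip that phrase
# to recover the pair prefixes (ASSUMPTION: each prefix also has a ".Length" column)
width_cols <- names(select(iris, ends_with(".Width")))  # "Sepal.Width" "Petal.Width"
prefixes   <- sub("\\.Width$", "", width_cols)           # "Sepal" "Petal"

# do_variable() from the post, with paste0() in place of glue() on the right-hand side
do_variable <- function(data, src) {
  transmute(
    data,
    "{src}_diff" := !!sym(paste0(src, ".Width")) - !!sym(paste0(src, ".Length"))
  )
}

# bind each one-column result onto iris in turn
result <- Reduce(bind_cols, lapply(prefixes, \(x) do_variable(iris, x)), iris)
```

starts_with() or matches() (a regular expression) can replace ends_with() when the identifying phrase sits elsewhere in the name.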

nirgrahamuk · December 21, 2023, 11:21am

library(tidyverse)
library(glue)

(todo_df <- data.frame(src=c("Petal","Sepal"),
           left_suffix=c(".Width",".Length"),
           right_suffix=c(".Length",".Width"),
           result_extend=c("_diff_wl","_diff_lw")))

# a generalised function: the prefix, both suffixes and the result-name extension are arguments
do_variable <- function(data, src, left_s, right_s, re) {
  transmute(
    data,
    "{src}{re}" := !!sym(glue("{src}{left_s}")) - !!sym(glue("{src}{right_s}"))
  )
}

# test the function
do_variable(iris,
            src     = todo_df[1, ][[1]],
            left_s  = todo_df[1, ][[2]],
            right_s = todo_df[1, ][[3]],
            re      = todo_df[1, ][[4]])

# finally, apply it to every row of todo_df and bind the results to iris
bind_cols(
  iris,
  map_dfc(seq_len(nrow(todo_df)), \(i) {
    do_variable(iris,
                src     = todo_df[i, ][[1]],
                left_s  = todo_df[i, ][[2]],
                right_s = todo_df[i, ][[3]],
                re      = todo_df[i, ][[4]])
  })
)
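A variation on the row-indexed loop, offered as a sketch rather than a better answer: build one unevaluated expression per row of todo_df with base Map() and rlang's expr()/sym(), name each after its result column, and splice them all into a single mutate() call with !!!. The todo_df columns are the ones from the post; diff_exprs is an illustrative name.

```r
library(dplyr)

todo_df <- data.frame(
  src           = c("Petal", "Sepal"),
  left_suffix   = c(".Width", ".Length"),
  right_suffix  = c(".Length", ".Width"),
  result_extend = c("_diff_wl", "_diff_lw")
)

# ILLUSTRATIVE: one expression per row, e.g. Petal.Width - Petal.Length
diff_exprs <- Map(
  function(src, left_s, right_s) {
    rlang::expr(!!rlang::sym(paste0(src, left_s)) - !!rlang::sym(paste0(src, right_s)))
  },
  todo_df$src, todo_df$left_suffix, todo_df$right_suffix
)
# name each expression after the column it should create
names(diff_exprs) <- paste0(todo_df$src, todo_df$result_extend)

# evaluate all expressions in one mutate() call; the original columns are kept
result <- mutate(iris, !!!diff_exprs)
```

Because the names of diff_exprs become the new column names, no := or glue is needed at the mutate() step.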

jarle · December 21, 2023, 12:04pm

Thanks!
Is it also possible to use a wildcard or a position in todo_df, e.g.

src=c("*al")

Or perhaps achieve this another way? What I am after is running the function on all variables containing the phrase "al" (or in a range of positions) instead of specifying their full names.

nirgrahamuk · December 21, 2023, 2:35pm

I think I'd have to make too many assumptions to attempt to address this further.
That is, there's no clarity on how searching for "al" (which in the iris case recovers the two Petal variations and the two Sepal variations) would be used to determine which part is the left and which is the right, and so on.
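If those assumptions are made explicit, a wildcard can be handled roughly as follows. The sketch assumes names of the form <prefix>.<suffix> and that every matched prefix uses the same .Width - .Length difference; it uses utils::glob2rx() to turn the shell-style "*al" into a regular expression. For "contains al" rather than "ends in al", "*al*" would be the corresponding wildcard.

```r
# ASSUMPTION: names look like "<prefix>.<suffix>", e.g. "Sepal.Length";
# a name without a "." (like "Species") is treated as its own prefix
prefixes <- unique(sub("\\..*$", "", names(iris)))  # "Sepal" "Petal" "Species"

# translate the wildcard into a regular expression and filter the prefixes
pattern  <- utils::glob2rx("*al")                    # "^.*al$"
selected <- grep(pattern, prefixes, value = TRUE)    # "Sepal" "Petal"

# ASSUMPTION: the left/right parts are the same for every selected prefix
diff_cols <- lapply(selected, function(p) {
  iris[[paste0(p, ".Width")]] - iris[[paste0(p, ".Length")]]
})
names(diff_cols) <- paste0(selected, "_diff")

# cbind() on a data frame plus a named list adds one column per list element
result <- cbind(iris, diff_cols)
```

Deciding which part is "left" and which is "right" remains a separate choice that the wildcard alone cannot make, which is the ambiguity raised above.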

jarle · December 21, 2023, 2:54pm

OK, thanks for answering

EnochMitchell · January 15, 2024, 6:13pm

I'm grappling with a data exploration challenge involving pairs of variables with a specific naming convention: variablename1, variablename1_t2, variablename2, variablename2_t2, and so on. To enhance my analysis, I want to calculate the difference within each variable pair and then run crosstable() with a by variable, using its automatic effect sizes and test statistics.

system · February 26, 2024, 6:14pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.
