I have pairs of variables with the following naming convention:
variablename1, variablename1_t2,
variablename2, variablename2_t2,
variablenameX, variablenameX_t2, and so on.
For exploration I would like to calculate the difference between each of the variable pairs (variablenameX_t2 - variablenameX) and use the crosstable by function with automatic effect sizes and test statistics.
Any ideas? E.g. how to run through / map over the variables and mutate them into new variables (variablenameX_diff)?
There is a default approach to follow in all such cases:
a) write a function that can handle one of your cases
b) use it in a map to solve all your cases
Here is a worked example.
Functions like this, which process variables and compose new ones from them, generally involve a fair amount of metaprogramming with tidy evaluation (i.e. rlang concerns). Hence we will have to use the walrus operator := and !!sym() calls. While rlang implicitly uses glue syntax for the left-hand side of dynamic assignments, I find that loading glue is a great help in constructing the right-hand side of the assignments.
library(tidyverse)
library(glue)

# what variable prefixes to run through
my_pair_var_src_vec <- c("Petal", "Sepal")

# code that does one
transmute(
  iris,
  Petal_diff := Petal.Length - Petal.Width
)

# a function that achieves the same
do_a_pair <- function(data, src) {
  transmute(
    data,
    "{src}_diff" := !!sym(glue("{src}.Length")) - !!sym(glue("{src}.Width"))
  )
}

# test the function
do_a_pair(iris, src = my_pair_var_src_vec[1])

# finally use it fully
bind_cols(
  iris,
  map_dfc(my_pair_var_src_vec, \(x) {
    do_a_pair(iris, x)
  })
)
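Carried over to the naming convention from the original question (variablenameX and variablenameX_t2), the same two-step approach might look roughly like the sketch below. The data frame df and its columns (weight, score) are made up purely for illustration, and diff_one_pair is a hypothetical helper name. The sketch also assumes every "_t2" column has a baseline column with the same name minus the suffix; unmatched columns are skipped. It uses the .data pronoun instead of !!sym(), which is one of several equivalent ways to refer to a column by string.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data following the variablenameX / variablenameX_t2 convention
df <- tibble(
  id        = 1:3,
  weight    = c(70, 80, 90),
  weight_t2 = c(72, 79, 95),
  score     = c(10, 12, 14),
  score_t2  = c(11, 12, 18)
)

# Derive the base names from the columns that end in "_t2",
# keeping only those that also have a baseline column
t2_cols    <- grep("_t2$", names(df), value = TRUE)
base_names <- sub("_t2$", "", t2_cols)
base_names <- base_names[base_names %in% names(df)]

# One pair: t2 minus baseline, stored as <base>_diff
diff_one_pair <- function(data, base) {
  transmute(
    data,
    "{base}_diff" := .data[[paste0(base, "_t2")]] - .data[[base]]
  )
}

# All pairs, appended to the original data
diffs  <- map(base_names, \(b) diff_one_pair(df, b)) |> bind_cols()
result <- bind_cols(df, diffs)
```

The new columns in result can then be picked up with ends_with("_diff") for whatever tabulation step follows.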
Thanks!
The following works, but I'd like to select variables based on a naming convention / pattern, e.g. by wildcard or phrase at beginning or end of variable names.
I'd also like to specify more parameters in the vector, i.e. also pre-/postfixes
(I call ".Width" and ".Length" postfixes and "Petal" and "Sepal" variable names.)
Suggestions?
library(tidyverse)
library(glue)

# code that does one
transmute(
  iris,
  Sepal_diff := Sepal.Width - Sepal.Length
)

# what variables to run through. Specify pattern and postfix also?
my_var_parameters_src_vec <- c("Petal", "Sepal")

# a function that achieves the same
do_variable <- function(data, src) {
  transmute(
    data,
    "{src}_diff" := !!sym(glue("{src}.Width")) - !!sym(glue("{src}.Length"))
  )
}

# test the function
do_variable(iris, src = my_var_parameters_src_vec[1])

# finally use it fully
bind_cols(
  iris,
  map_dfc(my_var_parameters_src_vec, \(x) {
    do_variable(iris, x)
  })
)
Thanks!
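One possible way to pass more than the variable name is to keep the parameters in a small table with one row per difference and iterate over its rows with pmap(). This is only a sketch: the table name params, its column names (stem, minuend, subtrahend), and the helper do_variable_params are hypothetical choices, not something fixed by dplyr or purrr.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical parameter table: one row per difference to compute
params <- tribble(
  ~stem,   ~minuend, ~subtrahend,
  "Petal", ".Width", ".Length",
  "Sepal", ".Width", ".Length"
)

# One difference: <stem><minuend> minus <stem><subtrahend>, stored as <stem>_diff
do_variable_params <- function(data, stem, minuend, subtrahend) {
  transmute(
    data,
    "{stem}_diff" := .data[[paste0(stem, minuend)]] - .data[[paste0(stem, subtrahend)]]
  )
}

# pmap() walks the rows of params and matches its columns to the lambda's arguments by name
diffs <- pmap(
  params,
  \(stem, minuend, subtrahend) do_variable_params(iris, stem, minuend, subtrahend)
) |> bind_cols()

result <- bind_cols(iris, diffs)
```

Each row is independent, so different postfixes per variable can be mixed in the same table.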
Is it also possible to use wildcard or position in the todo_df, e.g.
src=c("*al")
Or perhaps achieve this in another way. What I am after is to run the function on all variables containing the phrase "al" (or in a range of positions) instead of specifying their full names.
I think I'd have to make too many assumptions to attempt to address this further.
That is, there is no clarity on how searching for "al", which in the iris case recovers the two Petal variations and the two Sepal variations, would be used to determine which part is the left and which is the right, and so on.
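If one is willing to state those assumptions explicitly, a pattern can at least narrow down which stems get processed. The sketch below assumes the two postfixes are known up front and that the left and right sides are decided by the postfix alone (here ".Width" minus ".Length"). The pattern is then matched only against the stem. find_stems and its argument names are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: stems that match `pattern` and have BOTH postfixed columns
find_stems <- function(data, pattern, minuend, subtrahend) {
  cols  <- names(data)
  # columns ending in the minuend postfix, with that postfix stripped
  stems <- cols[endsWith(cols, minuend)]
  stems <- substr(stems, 1, nchar(stems) - nchar(minuend))
  # keep stems whose partner column exists
  stems <- stems[paste0(stems, subtrahend) %in% cols]
  # finally apply the user's regular expression to the stem
  stems[grepl(pattern, stems)]
}

# Regular expression rather than a shell wildcard: "al$" means "ends in al"
stems <- find_stems(iris, "al$", minuend = ".Width", subtrahend = ".Length")

diffs <- map(stems, \(s) {
  transmute(
    iris,
    "{s}_diff" := .data[[paste0(s, ".Width")]] - .data[[paste0(s, ".Length")]]
  )
}) |> bind_cols()

result <- bind_cols(iris, diffs)
```

Without a rule like this for which postfix is subtracted from which, a bare "al" search cannot decide the direction of the difference.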
I'm grappling with a data exploration challenge involving pairs of variables with a specific naming convention: variablename1, variablename1_t2, variablename2, variablename2_t2, and so on. To enhance my analysis, I want to calculate the differences between each variable pair and subsequently perform crosstabulations using the crosstab_by function with automatic effect sizes and test statistics.