I am looking for a function for doing arbitrary pairwise operations that can be applied in an analagous way to dplyr::across()
(i.e. could be used in mutate()
or summarise()
and handle groups, etc).
Fictive examples of such a pairwise()
function:
library(dplyr)
cor_p_value <- function(x, y){
stats::cor.test(x, y)$p.value
}
ks_p_value <- function(x, y){
stats::ks.test(x, y)$p.value
}
iris <- as_tibble(iris)
# hypothetical use within mutate()
iris %>%
mutate(pairwise(.col = where(is.numeric),
.fns = ~ .x / .y, # could also just do `/`
.names = "ratio_{.col$.x}_{.col$.y}",
associative = FALSE)
) %>%
glimpse()
#> Rows: 150
#> Columns: 17
#> $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, ...
#> $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, ...
#> $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, ...
#> $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, ...
#> $ Species <fct> setosa, setosa, setosa, setosa, set...
#> $ ratio_Sepal.Length_Sepal.Width <dbl> 1.457143, 1.633333, 1.468750, 1.483...
#> $ ratio_Sepal.Length_Petal.Length <dbl> 3.642857, 3.500000, 3.615385, 3.066...
#> $ ratio_Sepal.Length_Petal.Width <dbl> 25.50000, 24.50000, 23.50000, 23.00...
#> $ ratio_Sepal.Width_Petal.Length <dbl> 2.500000, 2.142857, 2.461538, 2.066...
#> $ ratio_Sepal.Width_Petal.Width <dbl> 17.50000, 15.00000, 16.00000, 15.50...
#> $ ratio_Petal.Length_Petal.Width <dbl> 7.000000, 7.000000, 6.500000, 7.500...
#> $ ratio_Sepal.Width_Sepal.Length <dbl> 0.6862745, 0.6122449, 0.6808511, 0....
#> $ ratio_Petal.Length_Sepal.Length <dbl> 0.2745098, 0.2857143, 0.2765957, 0....
#> $ ratio_Petal.Width_Sepal.Length <dbl> 0.03921569, 0.04081633, 0.04255319,...
#> $ ratio_Petal.Length_Sepal.Width <dbl> 0.4000000, 0.4666667, 0.4062500, 0....
#> $ ratio_Petal.Width_Sepal.Width <dbl> 0.05714286, 0.06666667, 0.06250000,...
#> $ ratio_Petal.Width_Petal.Length <dbl> 0.14285714, 0.14285714, 0.15384615,...
# hypothetical use within summarise()
iris %>%
group_by(Species)
summarise(pairwise(.col = where(is.numeric),
.fns = list(ksp = ks_p_value, corp = cor_p_value),
.names = "{.fn}_{.col$.x}_{.col$.y}",
associative = TRUE)
) %>%
glimpse()
#> Rows: 3
#> Columns: 13
#> $ Species <fct> setosa, versicolor, virginica
#> $ ksp_Sepal.Length_Sepal.Width <dbl> 0, 0, 0
#> $ ksp_Sepal.Length_Petal.Length <dbl> 0.000000e+00, 0.000000e+00, 6.951782...
#> $ ksp_Sepal.Length_Petal.Width <dbl> 0, 0, 0
#> $ ksp_Sepal.Width_Petal.Length <dbl> 0, 0, 0
#> $ ksp_Sepal.Width_Petal.Width <dbl> 0, 0, 0
#> $ ksp_Petal.Length_Petal.Width <dbl> 0, 0, 0
#> $ corp_Sepal.Length_Sepal.Width <dbl> 6.709843e-10, 8.771860e-05, 8.434625...
#> $ corp_Sepal.Length_Petal.Length <dbl> 6.069778e-02, 2.586190e-10, 6.297786...
#> $ corp_Sepal.Length_Petal.Width <dbl> 5.052644e-02, 4.035422e-05, 4.798149...
#> $ corp_Sepal.Width_Petal.Length <dbl> 2.169789e-01, 2.302168e-05, 3.897704...
#> $ corp_Sepal.Width_Petal.Width <dbl> 1.038211e-01, 1.466661e-07, 5.647610...
#> $ corp_Petal.Length_Petal.Width <dbl> 1.863892e-02, 1.271916e-11, 2.253577...
# don't know if this a realistic way to handle .names
# also may make sense to have option to pivot columns so ends-up in format more like `corrr`
# may be interesting to make a pwise() function (like pmap()) that handles combinations of 2, 3, ... p columns
# maybe indirectly run `corrr::colpair_map() ...
What would such a pairwise()
function (or similar) look like? (Ideally set-up in a way that is consistent with tidyverse + tidyselection etc.)
Or what are the resources I need to read in order to go about setting it up? (Eg How would I grab the .col
variables selected and then set-up the respective .x and .y combinations from these? etc...)
Note on current approaches:
There are excellent tidyverse friendly packages that can be used to create the output I describe above (e.g. corrr
, widyr
, parts of recipes
). I've written/tweeted about these and related approaches, documented here. The specific outputs pasted into the code snippets for this question were returned via the code at this gist and uses those methods.
With this topic, I am interested in an approach for pairwise operations that (in some circumstances) may be slightly easier to chain with piped dplyr
verbs (though am not attached to the exact particulars of the pairwise()
function I describe above).