I'm curious what a "typical tidyverse approach" would be for writing custom functions when there are different types of data structures to apply those functions to.
For example, I've built my own function for converting values of TRUE and FALSE to 1 and 0.
library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.0.3
# Replace "TRUE"/"FALSE" (in any letter case) with "1"/"0"; the result is a character vector
convert_true_false_to_1_0 <- function(x) {
  gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE) %>%
    gsub("^(?:FALSE)$", 0, ., ignore.case = TRUE)
}
set.seed(123)
my_vec <- sample(c(TRUE, FALSE, "true", "false"), 15, replace = TRUE)
my_vec
#> [1] "true" "true" "true" "FALSE" "true" "FALSE" "FALSE" "FALSE" "true"
#> [10] "TRUE" "false" "FALSE" "FALSE" "TRUE" "FALSE"
convert_true_false_to_1_0(my_vec)
#> [1] "1" "1" "1" "0" "1" "0" "0" "0" "1" "1" "0" "0" "0" "1" "0"
Created on 2021-02-05 by the reprex package (v0.3.0)
Right now, convert_true_false_to_1_0() is designed to operate on vectors. If I wanted to make it work on columns in a data frame, I could do either of the following:
- Use mutate(across(..., convert_true_false_to_1_0)) directly; or
- Write an additional variant of convert_true_false_to_1_0() that takes a data frame and a column selection:
library(dplyr)
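# Vector version: replace "TRUE"/"FALSE" (in any letter case) with "1"/"0"
convert_true_false_to_1_0 <- function(x) {
  gsub("^(?:TRUE)$", 1, x, ignore.case = TRUE) %>%
    gsub("^(?:FALSE)$", 0, ., ignore.case = TRUE)
}
# Data frame variant: apply the vector version to the selected columns
convert_true_false_to_1_0_over_df <- function(my_data, my_cols) {
  my_data %>%
    mutate(across({{ my_cols }}, convert_true_false_to_1_0))
}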
set.seed(123)
matrix(sample(c(TRUE, FALSE, "true", "false"), 20, replace = TRUE), ncol = 5) %>%
  as.data.frame() %>%
  convert_true_false_to_1_0_over_df(my_data = ., my_cols = V1:V3)
#>   V1 V2 V3    V4    V5
#> 1  1  1  1 FALSE false
#> 2  1  0  1  TRUE  TRUE
#> 3  1  0  0 FALSE  true
#> 4  0  0  0  true  true
Created on 2021-02-05 by the reprex package (v0.3.0)
Is there a third way? As an example of what I have in mind, something in the spirit of "adverbs": an over_df() wrapper that would allow over_df(convert_true_false_to_1_0, cols = ...). Is there a typical "tidyverse-ish" way to deal with such things?
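To make the idea concrete, here is a minimal sketch of the kind of adverb I mean. Note that over_df() is hypothetical (it is not an existing tidyverse function), and I've written it as a function factory that returns a data frame version of whatever vector function it is given. The example data frame and the name convert_df are made up for illustration.

```r
library(dplyr)

# Same behaviour as the vector function above, written without the pipe
convert_true_false_to_1_0 <- function(x) {
  x <- gsub("^(?:TRUE)$", "1", x, ignore.case = TRUE)
  gsub("^(?:FALSE)$", "0", x, ignore.case = TRUE)
}

# Hypothetical adverb: takes a vector function `.f` and returns a new
# function that applies `.f` to the selected columns of a data frame
over_df <- function(.f) {
  force(.f)
  function(.data, cols) {
    mutate(.data, across({{ cols }}, .f))
  }
}

# Build the data frame version once, then reuse it
convert_df <- over_df(convert_true_false_to_1_0)

# Made-up example data
df <- data.frame(
  a = c("TRUE", "false"),
  b = c("true", "FALSE"),
  c = c("TRUE", "other"),
  stringsAsFactors = FALSE
)

# Only columns a and b are converted; column c is left untouched
out <- convert_df(df, cols = a:b)
out
```

With this shape, the vector function stays the single source of truth, and the data frame variant is derived from it rather than written by hand.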
EDIT
I think I should clarify that my motivation is to write cleaner and more readable code. This is why I prefer a wrapper/adverb over using mutate(across(..., my_func)) directly.
EDIT 2 (2021-02-17)
I have found some code that echoes my intention of writing "wrappers instead of variants". It creates a function that wraps any of dplyr's join functions so that it ignores upper/lower case when merging data frames: https://gist.github.com/jimhester/a060323a05b40c6ada34
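As a rough illustration of that pattern, here is my own simplified sketch. It is not the code from the gist; ignore_case() and left_join_ci are hypothetical names, and the sketch assumes `by` is an unnamed character vector of key columns that exist in both tables. As a simplification, the key columns come back lower-cased in the result.

```r
library(dplyr)

# Hypothetical adverb: wraps a dplyr join function so that the key
# columns are lower-cased in both tables before joining
ignore_case <- function(join_fun) {
  force(join_fun)
  function(x, y, by, ...) {
    x <- mutate(x, across(all_of(by), tolower))
    y <- mutate(y, across(all_of(by), tolower))
    join_fun(x, y, by = by, ...)
  }
}

# Derive a case-insensitive left join from the regular one
left_join_ci <- ignore_case(left_join)

# Made-up example data with keys that differ only in letter case
people <- data.frame(name = c("Alice", "BOB"), age = c(30, 25))
cities <- data.frame(name = c("alice", "Bob"), city = c("Oslo", "Rome"))

joined <- left_join_ci(people, cities, by = "name")
joined
```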
Are there any guidelines or training for doing similar things (i.e., writing wrappers)?