I have a set of functions that each take single column names as inputs, as well as one or more group_by columns.
What I'm Trying to Achieve
I want to create an additional function that checks whether all the columns supplied as inputs (whether single column inputs or group_by column names) are present in the dataframe. However, I'm also using the rlang package for tidyeval, and I'm a bit confused about quo, enquo, etc.
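To make the confusion concrete, here is a minimal sketch of how quo() and enquo() differ, as far as I understand the rlang documentation. show_capture() is a hypothetical helper used only for illustration; it is not part of my actual code.

```r
library(rlang)

# Hypothetical helper, only for illustration
show_capture <- function(col) {
  literal  <- rlang::quo(col)    # captures the literal expression `col`
  supplied <- rlang::enquo(col)  # captures what the caller typed for `col`
  list(quo = literal, enquo = supplied)
}

captured <- show_capture(bom_mrr)
rlang::as_label(captured$quo)    # "col"
rlang::as_label(captured$enquo)  # "bom_mrr"
```

So inside a function, enquo() is the one that sees the column name the user actually supplied, which is why the main function below uses it.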
Reprex Code
# Packages (dplyr re-exports tibble() and %>%)
library(dplyr)
# Sample Data
bom_mrr = c(100, 350, 50, 68, 68, 10)
eom_mrr = c(200, 150, 45, 90, 87, 34)
cohort = c("cohort1", "cohort2", "cohort2", "cohort3", "cohort1", "cohort2")
month = c(as.Date("2024-01-01"), as.Date("2024-01-01"), as.Date("2024-01-01"),
          as.Date("2024-02-01"), as.Date("2024-02-01"), as.Date("2024-02-01"))
df = tibble(month, cohort, bom_mrr, eom_mrr)
# Function to check if all column inputs are present in the dataframe (from ChatGPT)
check_present_columns <- function(data, ...) {
  # Convert the ... arguments into a character vector of column names
  column_names <- rlang::ensyms(...)
  required_columns <- sapply(column_names, rlang::as_string)
  # Check if all required columns are present in the dataframe
  if (!all(required_columns %in% names(data))) {
    missing_cols <- required_columns[!required_columns %in% names(data)]
    stop("The following supplied columns are missing from the dataframe: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE) # Return invisibly if checks pass
}
# Main function to calculate mrr retention
get_mrr_retention_rate <- function(data, bom_mrr_column, eom_mrr_column, group_column) {
  # rlang setup
  group_cols_expr <- rlang::enquo(group_column)
  bom_mrr_expr <- rlang::enquo(bom_mrr_column)
  eom_mrr_expr <- rlang::enquo(eom_mrr_column)
  # checks
  check_present_columns(data, !!bom_mrr_expr, !!eom_mrr_expr)
  # calculation
  if (!missing(group_column)) {
    data <- data %>%
      dplyr::select(!!bom_mrr_expr, !!eom_mrr_expr, !!group_cols_expr) %>%
      dplyr::group_by(dplyr::across(!!group_cols_expr))
  } else {
    data <- data %>% dplyr::select(!!bom_mrr_expr, !!eom_mrr_expr, !!group_cols_expr)
  }
  mrr_retention_tbl <- data %>%
    dplyr::summarise(
      total_bom_mrr = sum(!!bom_mrr_expr),
      total_eom_mrr = sum(!!eom_mrr_expr)
    ) %>%
    dplyr::ungroup() %>%
    dplyr::mutate(mrr_retention_rate = total_eom_mrr / total_bom_mrr)
  return(mrr_retention_tbl)
}
Outputs
When I run the function normally, everything works fine:
df %>%
  get_mrr_retention_rate(
    bom_mrr_column = bom_mrr,
    eom_mrr_column = eom_mrr,
    group_column = c(month, cohort)
  )
# A tibble: 5 × 5
  month      cohort  total_bom_mrr total_eom_mrr mrr_retention_rate
  <date>     <chr>           <dbl>         <dbl>              <dbl>
1 2024-01-01 cohort1           100           200              2
2 2024-01-01 cohort2           400           195              0.488
3 2024-02-01 cohort1            68            87              1.28
4 2024-02-01 cohort2            10            34              3.4
5 2024-02-01 cohort3            68            90              1.32
Now say the user enters a wrong column name for eom_mrr_column. In that case the check_present_columns() function works as expected:
# Wrong eom_mrr_column name
df %>%
  get_mrr_retention_rate(
    bom_mrr_column = bom_mrr,
    eom_mrr_column = mrr_eom,
    group_column = c(month, cohort)
  )
# Expected error message
Error: The following supplied columns are missing from the dataframe: mrr_eom
However, if the user misspells one of the group_column names, the check_present_columns() function does not appear to work, as the error message is different:
# wrong spelling for cohort
df %>%
  get_mrr_retention_rate(
    bom_mrr_column = bom_mrr,
    eom_mrr_column = eom_mrr,
    group_column = c(month, cohhort)
  )
# Error message
Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `cohhort` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
The function still reports the error correctly, but not in the same format as the second example above.
I feel like this has something to do with me using rlang incorrectly somewhere, but I'm not sure where. Any help would be appreciated. Also, if there is a better/more efficient way to achieve what I'm trying to do, feel free to suggest it. Thanks.
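One thing worth noting from the reprex: check_present_columns() is only ever called with the two MRR columns, so group_column reaches dplyr::select() unchecked. Simply adding it to the call would not be enough either, because rlang::ensyms() only accepts symbols and c(month, cohort) is a call. A possible direction, as a minimal sketch only: capture the inputs with rlang::enquos() and pull the variable names out of each expression with base R's all.vars(). The name check_columns_exist() and the demo data frame are hypothetical, and the sketch assumes inputs are bare column names, optionally wrapped in c(); tidyselect helpers or all_of() would need different handling.

```r
library(rlang)

# Hypothetical variant of check_present_columns(); assumes bare column
# names, optionally wrapped in c(), e.g. c(month, cohort)
check_columns_exist <- function(data, ...) {
  quos <- rlang::enquos(...)
  # Drop inputs that were not supplied (e.g. an omitted group_column)
  quos <- Filter(function(q) !rlang::quo_is_missing(q), quos)
  # all.vars() returns the variable names used in an expression,
  # so c(month, cohort) yields "month" and "cohort"
  required <- unique(unlist(lapply(
    quos,
    function(q) all.vars(rlang::quo_get_expr(q))
  )))
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop(
      "The following supplied columns are missing from the dataframe: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Small stand-in data frame, only for illustration
demo <- data.frame(month = 1, cohort = "a", bom_mrr = 1, eom_mrr = 2)

check_columns_exist(demo, bom_mrr, eom_mrr, c(month, cohort))

# Misspelled group column: capture the message instead of stopping
err <- tryCatch(
  check_columns_exist(demo, bom_mrr, eom_mrr, c(month, cohhort)),
  error = function(e) conditionMessage(e)
)
```

Inside get_mrr_retention_rate(), the call would then presumably become check_columns_exist(data, !!bom_mrr_expr, !!eom_mrr_expr, !!group_cols_expr), so that a misspelled group column produces the same style of error as a misspelled MRR column.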