Hi all,
I'm writing a package and when trying to evaluate potential user's input I run into an issue I can't seem to solve with dplyr.
The user provides a data frame, of which I only know for sure that the first n columns are present (in example column x and y). There can be any other number of columns too, which I don't know beforehand. What I need to do in the code is make a summary of data based on the columns I know, but I would like to user to be able to specify summary functions for the columns that I don't know, so I don't have to discard them during the process.
REPREX - simplified version of actual problem with more columns
Overview
We only know for sure that column x and y will be present
x y a b
1 1 0.2002145 1 a
2 2 0.6852186 2 b
3 1 0.9168758 3 c
4 2 0.2843995 4 d
5 1 0.1046501 5 e
6 2 0.7010575 6 f
The user then provides a list of summary functions for unknown columns
funList = c(a = sum, b = function(x) {paste0(x, collapse = "")})
And out should come this:
# A tibble: 2 x 4
# Groups: x [2]
x y a b
<int> <dbl> <int> <chr>
1 1 0.917 9 ace
2 2 0.701 12 bdf
I figured out a way to to this using the following code
library(tidyverse)
set.seed(5)
#Data provided by user (only x and y are known for sure)
myData = data.frame(x = 1:2, y = runif(6), a = 1:6, b = letters[1:6])
#Summarise functions per column
funList = list(a = sum, b = function(x) {paste0(x, collapse = "")})
#Summarise the data using the user's functions
#----------------------------------------------
#First update all columns
for(name in names(funList)){
if(name %in% colnames(myData)){
myData = myData %>% group_by(x) %>%
mutate({{name}} := funList[[{{name}}]](!!sym(name)))
} else{
warning("No column with name '", name, "' exists")
}
}
#Get the summary value for y, then create the final table
myData %>% mutate(y = max(y)) %>% distinct()
But this is cheating as I would love to use the summary function from dplyr instead, but I can only provide it with a list of functions that will be applied to all columns which will fail as not all have the same type of summary
myData %>% group_by(x) %>% summarise(
y = max(y),
across(.cols = contains(names(funList)), .fns = funList)
)
Error: Problem with `summarise()` input `..2`.
x invalid 'type' (character) of argument
i Input `..2` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
i The error occurred in group 1: x = 1.
Run `rlang::last_error()` to see where the error occurred.
Any ideas??
PJ