For your questions 2 and 3, there is a slightly easier solution: in R, you can give arguments default values, which are used whenever the user doesn't supply those arguments themselves:
library(dplyr)

my_function1 <- function(data, var = cyl, stat = mean) {
  data %>%
    as_tibble() %>%
    group_by({{var}}) %>%
    summarise(stat = stat(mpg)) %>%
    ungroup()
}

my_function1(
  data = mtcars,
  var = cyl,
  stat = median
)
You can find explanations of default values here. Since they are a very common feature of R functions, I don't think you need to warn the user (but see below if you want).
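To isolate the mechanism, here is a minimal base R sketch with no dplyr involved. `apply_stat` is a hypothetical name used only for illustration:

```r
# Hypothetical helper: `stat` falls back to mean() when the caller omits it
apply_stat <- function(x, stat = mean) {
  stat(x)
}

apply_stat(c(1, 2, 6))          # uses the default, mean: 3
apply_stat(c(1, 2, 6), median)  # overrides the default: 2
```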
I also made another change: in R, the value of the last evaluated expression is returned automatically, so it is more common to omit return(). There is nothing wrong with it if you prefer to always store the result in an intermediate variable such as output and return that; it's just not necessary.
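The following small sketch shows that the two styles return the same value. Both function names are hypothetical:

```r
# Explicit return() with an intermediate variable
with_return <- function(x) {
  output <- x * 2
  return(output)
}

# Implicit return: the value of the last expression is returned
implicit_return <- function(x) {
  x * 2
}

identical(with_return(1:3), implicit_return(1:3))  # TRUE
```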
Stop for empty data
As explained here, you can use if to check a condition and, if that condition is met, raise an error with a message.
Now, the condition you want to check is whether the user supplied the function arguments, which is exactly what the function missing() is for. For example, missing(data) returns TRUE if data has not been provided. So this function should do what you want:
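Here is a minimal sketch of missing() on its own. The function names are hypothetical. It also shows that missing() returns TRUE for an argument that has a default value but was not supplied, so default values and a message to the user can be combined:

```r
# Hypothetical helper: reports whether `data` was supplied by the caller
data_supplied <- function(data) {
  !missing(data)
}

data_supplied()        # FALSE
data_supplied(mtcars)  # TRUE

# missing() is also TRUE for an argument that has a default value
# but was not supplied explicitly
uses_default <- function(stat = mean) {
  missing(stat)
}

uses_default()        # TRUE
uses_default(median)  # FALSE
```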
my_function2 <- function(data, var = cyl, stat = mean) {
  if (missing(data)) {
    stop("data required")
  }
  data %>%
    as_tibble() %>%
    group_by({{var}}) %>%
    summarise(stat = stat(mpg)) %>%
    ungroup()
}

my_function2(
  data = mtcars,
  var = cyl,
  stat = median
)
Note also the shorthand stopifnot(), which can be useful.
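Here is a sketch of the same check written with stopifnot(). It assumes R >= 4.0.0, where a named condition supplies the error message, and `n_rows` is a hypothetical helper:

```r
# Hypothetical helper: stopifnot() with a named condition (R >= 4.0.0)
# uses the name as the error message when the condition is FALSE
n_rows <- function(data) {
  stopifnot("data required" = !missing(data))
  nrow(data)
}

n_rows(mtcars)  # 32
msg <- tryCatch(n_rows(), error = function(e) conditionMessage(e))
msg             # "data required"
```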
Warn when var and stat are missing
Now we are getting into something harder. For your questions 2 and 3, how do you proceed if you really don't want to use default values?
Rather than stop(), you can use other conditions, which are explained pretty well here. Briefly, you can use message("var not supplied") or warning("var not supplied") to inform the user of something, whereas stop() signals an error after which the function can't continue. So your function might look like this:
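The difference between the three is easiest to see side by side. `notify` is a hypothetical helper that signals one kind of condition and then tries to keep going:

```r
# Hypothetical helper: signals one kind of condition, then keeps going
notify <- function(type) {
  if (type == "message") message("var not supplied")
  if (type == "warning") warning("var not supplied")
  if (type == "error") stop("var not supplied")
  "function continued"
}

suppressMessages(notify("message"))  # "function continued"
suppressWarnings(notify("warning"))  # "function continued"
tryCatch(notify("error"), error = function(e) "function stopped")  # "function stopped"
```

Messages and warnings let the function finish, and the caller can silence them. An error aborts the function unless the caller catches it.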
my_function3 <- function(data, var, stat) {
  if (missing(data)) {
    stop("data required")
  }
  # These only inform the user; var and stat are not replaced yet
  if (missing(var)) {
    message("var missing, cyl used instead")
  }
  if (missing(stat)) {
    message("stat missing, mean used instead")
  }
  data %>%
    as_tibble() %>%
    group_by({{var}}) %>%
    summarise(stat = stat(mpg)) %>%
    ungroup()
}
Now, the difficulty is to actually perform the replacement. It would be easy if you were only using base R and providing the arguments as strings; here it is harder because you are using tidy evaluation. For the function stat, it works as expected:
my_function4 <- function(data, var, stat) {
  if (missing(data)) {
    stop("data is missing")
  }
  # Only a message for now: var is not replaced, so it must still be supplied
  if (missing(var)) {
    message("var missing, cyl used instead")
  }
  # Replace a missing stat with the function mean (no parentheses)
  if (missing(stat)) {
    message("stat missing, mean used instead")
    stat <- mean
  }
  output <- data %>%
    as_tibble() %>%
    group_by({{var}}) %>%
    summarise(stat = stat(mpg)) %>%
    ungroup()
  return(output)
}

my_function4(
  data = mtcars,
  var = cyl
)
(make sure you assign the function mean without parentheses: if you write mean(), you are actually calling that function with no arguments, which fails with an error)
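The following quick check shows the difference between assigning the function and calling it:

```r
stat <- mean       # assigns the function itself; nothing is called yet
is.function(stat)  # TRUE
stat(c(1, 2, 6))   # 3

# mean() calls the function with no arguments, which raises an error
err <- tryCatch(mean(), error = function(e) conditionMessage(e))
err  # the message complains that argument "x" is missing
```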
But for var it is harder, since you are using quasiquotation. The theory is quite painful to understand; the short version is that this should work (I think):
my_function5 <- function(data, var, stat) {
  if (missing(data)) {
    stop("data required")
  }
  if (missing(var)) {
    message("var missing, cyl used instead")
    # expr() builds the unevaluated expression `cyl`
    var <- expr(cyl)
  } else {
    # enquo() captures the expression the caller passed as var
    var <- enquo(var)
  }
  if (missing(stat)) {
    message("stat missing, mean used instead")
    stat <- mean
  }
  data %>%
    as_tibble() %>%
    group_by(!!var) %>%  # !! injects the captured expression
    summarise(stat = stat(mpg)) %>%
    ungroup()
}

my_function5(
  data = mtcars,
  stat = mean
)
In short, writing {{var}} is equivalent to writing !!enquo(var). enquo() captures the expression the user passed as var in a quosure, and !! injects it into the dplyr call, which evaluates it in the context of the data. So to replace var with a default, we need to provide an expression (here expr(cyl)) rather than a value. This is fairly advanced R/tidyverse. If you don't have a lot of experience with R, it's probably better not to try too hard. Note that if you are used to other programming languages such as C, you might find the standard R approach (no tidy evaluation) more intuitive: just provide variable names as strings.
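Here is a sketch that checks the equivalence directly with dplyr's count(). It assumes dplyr is installed; rlang, which provides enquo() and expr(), comes with it as a dependency. The three function names are hypothetical:

```r
library(dplyr)

# {{ }} captures and injects the caller's expression in one step
count_curly <- function(data, var) {
  data %>% count({{ var }})
}

# The long form: capture with enquo(), inject with !!
count_enquo <- function(data, var) {
  data %>% count(!!rlang::enquo(var))
}

# Injecting an expression built by hand, as my_function5 does for the default
count_default <- function(data) {
  var <- rlang::expr(cyl)
  data %>% count(!!var)
}

identical(count_curly(mtcars, cyl), count_enquo(mtcars, cyl))  # TRUE
identical(count_curly(mtcars, cyl), count_default(mtcars))     # TRUE
```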
my_function6 <- function(data, var, stat) {
  if (missing(data)) {
    stop("data required")
  }
  if (missing(var)) {
    message("var missing, cyl used instead")
    var <- "cyl"
  }
  if (missing(stat)) {
    message("stat missing, mean used instead")
    stat <- "mean"
  }
  stat <- match.fun(stat)
  data %>%
    as_tibble() %>%
    group_by(.data[[var]]) %>%
    summarise(stat = stat(mpg)) %>%
    ungroup()
}

my_function6(
  data = mtcars,
  var = "cyl",
  stat = "median"
)
where match.fun() finds a function given its name as a string, and the .data pronoun lets you refer to a column whose name is stored in a string inside tidyverse functions. That pronoun is the way to mix tidyverse functions with more classic base R.
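For comparison, here is a sketch of the same string-based idea in pure base R, with no tidyverse at all. `summarise_base` is a hypothetical name, and tapply() stands in for group_by() plus summarise():

```r
# match.fun() turns a function name given as a string into the function
stat <- match.fun("median")
stat(c(1, 2, 6))  # 2

# Hypothetical base R version of the string-based approach, no dplyr:
# group mpg by the column named in `var` and apply `stat` to each group
summarise_base <- function(data, var = "cyl", stat = "mean") {
  stat <- match.fun(stat)
  tapply(data$mpg, data[[var]], stat)
}

summarise_base(mtcars)                    # mean mpg for cyl 4, 6 and 8
summarise_base(mtcars, "gear", "median")  # median mpg per number of gears
```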