Saying if/else is somewhat ambiguous, as there are three potential options: if_else(), ifelse(), and if (...) {} else {}. I'll work in that order, from most similar to case_when to least similar.
if_else(): Like case_when, this is vectorized: the condition is evaluated for each element, and each element of the output is taken from whichever of the given output vectors applies. If your case_when statement takes on two potential values (and the second condition is of the form TRUE ~ ...), then it is interchangeable with if_else. In this case, go with if_else unless you believe that case_when is more readable, because (at least in basic testing) if_else is faster. The benchmark also gives a preview of part of why to avoid ifelse():
suppressPackageStartupMessages(library(tidyverse))
microbenchmark::microbenchmark(
case_when(1:1000 < 100 ~ "low", TRUE ~ "high"),
if_else(1:1000 < 3, "low", "high"),
ifelse(1:1000 < 3, "low", "high")
)
#> Unit: microseconds
#> expr min lq mean
#> case_when(1:1000 < 100 ~ "low", TRUE ~ "high") 384.786 418.629 953.4921
#> if_else(1:1000 < 3, "low", "high") 61.943 67.686 128.9811
#> ifelse(1:1000 < 3, "low", "high") 256.797 264.796 391.7180
#> median uq max neval
#> 631.9420 708.4480 33149.364 100
#> 90.0435 127.9885 2496.182 100
#> 327.9695 460.8810 2354.246 100
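One caveat on the interchangeability mentioned above, shown as a minimal sketch with a hypothetical vector x that contains an NA: case_when lets an NA condition fall through to the TRUE ~ ... branch, while if_else returns NA for it unless the missing argument is supplied.

```r
library(dplyr)

# Hypothetical input, for illustration only
x <- c(1, 50, 200, NA)

# case_when: an NA condition is not TRUE, so the element falls through to TRUE ~ "high"
two_way_case <- case_when(x < 100 ~ "low", TRUE ~ "high")
two_way_case
#> [1] "low"  "low"  "high" "high"

# if_else: an NA condition yields NA by default
two_way_if <- if_else(x < 100, "low", "high")
two_way_if
#> [1] "low"  "low"  "high" NA

# Supplying `missing` makes if_else match the case_when result
two_way_if_filled <- if_else(x < 100, "low", "high", missing = "high")
identical(two_way_case, two_way_if_filled)
#> [1] TRUE
```

So the two are drop-in replacements for each other only when the condition has no missing values, or when missing is set to match the fallback branch.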
ifelse(): Not only is this slower than if_else (see above), but it also does not check that the TRUE and FALSE vectors share a type, and it strips attributes from its result, so typed vectors such as factors come back as their underlying integer codes. The if_else documentation points this out:
suppressPackageStartupMessages(library(tidyverse))
# Unlike ifelse, if_else preserves types
x <- factor(sample(letters[1:5], 10, replace = TRUE))
ifelse(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] NA NA 2 NA 2 NA 1 NA 1 NA
if_else(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] <NA> <NA> c <NA> c <NA> b <NA> b <NA>
#> Levels: b c d e
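The same attribute loss affects dates, which is a common place to get bitten. A minimal sketch, assuming a hypothetical Date vector d and cutoff:

```r
library(dplyr)

# Hypothetical dates and cutoff, for illustration only
d <- as.Date(c("2024-01-01", "2024-06-01"))
cutoff <- as.Date("2024-03-01")

# Base ifelse() drops the Date class and returns plain numbers (days since 1970-01-01)
base_result <- ifelse(d > cutoff, d, as.Date(NA))
class(base_result)
#> [1] "numeric"

# dplyr::if_else() keeps the Date class
dplyr_result <- if_else(d > cutoff, d, as.Date(NA))
class(dplyr_result)
#> [1] "Date"
```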
if (cond) cons.expr else alt.expr: This has a completely different intent than ifelse and if_else, in that cond is treated as a scalar. In older versions of R, if the length was greater than 1, only the first element was used (with a warning); since R 4.2.0, this is an error. Because the condition is a single value, only one of the output expressions is evaluated, as you can see if you run the code:
if (FALSE) {Sys.sleep(10); print("Slow")} else print("Fast")
#> [1] "Fast"
(As an aside, if is just a function with some built-in alternative syntax, so x <- if (FALSE) {Sys.sleep(10); "Slow"} else "Fast" is valid code.)
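Since if expects a single TRUE or FALSE, a vector condition usually has to be reduced to one value first. A minimal sketch, assuming a hypothetical numeric vector x, using base R's any() and all():

```r
# Hypothetical input vector, for illustration only
x <- c(3, -1, 7)

# any() and all() collapse a logical vector to a single TRUE/FALSE,
# which is what if () expects
has_negative <- if (any(x < 0)) "some negative" else "none negative"
all_positive <- if (all(x > 0)) "all positive" else "not all positive"

has_negative
#> [1] "some negative"
all_positive
#> [1] "not all positive"
```

If the goal is instead a result per element of x, that is a job for the vectorized forms rather than if.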
This single-path evaluation does not apply to case_when, which evaluates every right-hand-side expression regardless of which conditions are true:
case_when(FALSE ~ {Sys.sleep(10); print("Slow")}, TRUE ~ print("Fast"))
#> [1] "Slow"
#> [1] "Fast"
#> [1] "Fast"
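One practical consequence: a right-hand side that is only valid for some elements still runs on all of them. A minimal sketch, assuming a hypothetical vector x with a negative value, where log(x) warns even though the negative element never uses that branch:

```r
library(dplyr)

# Hypothetical input, for illustration only
x <- c(-1, 1, 10)

# Record whether any warning is raised while case_when runs
warned <- FALSE
res <- withCallingHandlers(
  case_when(x > 0 ~ log(x), TRUE ~ NA_real_),
  warning = function(w) {
    warned <<- TRUE                # log(-1) produces NaN with a warning
    invokeRestart("muffleWarning") # keep the console quiet for the demo
  }
)

warned
#> [1] TRUE
is.na(res[1])   # the negative element still ends up as NA in the result
#> [1] TRUE
```

The result is what you would expect; the point is only that log(x) was computed for the whole vector first, so expensive or side-effecting expressions are not skipped the way they are with if.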
In summary: if you are testing a scalar, use if(). If you are testing a vector against a single condition, use dplyr::if_else. If you are testing a vector against multiple conditions, use case_when.