ifelse/dplyr::if_else is more or less a vectorized if. Caveats: ifelse drops attributes, so
ifelse(TRUE, Sys.Date(), Sys.Date())
#> [1] 17482
returns a bare number instead of a Date. if_else maintains some attributes, and so handles dates and factors better:
dplyr::if_else(TRUE, Sys.Date(), Sys.Date())
#> [1] "2017-11-12"
and is more type-safe, but still drops some attributes like dim:
dplyr::if_else(as.logical(diag(2)), diag(2), diag(2))
#> [1] 1 0 0 1
and gets very unhappy if you try to return a more complicated object like a model (unless wrapped in a list, anyway). Since if is not vectorized, it can return any object, which is helpful for working with objects more complicated than vectors:
if (TRUE) lm(mpg ~ wt, mtcars)
#>
#> Call:
#> lm(formula = mpg ~ wt, data = mtcars)
#>
#> Coefficients:
#> (Intercept)           wt
#>      37.285       -5.344
or just running arbitrary code depending on a condition:
flips <- 0
if (rnorm(1) > 0) {
  Sys.sleep(1)
  flips <- flips + 1
  'heads'
} else {
  Sys.sleep(1)
  flips <- flips + 1
  'tails'
}
#> [1] "tails"
flips
#> [1] 1
...but since if is not vectorized, reproducing a call to ifelse with it would require iterating, which is frequently not the best approach.
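A minimal sketch of what that iteration looks like, using base R only. The names `draws` and `flip_one` are illustrative and not part of the answer above, and the fixed values stand in for random draws so the result is reproducible.

```r
# Hypothetical input: fixed values standing in for rnorm() draws
draws <- c(-1.2, 0.3, 2.5, -0.1)

# `if` handles one value at a time, so it has to be wrapped and iterated
flip_one <- function(x) if (x > 0) "heads" else "tails"
iterated <- vapply(draws, flip_one, character(1))

# ifelse does the same job in a single vectorized call
vectorized <- ifelse(draws > 0, "heads", "tails")

identical(iterated, vectorized)
#> [1] TRUE
```

The iterated version is only worth the trouble when each branch has to do something ifelse cannot, such as returning a non-vector object or running side effects.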
In practice, ifelse/if_else tends to be used a lot in dplyr code due to the inability to assign to a subset, so people write
library(dplyr)
mtcars %>% head() %>% mutate(mpg = if_else(mpg > 20, 20, mpg))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 20.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 20.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 20.0   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 20.0   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
instead of
mtcars <- head(mtcars)
mtcars[mtcars$mpg > 20, 'mpg'] <- 20
mtcars
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         20.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     20.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        20.0   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    20.0   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
A lot of words have been written on the subject, but the tidyverse idiom has settled on if_else.
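As a quick sanity check that the two idioms agree, here is a base-R sketch comparing the vectorized form with subset assignment on a plain vector. The variable names (`mpg6`, `capped_ifelse`, `capped_subset`) are made up for illustration, and `datasets::mtcars` is used so the check does not depend on any earlier reassignment of mtcars.

```r
# First six mpg values from the built-in dataset
mpg6 <- head(datasets::mtcars$mpg)

# Vectorized conditional: cap values above 20
capped_ifelse <- ifelse(mpg6 > 20, 20, mpg6)

# Subset assignment: overwrite only the elements that exceed 20
capped_subset <- mpg6
capped_subset[capped_subset > 20] <- 20

identical(capped_ifelse, capped_subset)
#> [1] TRUE
```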
case_when can reproduce the behavior of if_else, but requires a condition for each return value. It's a lot more useful for its fallback evaluation, wherein the first condition that returns TRUE determines the return value selected. Before it existed, such cases were not infrequently handled by heinous nested ifelses:
mtcars %>%
  mutate(mpg_level = ifelse(mpg < 15,
                            'low',
                            ifelse(mpg < 20,
                                   'medium-low',
                                   ifelse(mpg < 25,
                                          'medium-high',
                                          'high')))) %>%
  sample_n(6)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb   mpg_level
#> 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2        high
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 medium-high
#> 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3  medium-low
#> 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8  medium-low
#> 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 medium-high
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 medium-high
which can now be written as the more svelte
mtcars %>%
  mutate(mpg_level = case_when(mpg < 15 ~ 'low',
                               mpg < 20 ~ 'medium-low',
                               mpg < 25 ~ 'medium-high',
                               TRUE ~ 'high')) %>%
  sample_n(6)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  mpg_level
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4        low
#> 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4        low
#> 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 medium-low
#> 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2       high
#> 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2 medium-low
#> 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 medium-low
It's likely to be quite a bit less efficient than a findInterval approach, but it's more flexible and arguably easier to write.
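For comparison, a findInterval version might look something like the sketch below. It assumes the same cut points as the example above; `labels`, `breaks` and `mpg_level` are illustrative names, and `datasets::mtcars` is used so the result does not depend on any earlier reassignment of mtcars.

```r
# Labels in ascending order; breaks are the cut points from the example above
labels <- c("low", "medium-low", "medium-high", "high")
breaks <- c(15, 20, 25)

# findInterval() returns 0 below the first break, 1 from the first break up to
# (but not including) the second, and so on, so adding 1 gives an index into
# `labels`
mpg_level <- labels[findInterval(datasets::mtcars$mpg, breaks) + 1]

table(mpg_level)
```

Because the intervals are closed on the left by default, a value exactly on a break (for example 15) falls into the higher bin, which matches the strict `<` comparisons in the case_when version. The trade-off is that this only works for binning a single numeric variable, whereas case_when accepts arbitrary conditions.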