I think you'll get roughly a 2x speedup from using factors. With a recode, at least, R should only have to change the 3 labels on the factor's levels rather than alter every element of the vector.
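To illustrate why relabeling is cheap, here is a minimal base R sketch (the names `codes` and `fruit` are only for illustration and are not part of the benchmark). A factor stores one integer per element plus a small vector of level labels, so renaming the levels leaves the integer codes untouched:

```r
# A factor is an integer vector plus a "levels" attribute
codes <- factor(c("a", "c", "b", "a", "c"))
as.integer(codes)   # 1 3 2 1 3
levels(codes)       # "a" "b" "c"

# Relabel by replacing the 3 level labels, not the 5 elements
fruit <- codes
levels(fruit) <- c("apple", "banana", "cherry")

# The underlying integer codes are identical before and after
identical(as.integer(codes), as.integer(fruit))   # TRUE
as.character(fruit)   # "apple" "cherry" "banana" "apple" "cherry"
```

With a million elements the same thing holds: only the three labels change.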
suppressWarnings(library(tidyverse))
suppressWarnings(library(microbenchmark))
# object to decode and the key
tbl <- tibble(CODE = sample(letters[1:3], 1e+06, replace = TRUE))
key <- tribble(
  ~CODE, ~FRUIT,
  "a", "apple",
  "b", "banana",
  "c", "cherry"
)
tbl_with_factor <- mutate(tbl, CODE = as.factor(CODE))
key_with_factor <- mutate(key, CODE = as.factor(CODE))
# Speed tests
microbenchmark::microbenchmark(
  # Fastest method from previous discussion
  left_join(tbl, key, by = "CODE"),
  # Try a recode with factors
  mutate(tbl_with_factor, FRUIT = recode(CODE, a = "apple", b = "banana", c = "cherry")),
  # Try the same thing but with recode from forcats
  mutate(tbl_with_factor, FRUIT = forcats::fct_recode(CODE, apple = "a", banana = "b", cherry = "c")),
  # Use base R and change the levels
  mutate(tbl_with_factor, FRUIT = `levels<-`(CODE, list("apple" = "a", "banana" = "b", "cherry" = "c"))),
  # Left join when everything is a factor
  left_join(tbl_with_factor, key_with_factor, by = "CODE")
)
This gives some nice results:
Unit: milliseconds
      min       lq     mean   median       uq      max neval  expr
 50.89433 55.58770 64.41534 58.53341 61.98173 177.3578   100  left_join(tbl, key, by = "CODE")
 22.98268 29.49390 38.77766 31.69801 35.41894 154.4150   100  mutate(tbl_with_factor, FRUIT = recode(CODE, a = "apple", b = "banana", c = "cherry"))
 24.25023 28.93651 36.24761 31.47930 34.62983 155.7492   100  mutate(tbl_with_factor, FRUIT = forcats::fct_recode(CODE, apple = "a", banana = "b", cherry = "c"))
 22.41556 26.74911 35.36276 29.22803 32.72580 156.9444   100  mutate(tbl_with_factor, FRUIT = `levels<-`(CODE, list(apple = "a", banana = "b", cherry = "c")))
 36.87323 41.05750 46.23981 43.02139 47.31327 162.5166   100  left_join(tbl_with_factor, key_with_factor, by = "CODE")
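Looking at the median values, the three mutate() runs that relabel a factor (recode(), fct_recode() and `levels<-`) all score the best, at roughly 29 to 32 ms against 58.5 ms for the character left_join, so about 2x faster than what you were doing before. The left_join on factors lands in between at about 43 ms.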
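One thing to keep in mind when comparing these: the factor-based approaches return FRUIT as a factor, whereas the character left_join returns a character column. A small base R sketch of the difference (the names `codes`, `fruit`, `key_vec` and `lookup` are hypothetical, chosen for this example only):

```r
codes <- factor(c("a", "c", "b", "a"))

# Relabeling the levels keeps the result a factor
fruit <- `levels<-`(codes, list(apple = "a", banana = "b", cherry = "c"))
is.factor(fruit)   # TRUE

# A plain character lookup for comparison
key_vec <- c(a = "apple", b = "banana", c = "cherry")
lookup <- unname(key_vec[as.character(codes)])

# Same values once the factor is converted back to character
identical(as.character(fruit), lookup)   # TRUE
```

If you need a character column at the end, the as.character() conversion is an extra step that the timings above do not include.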