I frequently find myself wanting to unnest list-columns that contain vectors, because the vector elements should really be their own columns. Often, when we iterate with lapply or map, the function we call returns a vector, as quantile does below. We could imagine iterating over many distributions with different parameters and computing quantiles for each. However, for unnest to produce multiple columns, each element of the list-column needs to be a one-row data frame. The most "obvious" way to do that with tidyverse functions, as far as I could see, was enframe followed by spread, since enframe is supposed to be the standard function for creating a tibble from a vector. However, spread is not fast, and calling it for every row can quickly become undesirable.
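To make the use case concrete, here is a minimal sketch of the enframe-then-spread pattern applied to a list-column before unnest. The parameter grid (params), the probs vector, and the column name q are illustrative assumptions, not taken from a real analysis.

```r
library(tidyverse)

set.seed(1)
probs <- c(0.05, 0.5, 0.95)

# Illustrative parameter grid (values are assumptions for this sketch)
params <- tibble(mean = c(0, 5), sd = c(1, 2))

result <- params %>%
  mutate(
    # List-column of named numeric vectors, one vector per row
    q = map2(mean, sd, ~ quantile(rnorm(1000, .x, .y), probs)),
    # Turn each named vector into a one-row tibble so unnest() yields columns
    q = map(q, ~ spread(enframe(.x), name, value))
  ) %>%
  unnest(q)

result
```

The result should have one row per parameter combination and one column per quantile, alongside mean and sd. The spread call inside map is the part that runs once per row and dominates the cost.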
Here I benchmarked a few alternatives that I could think of, mostly going through matrix. I'm not the best at profiling and am not sure why saving one names<- call gives such a boost, but all of these options are much, much faster than the seemingly "neat" method using enframe.
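As a sketch, the matrix route can be wrapped in a small helper so it is easy to call for each element of a list-column. The name vec_to_row is hypothetical (it is not part of any package), and the helper assumes its input is a named atomic vector.

```r
# Hypothetical helper: named vector -> one-row data frame via a 1-row matrix.
# Column names are passed through dimnames, so no separate names<- call is needed.
vec_to_row <- function(x) {
  as.data.frame(
    matrix(x, nrow = 1, dimnames = list(NULL, names(x)))
  )
}

set.seed(1)
named_vec <- quantile(rnorm(1000), c(0.05, 0.5, 0.95))

row <- vec_to_row(named_vec)
row
```

Passing the names via dimnames keeps them as-is (for example "5%"), which is why this form avoids the extra renaming step used in some of the other alternatives.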
The question is: Am I missing some other method that would be faster?
The discussion part is: Should this operation be made easier, or approached in some other manner?
set.seed(1)
named_vec <- quantile(rnorm(1000), c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95))
named_vec
#>          5%         10%         25%         50%         75%         90%
#> -1.72695999 -1.33933368 -0.69737322 -0.03532423  0.68842795  1.32402975
#>         95%
#>  1.74398317
library(tidyverse)
bench::mark(
  enframe(named_vec) %>% spread(name, value),
  as_tibble(matrix(named_vec, nrow = 1, dimnames = list(NULL, names(named_vec)))),
  data.frame(matrix(named_vec, nrow = 1)) %>% `names<-`(names(named_vec)),
  as.data.frame(matrix(named_vec, nrow = 1)) %>% `names<-`(names(named_vec)),
  as.data.frame(matrix(named_vec, nrow = 1, dimnames = list(NULL, names(named_vec))))
)
#> # A tibble: 5 x 10
#>   expression      min     mean   median      max `itr/sec` mem_alloc  n_gc
#>   <chr>      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
#> 1 enframe(n…   1.36ms   1.56ms   1.52ms   2.15ms      640.     634KB    10
#> 2 as_tibble… 262.67µs  300.5µs 287.38µs  663.3µs     3328.        0B    13
#> 3 data.fram…  138.9µs 166.06µs  162.3µs 402.71µs     6022.      280B    11
#> 4 as.data.f…  73.77µs  86.93µs  84.42µs 313.09µs    11503.      280B    14
#> 5 as.data.f…  16.22µs  19.55µs  18.92µs 120.47µs    51151.        0B     8
#> # … with 2 more variables: n_itr <int>, total_time <bch:tm>
Created on 2019-04-25 by the reprex package (v0.2.1)