I'm not sure I understand `do()` correctly, but I suspect the difference comes down to the relative cost of allocation. `do()` iterates the calculation over the row indices of each group without allocating the pieces up front, whereas `nest()` actually splits the data.frame into many pieces, which requires allocation and therefore takes time. Seen the other way around, though, `nest()` leaves you with data.frames that are already split, while `do()` doesn't. So if you run the same calculation over the same data.frame many times, `nest()` + `map()` can be faster.
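To make the "split once, reuse many times" idea concrete, here is a minimal base R sketch. It does not use dplyr at all and is only an analogy under my own assumptions: `mean_by_index()` plays the role of `do()` (it keeps only group indices and subsets on every pass), while `split()` followed by `mean_by_piece()` plays the role of `nest()` + `map()` (the pieces are allocated once and then reused). The toy data and both function names are hypothetical.

```r
# Hypothetical toy data, for illustration only
set.seed(1)
df <- data.frame(
  g = sample(c("a", "b", "c"), 1e4, replace = TRUE),
  z = rnorm(1e4)
)

# do()-like: compute the row indices of each group and subset the column
# on every pass, so no split copies of the data frame are kept around.
mean_by_index <- function(d) {
  idx <- split(seq_len(nrow(d)), d$g)
  vapply(idx, function(i) mean(d$z[i]), numeric(1))
}

# nest()-like: pay the allocation cost once by splitting the data frame
# into per-group pieces...
pieces <- split(df, df$g)

# ...then every later pass simply reuses the stored pieces.
mean_by_piece <- function(p) {
  vapply(p, function(piece) mean(piece$z), numeric(1))
}

res_index <- mean_by_index(df)
res_piece <- mean_by_piece(pieces)
identical(res_index, res_piece)  # both approaches give the same group means
```

Both approaches return the same per-group means; they differ only in when the per-group pieces are allocated. The more passes you make over `pieces`, the more the one-off cost of `split(df, df$g)` is amortized.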
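In the benchmark below, each expression computes the same summary twice, but the `nest()` version splits the data only once:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

# `xx` and `mean_and_sd()` are assumed to be defined beforehand (not shown here)
g <- xx %>%
  group_by(x, y)

microbenchmark(
  usedo = {
    # do() processes the groups again on every call
    do(g, zz = mean_and_sd(.$z))
    do(g, zz = mean_and_sd(.$z))
  },
  usemap = {
    # nest() splits the data once; both map() calls reuse the pieces
    n <- nest(g)
    transmute(n, x = x, y = y, zz = map(data, ~ mean_and_sd(.$z)))
    transmute(n, x = x, y = y, zz = map(data, ~ mean_and_sd(.$z)))
  },
  times = 20
)
#> Unit: milliseconds
#>    expr      min        lq      mean    median        uq       max neval
#>   usedo 909.9741 1040.7445 1164.2193 1190.0480 1290.1719 1361.6217    20
#>  usemap 533.2164  651.7122  735.9906  716.0828  803.2196  948.5835    20
```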
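The benchmark relies on `xx` and `mean_and_sd()`, which aren't defined in this answer. If you want to rerun it yourself, here is a minimal sketch of hypothetical stand-ins. It assumes only what the code implies (that `xx` has grouping columns `x` and `y` plus a numeric column `z`, and that `mean_and_sd()` summarises a numeric vector); the data size, number of groups, and return shape are my own guesses, so your timings won't match the ones above.

```r
# Hypothetical stand-in for `xx`: grouping columns x, y and a numeric z.
# The row count and number of groups are arbitrary choices.
set.seed(42)
n_rows <- 1e5
xx <- data.frame(
  x = sample(1:10, n_rows, replace = TRUE),
  y = sample(1:10, n_rows, replace = TRUE),
  z = rnorm(n_rows)
)

# Hypothetical `mean_and_sd()`: returns the mean and standard deviation
# of a numeric vector as a named numeric vector.
mean_and_sd <- function(v) {
  c(mean = mean(v), sd = sd(v))
}

mean_and_sd(c(1, 2, 3))  # returns c(mean = 2, sd = 1)
```

With these defined, and with dplyr, tidyr, purrr, and microbenchmark installed and loaded, the `group_by()`/`microbenchmark()` code above has everything it refers to.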