Dplyr: Alternatives to rowwise

jdlong · May 9, 2018, 10:21am



OK, I talked myself off the ledge of using `group_by( row_number() )` by testing it. Turns out @hadley 's list naming trick is ~ 10x faster. Here's my test:

```library(rbenchmark)
library(tidyverse)
set.seed(42)
n <- 1e4
df <- data.frame(my_int  = sample(1:5, n, replace=TRUE), 
           my_min = sample(1:5, n, replace=TRUE), 
           range  = sample(1:5, n, replace=TRUE))

benchmark(
df %>% 
  group_by(r=row_number()) %>%
  mutate(calc = list(runif(my_int, my_min, my_min + range) )) %>%
  ungroup() %>%
  select(-r) -> 
out
)
#>                                                                                                                                   test
#> 1 out <- df %>% group_by(r = row_number()) %>% mutate(calc = list(runif(my_int, my_min, my_min + range))) %>% ungroup() %>% select(-r)
#>   replications elapsed relative user.self sys.self user.child sys.child
#> 1          100   51.51        1     51.42     0.06         NA        NA

benchmark(
df %>%
  mutate(data = pmap(list(n = my_int, min = my_min, max = my_min + range), runif)) -> out
)
#>                                                                                             test
#> 1 out <- df %>% mutate(data = pmap(list(n = my_int, min = my_min, max = my_min + range), runif))
#>   replications elapsed relative user.self sys.self user.child sys.child
#> 1          100     5.5        1       5.5        0         NA        NA
```