bind_rows on a list of database connection tibbles

KoderKow · December 9, 2022, 12:31am

I was reading through other Qs and came across this post, Looping an addition of column with part of a file name - tidyverse - RStudio Community. I somehow have never come across purrr::reduce()!

Calling the following did the trick!

my_list %>% 
    purrr::map(...) %>%
    purrr::reduce(dplyr::union_all) %>% 
    dplyr::collect()

However, I tested the speed against collecting each dataset per iteration and it is slightly faster.

my_list %>% 
    purrr::map(~ {
    ... %>%
        dplyr::collect(my_data)
    })

Trying this on a list that has 28 DB tibbles resulted in 10.89 seconds for the reduce method on 9.7 seconds for the collecting per iteration. I think my assumption of binding rows on the DB side would be faster. Still not quite sure how its slower though. Apologies for the lack of reprex. I haven't had to make a DB-related reprex before

If there is a better method I would love to see it!