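Consider this simple example:
library(furrr)
library(tibble)
library(dplyr)    # group_by(), mutate(), %>%
library(tidyr)    # nest()
library(stringr)  # str_detect()

mytib <- tibble(group = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                value = rep('all your bases are belong to us', times = 9))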
> mytib
# A tibble: 9 x 2
  group value
  <dbl> <chr>
1     1 all your bases are belong to us
2     1 all your bases are belong to us
3     1 all your bases are belong to us
4     1 all your bases are belong to us
5     2 all your bases are belong to us
6     2 all your bases are belong to us
7     2 all your bases are belong to us
8     2 all your bases are belong to us
9     2 all your bases are belong to us
I want to understand how furrr is working here. Assume I need to group the data first:
> grouped <- mytib %>% group_by(group) %>% nest()
> grouped
# A tibble: 2 x 2
# Groups:   group [2]
  group data
  <dbl> <list>
1     1 <tibble [4 × 1]>
2     2 <tibble [5 × 1]>
Now I run the following code:
> grouped %>% mutate(map = future_map_dbl(data, ~sum(str_detect(.$value, 'base'))))
# A tibble: 2 x 3
# Groups:   group [2]
  group data               map
  <dbl> <list>           <dbl>
1     1 <tibble [4 × 1]>     4
2     2 <tibble [5 × 1]>     5
With the multicore plan on Linux, are the two data frames treated as "chunks" by furrr and sent to the workers independently? Is this the correct way to control how furrr splits the work among the CPUs?
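A minimal way to check this empirically might be to record the process ID that handles each element of `data`. This is only a sketch under some assumptions: it assumes furrr 0.2.0 or later (for `furrr_options()`), uses two workers, and sets `chunk_size = 1` to request one element per chunk. The name `result` and the column `pid` are made up for illustration. Note that `grouped` is still grouped by `group`, and `mutate()` on a grouped tibble evaluates its expression once per group, so the sketch calls `ungroup()` first so that a single `future_map_int()` call sees both list elements.

```r
library(furrr)
library(tibble)
library(dplyr)
library(tidyr)

# Assumption: two forked workers. multicore is not supported on Windows
# or inside RStudio, where future falls back to sequential processing.
plan(multicore, workers = 2)

mytib <- tibble(group = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                value = rep('all your bases are belong to us', times = 9))
grouped <- mytib %>% group_by(group) %>% nest()

# Record which process handled each nested data frame.
# chunk_size = 1 asks furrr to put each element in its own chunk.
result <- grouped %>%
  ungroup() %>%
  mutate(pid = future_map_int(data, ~ Sys.getpid(),
                              .options = furrr_options(chunk_size = 1)))

result
```

If the two rows show different values in `pid`, the two data frames were processed in separate worker processes. Seeing the same value in both rows, equal to `Sys.getpid()` in the main session, would suggest the work ran sequentially.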
Thanks!