How to control the parameters in furrr?

von_olaf · April 17, 2020, 5:05pm

Consider this simple example

library(furrr)
library(tibble)
library(dplyr)    # group_by(), mutate(), %>%
library(tidyr)    # nest()
library(stringr)  # str_detect()

mytib <- tibble(group = c(1,1,1,1,2,2,2,2,2),
                value = rep('all your bases are belong to us', times = 9))

> mytib
# A tibble: 9 x 2
  group value                          
  <dbl> <chr>                          
1     1 all your bases are belong to us
2     1 all your bases are belong to us
3     1 all your bases are belong to us
4     1 all your bases are belong to us
5     2 all your bases are belong to us
6     2 all your bases are belong to us
7     2 all your bases are belong to us
8     2 all your bases are belong to us
9     2 all your bases are belong to us

I want to understand how furrr is working here. Assume I need to group the data first.

> grouped <- mytib %>% group_by(group) %>% nest()
> grouped
# A tibble: 2 x 2
# Groups:   group [2]
  group data            
  <dbl> <list>          
1     1 <tibble [4 × 1]>
2     2 <tibble [5 × 1]>

Now when I run the following code

> grouped %>% mutate(map = future_map_dbl(data, ~sum(str_detect(.$value, 'base'))))
# A tibble: 2 x 3
# Groups:   group [2]
  group data               map
  <dbl> <list>           <dbl>
1     1 <tibble [4 × 1]>     4
2     2 <tibble [5 × 1]>     5

Using multicore on Linux, are the two data frames treated as "chunks" by furrr and sent to the workers independently? Is this the correct way to control how furrr splits the work among the CPUs?
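One way to probe this is to record which process handles each nested data frame. The sketch below is only an illustration under a few assumptions: it needs furrr >= 0.2.0 (for `furrr_options()` and its `chunk_size` argument), it swaps in `plan(multisession)` so it also runs outside a Linux terminal, and the column names `n_match` and `worker_pid` are made up for the example. It also adds an `ungroup()`, because `mutate()` on a grouped tibble evaluates the expression once per group, so each `future_map_*()` call would otherwise see a list of length one.

```r
library(furrr)    # assumes furrr >= 0.2.0, which provides furrr_options()
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)

# Assumption: multisession instead of multicore so the sketch is portable;
# on Linux from a terminal, plan(multicore, workers = 2) would fork instead.
plan(multisession, workers = 2)

mytib <- tibble(group = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                value = rep("all your bases are belong to us", times = 9))

grouped <- mytib %>% group_by(group) %>% nest()

# chunk_size = 1 asks furrr to put each element of `data` in its own chunk.
opts <- furrr_options(chunk_size = 1)

result <- grouped %>%
  ungroup() %>%  # so each future_map_*() call sees both list elements at once
  mutate(
    # same computation as in the post: count rows whose value contains "base"
    n_match = future_map_dbl(data, ~ sum(str_detect(.x$value, "base")),
                             .options = opts),
    # hypothetical diagnostic column: process id of the worker that ran it
    worker_pid = future_map_int(data, ~ Sys.getpid(), .options = opts)
  )

print(result)

plan(sequential)  # shut down the background workers
```

If `worker_pid` differs between the two rows, the two nested tibbles were evaluated in separate processes. What furrr chunks are the elements of the list passed to it (here, one nested tibble per element), not the rows inside them.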

Thanks!
