I have a script that uses future::plan(multiprocess, workers = 16) to allocate 16 cores on my machine. I then use some furrr::future_ functions to run code in parallel.
I am attempting to move this code to Databricks and take advantage of all of the cores available in the cluster. My current Databricks test setup has 2 to 8 workers, each with 4 cores. How do I change future::plan(multiprocess, workers = 16) to recruit all of the cores in the cluster?
What I have tried
future::plan(multicore, workers = 32)
Because the Databricks workers are all Linux based, I thought that changing the strategy from multiprocess to multicore and raising workers to the maximum number of cores available in the cluster would be a simple first step. It turns out that setting workers to 4, 8, or 32 gives the same computation speed in every case, so this does not seem to be the correct solution.
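One possible reason the timings do not change: multicore forks the current R process, so it can only use cores on the machine where that process runs. The sketch below is a minimal check, not a fix. It assumes the R session runs on a single node (on Databricks, presumably the driver) and only reports how many cores that node exposes and how many workers the plan ends up with.

```r
library(future)

# Cores visible to this R process. This counts cores on the current
# machine only, not cores on other nodes in the cluster.
n_local <- availableCores()
print(n_local)

# multicore relies on forking, which is not supported everywhere
# (for example on Windows or inside RStudio), so fall back if needed.
if (supportsMulticore()) {
  plan(multicore, workers = n_local)
} else {
  plan(sequential)
}

# Number of workers the current plan will actually use.
print(nbrOfWorkers())
```

If the first number is 4, then requesting 8 or 32 workers only oversubscribes those 4 cores, which would be consistent with the identical timings described above.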
I have been reading the documentation on parallel::makeCluster, but I am not sure whether I need to create a cluster myself, because Databricks already provides one. I just need to access its cores somehow.