How to use all cores on a Databricks cluster with furrr/future/parallel

alex628 · February 12, 2019, 1:13pm

I have a script which uses future::plan(multiprocess, workers = 16) to allocate 16 cores on my machine. I then use some furrr::future_ functions to run code in parallel.
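For reference, here is a minimal sketch of what the local setup looks like. The function slow_square and the input 1:8 are made-up stand-ins for the real work, workers is reduced to 2 so the snippet runs on small machines, and multisession stands in for multiprocess, which newer releases of future deprecate.

```r
library(future)
library(furrr)

# multisession is used here in place of multiprocess (an assumption for
# portability; the real script uses multiprocess with workers = 16).
plan(multisession, workers = 2)

# Hypothetical stand-in for the real per-item work.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# future_map_dbl() distributes the calls across the workers set by plan().
results <- future_map_dbl(1:8, slow_square)

# Switch back to sequential processing, which shuts down the background workers.
plan(sequential)
```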

I am attempting to move this code to Databricks and take advantage of all of the cores available in the cluster. My current Databricks test setup has 2 to 8 workers, each with 4 cores. How do I change future::plan(multiprocess, workers = 16) to recruit all of the cores in the cluster?

What I have tried

future::plan(multicore, workers = 32) - because the Databricks workers are all Linux-based, I thought that changing the method from multiprocess to multicore and raising the number of workers to the maximum number of cores available in the cluster would be a simple first step. It turns out that setting workers to 4, 8, or 32 all results in the same computation speed, so this does not seem to be the correct solution.
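As far as I understand, multicore works by forking the current R process, so it can only use cores on the single machine where R is running. That would be consistent with the timings not changing. A quick check of how many cores that one process can see (a sketch only; the variable name local_cores is my own):

```r
library(future)

# Cores visible to the R process on the machine where this code runs.
local_cores <- availableCores()
print(local_cores)

# Base R's view of the same machine, for comparison.
print(parallel::detectCores())
```

If this reports only the cores of one node, asking for more workers than that in plan(multicore) cannot reach the other nodes.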

I have been reading the documentation on parallel::makeCluster found here, but I am not sure whether I need to make a cluster, because Databricks already provides one. I just need to access its cores somehow.
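From the documentation, this is roughly how I understand makeCluster would plug into future. It is a sketch under assumptions: it starts two worker processes on the local machine, whereas a real multi-node version would pass hostnames instead, and I do not know whether the Databricks worker nodes can be reached this way at all.

```r
library(future)
library(furrr)

# Two worker processes on the local machine. A multi-node setup would pass
# a character vector of hostnames here instead (not tested on Databricks).
cl <- parallel::makeCluster(2)

# Tell future to send work to the cluster created above.
plan(cluster, workers = cl)

# The same furrr call as before now runs on the cluster workers.
results <- future_map_dbl(1:4, function(x) x * 10)

# Clean up: reset the plan, then stop the worker processes.
plan(sequential)
parallel::stopCluster(cl)
```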
