Hi,
I am trying to parallelize a spatial kernel density estimation on a Windows virtual machine with 160 cores, of which I am trying to use 6.
The run is extremely slow and I am not sure whether the parallelization is working.
Unfortunately, I was not able to add a progress bar or to check that the number of cores used is actually 6.
If I run the top command in the terminal I can see that CPU usage is at 100%, but I would like a more reliable insight.
The code itself is fine: I tried it without the kernel function (hotspot_kde) and it returned the expected list of sf objects, but I am not sure how to verify that it is really running in parallel.
The long run time is due to the kernel. I really need to speed this code up.
Could you help me understand what I am doing wrong? How can I properly speed it up?
As it was taking too long, I interrupted the foreach call that builds KernelList and then called stopCluster, which returned the message "double free or corruption". This makes me think that something is wrong in the parallel processing.
I also tried re-running the code after dividing the dataset into small pieces and running the calculation on each piece, in order to reduce memory use. It was fine when I ran it over the weekend and it kept working until yesterday morning, but then I faced the "double free or corruption (out) aborted" error again.
Thank you for your help!
library("ggplot2") # Plot
library("sf") # Work with spatial data
library("sfhotspot") # Kernel
library("doParallel") # Parallelization
library("foreach") # Parallelization
# Kernel density ----
# Optimal bandwidth
opt_bw <- 2850
# Split by continent
df_split <- df_outbreaks |>
  dplyr::group_split(continent) |>
  setNames(sort(unique(df_outbreaks$continent)))
# Parallel processing ----
# Detect number of cores
detectCores()
# Make the clusters and register them
cluster <- makeCluster(6)
registerDoParallel(cluster)
# Run the kernel density estimation for each continent in parallel
KernelList <- foreach(i = seq_along(df_split),
                      .packages = c("sf", "sfhotspot")) %dopar% {
  # Convert the data frame to an sf object (Mollweide projection)
  pts_sf <- st_as_sf(df_split[[i]], coords = c("x", "y"),
                     crs = "+proj=moll +lon_0=0 +x_0=0 +y_0=0")
  # Create the kernel density map for this sf object
  hotspot_kde(pts_sf, cell_size = 1000, bandwidth = opt_bw)
}
# Stop Cluster
stopCluster(cluster)
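
For the question of how to verify that the work really runs on the expected number of workers, here is a minimal sketch of one possible check. It is not part of the code above. It assumes a small demo cluster of 2 workers instead of 6, and it uses Sys.sleep() as a hypothetical stand-in for hotspot_kde(). The names n_workers, registered, worker_pids and timings are made up for illustration. Each task returns the process ID of the worker that ran it and its elapsed time, so the result shows both whether several workers were used and which task dominates the run time.

```r
library(doParallel)  # also attaches foreach and parallel

n_workers <- 2  # small number for the demo; the post uses 6
cluster <- makeCluster(n_workers)
registerDoParallel(cluster)

# Number of workers foreach will use, and the process ID of each worker
registered <- getDoParWorkers()
worker_pids <- unlist(clusterCall(cluster, Sys.getpid))

# Stand-in for the real per-continent work: record which worker ran each
# task and how long it took
timings <- foreach(i = 1:4, .combine = rbind) %dopar% {
  start <- Sys.time()
  Sys.sleep(0.2)  # placeholder for hotspot_kde()
  data.frame(task = i,
             pid = Sys.getpid(),
             seconds = as.numeric(difftime(Sys.time(), start, units = "secs")))
}

stopCluster(cluster)

print(registered)
print(worker_pids)
print(timings)
```

If the pid column contains more than one distinct value, the tasks were spread over several worker processes. Note that a loop over df_split has only as many tasks as there are continents, so no more than that number of workers can be busy at once, and the total run time cannot be shorter than the slowest single task.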