Hi!
I have a question about parallelization with doSNOW and foreach.
I have seen many tutorials on parallelization (including ones using other packages) that pass objects from the main environment to each of the R worker sessions with the clusterExport() function.
I am wondering whether this is also needed with doSNOW and foreach.
I think it is not necessary, but I'd like to double-check with someone more experienced in parallel processing than me.
For instance, in the example below I split the World dataset by continent and intersect each continent with a set of randomly sampled points. I timed it with system.time() and compared it with a simple for loop: the parallel version is faster (4.64 s vs 7.22 s elapsed), even though I did not export either the list of continents (Listsplit) or the points (points).
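As far as I know, `%dopar%` with the doSNOW backend inspects the loop body and automatically exports the variables it references from the calling environment, which would explain why `Listsplit` and `points` reach the workers without `clusterExport()`. A minimal sketch to test this, using a hypothetical object `my_data` that is defined only in the master session:

```r
library(doSNOW)  # also attaches foreach and snow

cl <- makeSOCKcluster(2)  # two local R worker sessions
registerDoSNOW(cl)

# Hypothetical object that exists only in the master session
my_data <- c(10, 20, 30, 40)

# No clusterExport() call: foreach sees that the loop body uses 'my_data'
# and sends it to the workers automatically
res <- foreach(i = seq_along(my_data), .combine = c) %dopar% {
  my_data[i] * 2
}

stopCluster(cl)
print(res)  # 20 40 60 80
```

One caveat, as I understand it: automatic detection only looks at the loop body itself, so an object used only inside a helper function called from the loop may be missed. `foreach()` has an `.export` argument for listing such objects explicitly, and `.verbose = TRUE` prints diagnostic output that should show what is being exported.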
Could you please give me some feedback?
I'd also like to use this post to ask how to check whether the parallelization is working properly on both Windows and Linux. Task Manager and the top command are not very informative, since they only show overall CPU and memory usage. Ideally, I would like to see which cores are actually being used.
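One practical check (a sketch, not the only way) is to have each iteration return `Sys.getpid()`: if the results come from several process IDs that differ from the master's, the loop really ran in separate R sessions. Those PIDs can then be looked up in Task Manager (Details tab) on Windows, or with `top -p <pid>` or `ps -o pid,psr -p <pid>` on Linux, where `psr` is the processor the process is currently assigned to. Keep in mind that the operating system scheduler, not R, decides which core each worker runs on, so a worker is not tied to one fixed core. The 4 workers and 20 dummy iterations below are arbitrary assumptions.

```r
library(doSNOW)  # also attaches foreach and snow

n_workers <- 4
cl <- makeSOCKcluster(n_workers)
registerDoSNOW(cl)

# Each iteration returns the process ID (PID) of the R session that ran it
pids <- foreach(i = 1:20, .combine = c) %dopar% {
  Sys.sleep(0.1)  # small delay so tasks are spread over the workers
  Sys.getpid()
}

# Ask every worker directly for its PID (one value per worker)
worker_pids <- unlist(clusterCall(cl, Sys.getpid))

stopCluster(cl)

master_pid <- Sys.getpid()
print(table(pids))           # how many iterations each worker process handled
print(master_pid %in% pids)  # FALSE: no iteration ran in the master session
```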
Thank you
library("doSNOW") # Parallelization
#> Warning: package 'doSNOW' was built under R version 4.3.2
#> Loading required package: foreach
#> Warning: package 'foreach' was built under R version 4.3.2
#> Loading required package: iterators
#> Warning: package 'iterators' was built under R version 4.3.2
#> Loading required package: snow
#> Warning: package 'snow' was built under R version 4.3.2
library("foreach") # Parallelization
library("tmap") # World shapefile
#> Warning: package 'tmap' was built under R version 4.3.2
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#> (status 2 uses the sf package in place of rgdal)
#> Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
#> remotes::install_github('r-tmap/tmap')
library("ggplot2") # Plot
library("sf") # Work with spatial data
#> Warning: package 'sf' was built under R version 4.3.2
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library("dplyr") # Data manipulation
#> Warning: package 'dplyr' was built under R version 4.3.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Data
data(World) # World Shapefile
points <- st_sample(World, size = 1000)
# Plot
ggplot() +
  geom_sf(data = World) +
  geom_sf(data = points, colour = "red", size = 0.5) +
  coord_sf(xlim = c(st_bbox(World)[1], st_bbox(World)[3]),
           ylim = c(st_bbox(World)[2], st_bbox(World)[4]))
Listsplit <- World |> group_split(continent)
cl <- makeSOCKcluster(8)
registerDoSNOW(cl)
pb <- txtProgressBar(max = length(Listsplit), style = 3)  # one tick per continent
#> | | | 0%
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)
# Parallel function
system.time({
  parallelfunction <- foreach(i = seq_along(Listsplit), .packages = c("sf"),
                              .options.snow = opts) %dopar% {
    st_intersection(points, Listsplit[[i]])
  }
})
#>   |======================================================================| 100%
#>    user  system elapsed
#> 0.05 0.03 4.64
close(pb)        # close the progress bar
stopCluster(cl)  # shut down the worker sessions
# Loop
results <- list()
system.time({
  for (i in seq_along(Listsplit)) {
    results[[i]] <- st_intersection(points, Listsplit[[i]])
  }
})
#>    user  system elapsed
#> 6.19 0.14 7.22
Created on 2023-12-17 with reprex v2.0.2
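When reading the two timings above, the relevant column is `elapsed` (wall-clock time). With a SOCK cluster the master session mostly waits for the workers, so its own `user` and `system` times stay small (0.05 and 0.03 s above) and do not include the CPU time spent in the worker processes. A minimal sketch of comparing wall-clock times, assuming a hypothetical `slow_task()` that uses `Sys.sleep()` as a stand-in for `st_intersection()`:

```r
library(doSNOW)  # also attaches foreach and snow

# Hypothetical stand-in for st_intersection(): each call takes about 0.5 s
slow_task <- function(x) {
  Sys.sleep(0.5)
  x^2
}

cl <- makeSOCKcluster(4)
registerDoSNOW(cl)

# Parallel: 8 tasks on 4 workers, roughly 2 rounds of 0.5 s
t_par <- system.time(
  par_res <- foreach(i = 1:8, .combine = c) %dopar% slow_task(i)
)
stopCluster(cl)

# Sequential: 8 tasks one after the other, roughly 4 s
t_seq <- system.time(
  seq_res <- vapply(1:8, slow_task, numeric(1))
)

# Compare wall-clock time, not the master's user time
print(c(parallel = t_par[["elapsed"]], sequential = t_seq[["elapsed"]]))
```

Checking that both approaches return the same result (here with `all.equal(par_res, seq_res)`) is also worth doing before trusting the speed-up.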