Understanding parallelization with doSNOW and foreach: do I have to export objects from the main environment to each R worker session?

Hi!
I have a question about parallelization with doSNOW and foreach.
Many tutorials on parallelization (including ones that use other packages) pass the objects from the main environment to each R worker session with the clusterExport() function.
I am wondering whether this is also needed with doSNOW and foreach.
I think it is not necessary, but I'd like to double-check with someone who is more confident with parallel processing than I am.
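A minimal sketch of the difference, using hypothetical toy objects (`x`, `k`) instead of the spatial data below and assuming 2 local workers. With snow's `parLapply()` each worker is a fresh R session, so a global object such as `k` has to be shipped with `clusterExport()`. foreach, as far as I understand its documentation, scans the loop body and automatically exports the variables it references from the environment where `foreach()` is called, so the same objects reach the workers without an explicit export.

```r
library(doSNOW)   # attaches foreach and snow (makeSOCKcluster, clusterExport, parLapply)

x <- 1:4   # hypothetical toy objects living in the master's global environment
k <- 10

cl <- makeSOCKcluster(2)

# snow: the anonymous function does not carry `k` along, so export it first.
clusterExport(cl, "k")
res_snow <- parLapply(cl, x, function(v) v * k)

# foreach: `x` and `k` appear in the loop body and exist in the calling
# environment, so they are exported automatically.
registerDoSNOW(cl)
res_foreach <- foreach(i = seq_along(x)) %dopar% {
  x[i] * k
}

stopCluster(cl)
```

Both calls should return the same list of values; dropping the `clusterExport()` line is what would make the `parLapply()` version fail with an "object 'k' not found" error.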

For instance, in the example below I split the World dataset by continent and intersect each continent with a set of spatial points. I compared the run time with a simple for loop, and the parallel version is faster (4.64 s vs 7.22 s elapsed) even though I never export the list of continents (Listsplit) or the spatial points (points).
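One case where an explicit export still seems to matter, as far as I understand the foreach documentation: automatic export only looks at the environment where `foreach()` is called. If the loop runs inside a function and uses an object that exists only in the global environment, the object can be named in `.export`, and `.verbose = TRUE` prints diagnostic messages about what is sent to the workers. `lookup` and `run_in_function()` are hypothetical names used only for illustration.

```r
library(doSNOW)   # attaches foreach

lookup <- c(a = 1, b = 2, c = 3)   # hypothetical object defined only in the global environment

run_in_function <- function(keys) {
  # `lookup` is not defined inside this function, so it is listed in .export;
  # .verbose = TRUE prints foreach's diagnostics, including exported variables.
  foreach(key = keys, .export = "lookup", .verbose = TRUE) %dopar% {
    lookup[[key]]
  }
}

cl <- makeSOCKcluster(2)
registerDoSNOW(cl)
out <- run_in_function(c("a", "c"))   # hypothetical call; expected values 1 and 3
stopCluster(cl)
```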

Could you please give me some feedback?

I'd also like to take advantage of this post to ask how to check whether the parallelization is actually working, on both Windows and Linux. Task Manager and the top command are not very informative, since they only report overall CPU and memory usage. Ideally, I would like to see which cores are actually being used.
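One rough check, sketched here with 2 hypothetical workers and 8 trivial tasks, is to have every task report the process ID (PID) of the R session that ran it. Those PIDs can then be matched against the R/Rscript processes listed in Task Manager (Details tab) on Windows or in `top`/`htop` on Linux; `htop` also shows a per-core load bar. Which physical core runs a given worker is decided by the operating system scheduler and can change over time, so as far as I know R itself cannot portably report a core number.

```r
library(doSNOW)   # attaches foreach

cl <- makeSOCKcluster(2)   # assumption: 2 local workers for the illustration
registerDoSNOW(cl)

# Each task returns the PID of the worker process that executed it.
worker_pids <- foreach(i = 1:8, .combine = c) %dopar% Sys.getpid()

stopCluster(cl)

master_pid <- Sys.getpid()
table(worker_pids)   # number of tasks handled by each worker process
```

If every PID differs from `master_pid` and more than one PID appears, the tasks really ran in separate R sessions.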

Thank you

library("doSNOW") # Parallelization 
#> Warning: package 'doSNOW' was built under R version 4.3.2
#> Loading required package: foreach
#> Warning: package 'foreach' was built under R version 4.3.2
#> Loading required package: iterators
#> Warning: package 'iterators' was built under R version 4.3.2
#> Loading required package: snow
#> Warning: package 'snow' was built under R version 4.3.2
library("foreach") # Parallelization 
library("tmap") # World shapefile
#> Warning: package 'tmap' was built under R version 4.3.2
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)
#> Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
#> remotes::install_github('r-tmap/tmap')
library("ggplot2") # Plot
library("sf") # Work with spatial data
#> Warning: package 'sf' was built under R version 4.3.2
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library("dplyr") # Data manipulation
#> Warning: package 'dplyr' was built under R version 4.3.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union


# Data 
data(World) # World Shapefile 
points<-st_sample(World, size=1000)


# Plot 
ggplot() +
  geom_sf(data=World)+
  geom_sf(data=points, colour = "red", size = 0.5)+
  coord_sf(xlim=c(st_bbox(World)[1],st_bbox(World)[3]),
           ylim=c(st_bbox(World)[2],st_bbox(World)[4]))




Listsplit<-World |> group_split(continent)

cl <- makeSOCKcluster(8)
registerDoSNOW(cl)

pb <- txtProgressBar(max = length(Listsplit), style = 3) # one tick per continent/task
#>   |                                                                              |                                                                      |   0%
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)


# Parallel function 
system.time({
  
  parallelfunction<-foreach(i=seq_along(Listsplit), .packages = c("sf"),
                            .options.snow = opts) %dopar% {
    
    st_intersection(points,Listsplit[[i]])
  }
  
}
)
#>   |======================================================================| 100%
#>    user  system elapsed
#>    0.05    0.03    4.64

close(pb)
stopCluster(cl)


# Loop 

results<-vector("list", length(Listsplit))

system.time({
  
  for(i in seq_along(Listsplit)){
    
    results[[i]]<-st_intersection(points,Listsplit[[i]])
  }
})
#>    user  system elapsed
#>    6.19    0.14    7.22
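As far as I can tell, the two timings above are not fully comparable: `makeSOCKcluster()` runs before `system.time()`, so worker start-up is not counted, and the `user`/`system` columns only cover the master R process (the workers are separate processes), which is why they are so small in the parallel run. `elapsed` is the column to compare. Below is a minimal sketch, with a hypothetical `slow_task()` standing in for `st_intersection()` and 4 workers as an assumption, that includes cluster start-up in the timing and also checks that both approaches return the same results in the same order (foreach keeps input order by default, `.inorder = TRUE`).

```r
library(doSNOW)   # attaches foreach and snow

# Hypothetical stand-in for st_intersection(): half a second of "work" per task.
slow_task <- function(x) {
  Sys.sleep(0.5)
  x^2
}
inputs <- 1:8

# Parallel run: cluster start-up and shutdown are inside the timed expression.
t_par <- system.time({
  cl <- makeSOCKcluster(4)   # assumption: 4 local workers
  registerDoSNOW(cl)
  par_res <- foreach(x = inputs) %dopar% slow_task(x)   # slow_task is auto-exported
  stopCluster(cl)
})

# Sequential reference with a preallocated list.
t_seq <- system.time({
  seq_res <- vector("list", length(inputs))
  for (i in seq_along(inputs)) seq_res[[i]] <- slow_task(inputs[i])
})

# Same values in the same order means the parallel version is doing the same work.
stopifnot(isTRUE(all.equal(unlist(par_res), unlist(seq_res))))
c(parallel = t_par[["elapsed"]], sequential = t_seq[["elapsed"]])
```

With unevenly sized chunks (some continents have far more polygons and points than others), the parallel elapsed time is bounded by the slowest chunk, so the speed-up can be well below the number of workers.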

Created on 2023-12-17 with reprex v2.0.2
