Hi, I have a basic question about seeds when using parallelization. Suppose I would like to create a new column named SUM
based on the numeric columns of the iris
dataset:
library(tidyverse)
library(future)
library(furrr)
# Sum the four measurements and add one draw of standard-normal noise
sum_function <- function(a, b, c, d) {
  a + b + c + d + rnorm(1)
}
plan("multisession", workers = 1)
set.seed(42, kind = "L'Ecuyer-CMRG")
new_iris <- iris %>%
  mutate(SUM = pmap(.l = list(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
                    .f = sum_function))
future_iris <- iris %>%
  mutate(SUM = future_pmap(.l = list(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
                           .f = sum_function,
                           .options = furrr_options(seed = 42)))
Unfortunately, the two data frames differ in the values of the SUM
column. How can I fix this?