I am very new to the concept of parallel computing.
Here is my current understanding:
- Suppose I have a function. I want to run this function 1000 times.
- Let's say that each time I run this function, it is independent of other times I run this function
- I imagine it this way: if I run this function 1000 times, it's like a bakery with 1000 customers and 10 employees. However, all 10 employees work on the same customer, and then collectively move on to the second customer.
- But if I run the code in parallel, then the 10 employees will take on the first 10 customers and work independently, potentially saving time.
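To check my intuition, here is a minimal sketch that times a serial `lapply()` against `parLapply()` from the base `parallel` package (the `slow_task` function is just a made-up stand-in for real work):

```r
library(parallel)

# A toy task that takes roughly 0.1 s per call
slow_task <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# Serial: the tasks are served one after another
serial_time <- system.time(serial_res <- lapply(1:10, slow_task))["elapsed"]

# Parallel: two workers each take a share of the tasks
cl <- makeCluster(2)
parallel_time <- system.time(par_res <- parLapply(cl, 1:10, slow_task))["elapsed"]
stopCluster(cl)

# Same results either way; the parallel run should take less wall-clock time
identical(serial_res, par_res)
```
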
I have the following R code that performs some random simulations (I can explain if required):
library(tidyverse)

# Simulate one path of the process until it is absorbed in state 3
simulate_markov_chain <- function(simulation_num) {
  # Transition matrices
  transition_matrix_A <- matrix(c(1/3, 1/3, 1/3,   # probabilities from state 1
                                  1/3, 1/3, 1/3,   # probabilities from state 2
                                  0,   0,   1),    # probabilities from state 3 (absorbing)
                                nrow = 3, byrow = TRUE)
  transition_matrix_B <- matrix(c(1/4, 1/4, 1/4, 1/4,  # probabilities from state 1
                                  1/4, 1/4, 1/4, 1/4,  # probabilities from state 2
                                  0,   0,   1,   0,    # probabilities from state 3 (absorbing)
                                  1/4, 1/4, 1/4, 1/4), # probabilities from state 4
                                nrow = 4, byrow = TRUE)

  state <- 1
  chain <- "A"
  path_df <- data.frame(iteration = 1, chain = chain, state = state)
  iteration <- 1

  while (state != 3) {
    # Flip a fair coin
    coin_flip <- sample(c("heads", "tails"), size = 1, prob = c(0.5, 0.5))
    # Once on chain B, stay on chain B; otherwise switch from A to B on heads
    if (coin_flip == "heads" || chain == "B") {
      chain <- "B"
      state <- sample(1:4, size = 1, prob = transition_matrix_B[state, ])
    } else {
      state <- sample(1:3, size = 1, prob = transition_matrix_A[state, ])
    }
    iteration <- iteration + 1
    path_df <- rbind(path_df, data.frame(iteration = iteration, chain = chain, state = state))
  }

  path_df$simulation_num <- simulation_num
  return(path_df)
}
I then ran this function 1000 times - everything works perfectly:
results <- map_dfr(1:1000, simulate_markov_chain)
I am not sure how to run this code in parallel to speed it up.
Here is what I tried - it worked, but I am not sure if this is correct:
library(parallel)

num_cores <- detectCores()
cl <- makeCluster(num_cores)
clusterExport(cl, "simulate_markov_chain")
results <- do.call(rbind, parLapply(cl, 1:1000, simulate_markov_chain))
stopCluster(cl)
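One thing I was unsure about is random numbers on the workers: without seeding, I don't get reproducible results. From what I've read, `clusterSetRNGStream()` in the same `parallel` package gives each worker an independent, reproducible stream. A sketch of what I mean (using a made-up `simulate_stub` in place of my real function to keep it short):

```r
library(parallel)

# Stand-in for the real simulate_markov_chain(): just draws a random number
simulate_stub <- function(i) data.frame(simulation_num = i, draw = runif(1))

run_once <- function(seed) {
  cl <- makeCluster(2)
  # Seed independent, reproducible RNG streams on every worker
  clusterSetRNGStream(cl, iseed = seed)
  clusterExport(cl, "simulate_stub")
  out <- do.call(rbind, parLapply(cl, 1:10, simulate_stub))
  stopCluster(cl)
  out
}

# Same seed twice -> identical results, even though the work ran in parallel
identical(run_once(123), run_once(123))
```
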
What is the correct way to run this code in parallel using R?