Cluster stops behaving properly after rerunning the same function on it a number of times

Hi all,

I am running into a strange problem. I create a cluster called dl and use parLapply to calculate a list, which I then recombine into a vector.

[image of the output]

This is the output. You can see that after 4 iterations, parLapply failed because something went wrong on one of the cores. The other inputs to the function are the four nu's, and the function can be calculated with the last set of 4 nu's. So the problem is not that the inputs became invalid after 4 iterations.

It does this after a completely random number of iterations. At some point, it is just not able to calculate it anymore because the cluster stops behaving the way it should. I am at a loss as to why this is happening. Any ideas on how to debug this?

Thanks

Have you run the calculation serially?

What does that mean?

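The way the code works is:

nu <- c(0, 0, 0, 0)
# parallel step (the function passed to parLapply is omitted in this outline)
delta <- parLapply(cluster, nu)
# serial update of nu from the parallel results
new_nu <- someothernonparallelfunction(nu, delta)

Then start again with new_nu.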

I mean running your code without parallelism, since running it in parallel makes it harder to debug.
If you first run it normally, you will have a better chance of understanding the issue.
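
As a rough sketch of what such a serial debugging run could look like: the loop below swaps parLapply for lapply and wraps each call in tryCatch, so a failure is reported together with the iteration and element it came from. The functions compute_delta and update_nu are made-up stand-ins, since the real functions are not shown in this thread.

```r
# Hypothetical stand-ins for the real functions, which the thread does not show.
compute_delta <- function(nu_i) nu_i^2 + 1
update_nu <- function(nu, delta) nu + 0.1 * unlist(delta)

nu <- c(0, 0, 0, 0)
for (iter in 1:5) {
  # lapply() is the serial counterpart of parLapply(cl, ...).
  delta <- lapply(nu, function(nu_i) {
    # Return the error object instead of stopping, so it can be inspected.
    tryCatch(compute_delta(nu_i), error = function(e) e)
  })

  # Flag the elements whose result is an error object.
  failed <- vapply(delta, inherits, logical(1), what = "error")
  if (any(failed)) {
    message("iteration ", iter, " failed for elements: ",
            paste(which(failed), collapse = ", "))
    break
  }

  nu <- update_nu(nu, delta)
}
```

The same tryCatch wrapper can be kept when switching back to parLapply, so that a worker returns its error object to the main session.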

Yes, it works fine if I run it with lapply. It also always works fine when I just start the server with parLapply.

I don't know what it means to 'start the server with parLapply', and what are you contrasting that with?

To make this conversation concrete, it would help to have a reprex.

Yes, I will try to get a MWE going.

If I start the cluster fresh and then use parLapply, it always works. Only after a few iterations does it fail. My suspicion is the following.

  1. I parallelize over subsets of the data. In order to do this, I export the entire dataset to each worker, then subset it, then do the calculation.
  2. I think it dies because of memory.
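
One possible way to check the memory suspicion, sketched under assumptions: full_data below is a made-up stand-in for the real dataset, and the memory figure is simply the "used (Mb)" column that gc() reports on each worker, read before and after the export.

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical stand-in for the full dataset (one integer and one double column).
full_data <- data.frame(id = rep(1:4, each = 250000), value = rnorm(1e6))

# Memory in use on each worker (Mb), summed over the two rows gc() reports.
mem_before <- unlist(clusterEvalQ(cl, sum(gc()[, 2])))

# Copy the whole dataset to every worker, as described in point 1.
clusterExport(cl, "full_data", envir = environment())

mem_after <- unlist(clusterEvalQ(cl, sum(gc()[, 2])))

print(format(object.size(full_data), units = "MB"))
print(mem_after - mem_before)  # approximate growth per worker, in Mb

stopCluster(cl)
```

Running the same gc() query after each iteration of the real loop would show whether the workers' memory use keeps growing.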

Do you know how I can parallelize so that I only export the subset relevant to each worker? Is there an example somewhere?

I would guess it's this, but I'm not super confident:

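library(parallel)

# Start a local cluster (2 workers unless the "cl.cores" option says otherwise)
cl <- makeCluster(getOption("cl.cores", 2))

# split() yields one data frame per species; each worker only receives its share of that list.
# The formula form of split() and the \(x) lambda both need R 4.1 or later.
parLapply(cl = cl,
          X = split(iris, ~Species),
          fun = \(x) mean(x$Petal.Length))

stopCluster(cl)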
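
Building on the split-based call above, here is a minimal sketch of how it might fit the iterative nu/delta loop described earlier in the thread. The worker function and the update rule are placeholders invented for illustration; only the pattern is the point: split the data once, and pass nu as an extra argument to parLapply instead of exporting the full dataset to every worker.

```r
library(parallel)

cl <- makeCluster(2)

# Split once; each list element is the subset for one task.
chunks <- split(iris, iris$Species)

nu <- c(0, 0, 0, 0)
for (iter in 1:3) {
  # Arguments after the function are forwarded to every call,
  # so the current nu is sent along with each subset.
  delta <- parLapply(cl, chunks, function(chunk, nu) {
    mean(chunk$Petal.Length) + sum(nu)  # placeholder calculation
  }, nu = nu)

  # Placeholder for the serial update step.
  nu <- nu + 0.01 * mean(unlist(delta))
}

stopCluster(cl)
```

Because parLapply divides the list among the workers, each worker only ever holds the subsets assigned to it plus the small nu vector.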
