foreach for random forest

Dhahi · July 8, 2019, 11:36am

Does anyone know why this produces the following error?

Error in { : task 1 failed - "arguments imply differing number of rows: 1, 0"

I tried to loop over a large dataset with the ranger package, predicting one unique row of the dataset at a time. Is the problem that the rows don't have the same length, or is it something else?
Thanks

library(ranger)
name = as.vector(unique(my_data$ID))

pa = my_data[-(which(names(my_data)=="ID"))]
test1=NULL
pre= NULL


foreach(p = 1:length(name), combine = "rbind") %dopar% {
  
  train = subset(pa, my_data$ID != p)
  test = subset(pa, my_data$ID == p)
  
  rf.fit  = ranger(rain ~.,data=train, num.trees=500, importance = "permutation", write.forest = TRUE)
  rf.pred = predict(rf.fit,test)
  
  test1= rbind(test1,test)
  pre = c(pre,rf.pred$predictions)
  cat(p,"\n")
}

pieterjanvc · July 10, 2019, 4:32pm

Using foreach makes debugging more difficult. Try replacing the foreach with a simple for loop and see at which iteration the error occurs (by printing the iteration number at the start of each pass).

Then run the body of the for loop line by line for that iteration and see which line generates the error. This error is usually thrown when building a data frame goes wrong: some variables have x values and others have y, so the lengths don't match when they are combined.
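
As a rough sketch of that debugging approach, the base-R snippet below runs the subsetting step in a plain for loop on a small made-up data frame. The data, the column `temp`, and the IDs `st_a`, `st_b`, `st_c` are assumptions for illustration only; the real `my_data` is not shown in the thread. It points at one plausible cause, not a confirmed one: the loop compares `my_data$ID` with the index `p` rather than with `name[p]`, so the test set can end up with zero rows whenever the IDs are not simply 1, 2, 3, ...

```r
# Hypothetical stand-in for my_data: the IDs are labels, not 1..n.
my_data <- data.frame(
  ID   = rep(c("st_a", "st_b", "st_c"), each = 4),
  temp = c(11, 12, 13, 14, 21, 22, 23, 24, 31, 32, 33, 34),
  rain = c(0.1, 0.0, 2.3, 1.1, 0.4, 0.0, 0.0, 3.2, 1.5, 0.2, 0.0, 0.9),
  stringsAsFactors = FALSE
)
name <- as.vector(unique(my_data$ID))
pa   <- my_data[, names(my_data) != "ID"]

empty_by_index <- integer(0)  # iterations where ID == p selects nothing
rows_by_name   <- integer(0)  # held-out row counts when comparing with name[p]

for (p in seq_along(name)) {
  cat("iteration", p, "\n")  # print first, so a failure is easy to locate

  test_index <- subset(pa, my_data$ID == p)        # comparison used in the question
  test_name  <- subset(pa, my_data$ID == name[p])  # comparison against the ID value

  if (nrow(test_index) == 0) empty_by_index <- c(empty_by_index, p)
  rows_by_name <- c(rows_by_name, nrow(test_name))
}

empty_by_index  # iterations with an empty test set
rows_by_name    # rows held out per ID

# Combining a length-1 column with a length-0 column gives the same message text.
msg <- tryCatch(data.frame(a = 1, b = numeric(0)), error = conditionMessage)
msg
```

If the empty test set turns out to be the culprit in the real data, comparing against `name[p]` (or looping over `name` directly) would be the thing to try.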

esuess · July 11, 2019, 2:06pm

Using a for loop rather than foreach() is a good suggestion. ranger() is already a parallelized implementation of random forests, so running it in parallel yourself is probably not going to speed things up unless you are running on a cluster of multicore machines.
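
A minimal sketch of what that could look like: a plain for loop that holds out one ID at a time and leaves the parallelism to ranger through its `num.threads` argument. The simulated data, the columns `temp` and `wind`, and the thread count of 2 are assumptions made up for this example; the `importance` setting from the question is left out for brevity.

```r
library(ranger)

set.seed(1)
# Hypothetical stand-in for my_data.
my_data <- data.frame(
  ID   = rep(c("st_a", "st_b", "st_c", "st_d"), each = 10),
  temp = rnorm(40, mean = 15, sd = 5),
  wind = runif(40, 0, 10)
)
my_data$rain <- 0.2 * my_data$temp + rnorm(40)

ids <- unique(my_data$ID)
pa  <- my_data[, names(my_data) != "ID"]
results <- vector("list", length(ids))  # one slot per held-out ID

for (k in seq_along(ids)) {
  held_out <- my_data$ID == ids[k]  # compare with the ID value, not the index
  train <- pa[!held_out, ]
  test  <- pa[held_out, ]

  # ranger grows the trees on several threads by itself.
  rf.fit  <- ranger(rain ~ ., data = train, num.trees = 500, num.threads = 2)
  rf.pred <- predict(rf.fit, data = test)

  results[[k]] <- data.frame(
    ID        = ids[k],
    observed  = test$rain,
    predicted = rf.pred$predictions
  )
}

out <- do.call(rbind, results)  # combine once, after the loop
head(out)
```

Collecting the pieces in a list and binding them once avoids growing `test1` and `pre` inside the loop. If foreach is still preferred, note that its arguments are spelled `.combine` and `.packages` (e.g. `.packages = "ranger"`), and that each task has to return its result as the last value of the body, since assignments made inside `%dopar%` do not reach the calling session.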

Fer · July 11, 2019, 3:14pm

You don't need to print anything: if a plain for loop breaks, you can read the offending iteration from the loop variable (e.g. 'i'), since it keeps the last value it had before the loop failed.
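
A small base-R illustration of this, with a simulated failure at an arbitrary iteration (the `stop()` call and the value 3 are made up for the example). The `try()` wrapper is only there so the snippet can run to the end as a script; at the console the loop would simply stop and `i` could be inspected directly.

```r
# Simulate a loop that fails partway through.
res <- try(
  for (i in 1:5) {
    if (i == 3) stop("simulated failure")
  },
  silent = TRUE
)

i  # the loop variable still holds the iteration that failed
```

This applies to an ordinary for loop. With foreach, the iteration variable lives inside each task, so this trick does not carry over directly.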
