Apply random forest classification to a list of dataframes with an individual mtry parameter for each dataframe

Hello everyone,

train_data is a list of 100 dataframes. I would like to fit a separate Random Forest to each dataframe, resulting in 100 individual models (RF_models). My problem is the parameter "mtry".
Instead of using one mtry value for all dataframes, I have prepared a vector (models_mtry) with 100 specific values (the optimal tuned value for each dataframe), and I want the script to use the corresponding value for each dataframe. Here "corresponding" means that the first value of the vector is used for the first dataframe in the list, the second for the second, etc.

My code apparently isn't complete, because it always just uses the first value of the vector for all dataframes. I suspect I'll have to use an index for "mtry" with an additional variable included in the function. But, alas, no cigar.

RF_models <- lapply(train_data, function(i) {
  randomForest(i[-25], i$classes, mtry = models_mtry, ntree = 500, sampsize = smp.size, strata = i$classes)
})

Thanks in advance for your support.

Read up on the purrr package, in particular its map functions.

You will then be able to use map2, which iterates over two inputs in parallel, to solve this kind of problem.
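
To make the map2 suggestion concrete, here is a minimal sketch. It assumes the purrr and randomForest packages are installed. The toy train_data (three copies of iris with the label column renamed to "classes"), the three mtry values and ntree = 50 are made-up stand-ins, not the data or settings from the question.

```r
library(purrr)
library(randomForest)

set.seed(1)

# Hypothetical stand-in for the real list of 100 data frames:
# three copies of iris with the label column renamed to "classes".
toy <- iris
names(toy)[5] <- "classes"
train_data <- replicate(3, toy, simplify = FALSE)

# One mtry value per data frame, in the same order as train_data (made-up values).
models_mtry <- c(1, 2, 3)

# map2 walks both inputs in parallel: df is the k-th data frame, m the k-th mtry.
RF_models <- map2(train_data, models_mtry, function(df, m) {
  randomForest(x = df[names(df) != "classes"], y = df$classes,
               mtry = m, ntree = 50)
})

# Each fitted object records the mtry it was given.
used_mtry <- sapply(RF_models, function(rf) rf$mtry)
```

Comparing used_mtry with models_mtry is a quick way to check that each model received its own value. The other arguments from the original call (ntree = 500, sampsize, strata) should be able to go into the randomForest() call unchanged.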

Thank you.

I tried this after reading about map, but I still seem to be having trouble, because it doesn't work. :frowning:

RF_models <- Map(function(i) {
  randomForest(train_data[-25], train_data$classes, mtry=models_mtry[i], ntree=500, sampsize=smp.size, strata=train_data$classes)   
})
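
A note on why this attempt likely fails: base R's Map() expects the function first and then the objects to iterate over, but here nothing is passed after the function. Inside the function, train_data[-25] and train_data$classes also index the whole list rather than a single dataframe. Below is a minimal sketch of two possible fixes, using made-up toy data (copies of iris), invented mtry values, ntree = 50 and a hypothetical helper fit_one(). It is only an illustration under those assumptions, not the asker's real setup.

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-ins for the real objects from the question.
toy <- iris
names(toy)[5] <- "classes"
train_data  <- list(toy, toy, toy)
models_mtry <- c(1, 2, 4)

# Helper (name is illustrative): fit one forest on one data frame with one mtry.
# df[-5] drops the label column here, as i[-25] does in the original code.
fit_one <- function(df, m) {
  randomForest(df[-5], df$classes, mtry = m, ntree = 50)
}

# Option 1: Map(f, a, b) calls f(a[[1]], b[[1]]), f(a[[2]], b[[2]]), ...
RF_models <- Map(fit_one, train_data, models_mtry)

# Option 2: iterate over an index and pick the matching element from each input.
RF_models_idx <- lapply(seq_along(train_data), function(k) {
  fit_one(train_data[[k]], models_mtry[k])
})
```

Option 2 is the "index for mtry" idea from the first post: lapply() runs over positions instead of over the dataframes themselves, so the same position can be used to look up the mtry value.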
