I want to train a random forest regression model with H2O in R. I have defined a hyperparameter grid for a grid search, but when I run it I get this error:
Error: Illegal argument: training_frame of function: grid: Cannot append new models to a grid with different training input.
The error is raised in the "build grid search" section of the code, at the h2o.grid() call.
This is my code:
library(h2o)
set.seed(200)
dataset1 <- a  # a is my data frame
# drop the first column
dataset <- dataset1[, 2:ncol(dataset1)]
# split the data into train (70%) and test (30%)
ind <- sample(2, nrow(dataset), replace = TRUE, prob = c(0.70, 0.30))
train_data <- dataset[ind == 1, ]
test_data <- dataset[ind == 2, ]
h2o.no_progress()
h2o.init(max_mem_size = "5g")
# create feature names
y <- "Expression"
x <- setdiff(names(train_data), y)
# turn training set into h2o object
train.h2o <- as.h2o(train_data)
# hyperparameter grid
hyper_grid.h2o <- list(
  ntrees = seq(200, 600, by = 100),
  mtries = seq(10, 30, by = 2),
  sample_rate = c(.55, .632, .70, .80)
)
# random grid search criteria
search_criteria <- list(
  strategy = "RandomDiscrete",
  stopping_metric = "mse",
  stopping_tolerance = 0.005,
  stopping_rounds = 10,
  max_runtime_secs = 30 * 60
)
# build grid search
random_grid <- h2o.grid(
  algorithm = "randomForest",
  grid_id = "rf_grid2",
  x = x,
  y = y,
  training_frame = train.h2o,
  hyper_params = hyper_grid.h2o,
  search_criteria = search_criteria
)
# collect the results and sort by our model performance metric of choice
grid_perf2 <- h2o.getGrid(
  grid_id = "rf_grid2",
  sort_by = "mse",
  decreasing = FALSE
)
print(grid_perf2)
# grab the best model
best_model_id <- grid_perf2@model_ids[[1]]
best_model <- h2o.getModel(best_model_id)
# evaluate the model performance on a test set
test_data.h2o <- as.h2o(test_data)
best_model_perf <- h2o.performance(model = best_model, newdata = test_data.h2o)
# RMSE of the best model on the test set
sqrt(h2o.mse(best_model_perf))
# predict on the test set
pred_h2o <- predict(best_model, test_data.h2o)
head(pred_h2o)
How can I fix this error?
Thanks in advance.