Help with SMOTE and cross-validation

I'm trying to do cross-validation for a logistic regression model.
I applied SMOTE to the training set and left the test set from the initial split untouched.
When I get to the final error estimate, what argument do I pass to last_fit? I cannot pass the initial split, because it would then use the original training set without SMOTE. My code is below:

#initial split

pok_split=initial_split(pokemon,prop=3/4)
pok_train=training(pok_split)
pok_test=testing(pok_split)
pok_train_fold=vfold_cv(pok_train,v=5)

#train with smote
#SMOTE training set and folds

pok_smote_train=smote(be_legendary~.,pok_train,perc.over=2,perc.under=5.65) #2 and 5.65
frequenze_smote_train=count(pok_smote_train,be_legendary)
pok_test
pok_smote_fold=vfold_cv(pok_smote_train,v=5)

#logistic function with cross validation

logistica_recipe1=recipe(be_legendary~. ,data=pok_smote_train)
specifica_logistica1=logistic_reg()%>%set_engine("glm")    

work_logistica1=workflow()%>%
  add_model(specifica_logistica1)%>%
  add_recipe(logistica_recipe1)

logistica_fit1=work_logistica1%>%fit_resamples(resamples=pok_smote_fold,control=control_resamples(save_pred = TRUE))

metriche_log_smote= collect_metrics(logistica_fit1,metric="accuracy")

si_logistica1= collect_predictions(logistica_fit1)

#########
migliore_metrica_log_smote= select_best(logistica_fit1,metric="accuracy")

work_finale_logistica1= work_logistica1%>%finalize_workflow(migliore_metrica_log_smote)

log_smote_fit_finale= work_finale_logistica1%>%last_fit(pok_split)

metriche_finali_log_smote= log_smote_fit_finale%>%collect_metrics()

predizioni_finali=collect_predictions(log_smote_fit_finale)

matrice_confusione_log_smote=table(predizioni_finali$.pred_class,pok_test$be_legendary)

I think using the themis package's step_smote() function would fix this. I would replace the recipe you have with:

logistica_recipe1 = recipe(be_legendary~. ,data=pok_train) %>% step_smote(be_legendary)

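Now SMOTE is part of the workflow, so fit_resamples() and last_fit() apply it themselves whenever they fit the model. You can resample on pok_train_fold and pass pok_split to last_fit(). Because step_smote() is skipped by default when new data is predicted (skip = TRUE), the synthetic rows are only generated for the data used for fitting; the assessment folds and the test set are left as they are.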
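To make the suggestion concrete, here is a minimal sketch of the whole flow with SMOTE inside the recipe. Since the pokemon data is not available here, it uses a simulated stand-in: the sim data frame, its column names other than be_legendary, and the seed are all assumptions made up for illustration. The stratified split is also my addition, used only so that every fold keeps enough minority-class rows for SMOTE.

```r
library(tidymodels)
library(themis)   # provides step_smote()

set.seed(123)

# Hypothetical stand-in for `pokemon`: numeric predictors and an
# imbalanced two-level factor outcome (roughly 15% "yes").
n <- 400
sim <- tibble(
  attack  = rnorm(n, 75, 20),
  defense = rnorm(n, 70, 20),
  speed   = rnorm(n, 65, 20)
) %>%
  mutate(
    be_legendary = factor(
      if_else(attack + defense + rnorm(n, 0, 25) > 185, "yes", "no")
    )
  )

# Split and fold the ORIGINAL data; no manual SMOTE anywhere.
sim_split <- initial_split(sim, prop = 3/4, strata = be_legendary)
sim_train <- training(sim_split)
sim_folds <- vfold_cv(sim_train, v = 5, strata = be_legendary)

# SMOTE is a recipe step, so it is re-run on each analysis set.
rec <- recipe(be_legendary ~ ., data = sim_train) %>%
  step_smote(be_legendary)

wf <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(rec)

# Cross-validated estimate: assessment folds are not oversampled.
cv_fit <- fit_resamples(
  wf,
  resamples = sim_folds,
  control = control_resamples(save_pred = TRUE)
)
collect_metrics(cv_fit)

# Final estimate: fit on the (SMOTE'd) training set, evaluate on the
# untouched test set. The argument is the initial split object.
final_fit <- last_fit(wf, sim_split)
collect_metrics(final_fit)

collect_predictions(final_fit) %>%
  conf_mat(truth = be_legendary, estimate = .pred_class)
```

In this sketch there is nothing to tune, so the select_best() and finalize_workflow() calls from the original code are not needed; the workflow can go to last_fit() directly.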
