Which machine learning method should I choose to predict a binary outcome from several binary predictors?

Liang_Z · December 4, 2022, 9:20am

The data have an outcome variable (healthy or cancer) and several binary predictors (yes or no). I tried logistic regression, SVM, KNN, xgboost, lightGBM, and random forest, and found that the best model was logistic regression. Even after tuning their parameters, xgboost and lightGBM only came close to logistic regression in AUC and accuracy. So which machine learning method should I choose to predict a binary outcome from several binary predictors?

Are SVM, KNN, xgboost, lightGBM, and random forest suitable in this case? Or is logistic regression the only appropriate method?

AC3112 · December 5, 2022, 12:18pm

Hi Liang_Z.

Given you have several binary predictors, you may wish to try something like a logistic LASSO?

Liang_Z · December 5, 2022, 12:26pm

I have tried lasso using tidymodels:

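library(tidymodels)

# Logistic regression with a pure L1 (lasso) penalty; the penalty is tuned
lasso_model <-
  logistic_reg(mode = "classification",
               penalty = tune(),
               mixture = 1,
               engine = "glmnet"
  )

# model_recipe (a recipe) is assumed to be defined earlier in the session
lasso_wf <-
  workflow() %>%
  add_model(lasso_model) %>%
  add_recipe(model_recipe)
lasso_wf

# dat_cv (the resampling object) is also assumed to exist already
set.seed(123)
lasso_results <-
  lasso_wf %>%
  tune_grid(resamples = dat_cv,
            control = control_grid(save_pred = TRUE),
            grid = tibble(penalty = 10 ^ seq(-5, 0, length.out = 50)),
            metrics = metric_set(accuracy, roc_auc)
  )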

The roc_auc and accuracy values are equal to those obtained from logistic regression.
What I meant to ask is whether other machine learning methods are feasible for this kind of data.

AC3112 · December 5, 2022, 12:33pm

Hi @Liang_Z.

Fair point. It seems you've pretty much exhausted the options if your primary concern is AUC.

I went through a similar exploration and found very little discrepancy in AUCs. In the end I settled on something like an optimal decision tree for good visualisation and prediction, i.e. a more white-box method.

https://docs.interpretable.ai/dev/IAI-R/

Liang_Z · December 5, 2022, 1:29pm

Thanks for your reply, @AC3112.

The results from several ML methods are listed below:

algorithm       auc    accuracy  f_meas  precision  recall
lasso           0.880  0.846     0.789   0.789      0.789
knn             0.825  0.808     0.722   0.765      0.684
svm             0.835  0.827     0.743   0.812      0.684
random forest   0.864  0.827     0.743   0.812      0.684
naive bayes     0.875  0.865     0.788   0.929      0.684
decision trees  0.860  0.827     0.743   0.812      0.684
bag trees       0.883  0.846     0.789   0.789      0.789
mlp             0.880  0.846     0.789   0.789      0.789
xgboost         0.875  0.827     0.769   0.75       0.789
lightgbm        0.870  0.846     0.789   0.789      0.789

It seems that the metrics for lasso, bag trees, mlp, and lightgbm are similar. I don't know which one I should choose as the final ML model.
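
To make the comparison easier to read, here is a minimal base R sketch that puts the numbers from the table above into a data frame, sorts them by AUC, and recomputes the F-measure from precision and recall as a sanity check. The object names (`results`, `ranked`, `top4`, `f_check`) are only illustrative and are not part of the original analysis.

```r
# Metrics copied from the table above (object names are illustrative)
results <- data.frame(
  algorithm = c("lasso", "knn", "svm", "random forest", "naive bayes",
                "decision trees", "bag trees", "mlp", "xgboost", "lightgbm"),
  auc       = c(0.880, 0.825, 0.835, 0.864, 0.875, 0.860, 0.883, 0.880, 0.875, 0.870),
  accuracy  = c(0.846, 0.808, 0.827, 0.827, 0.865, 0.827, 0.846, 0.846, 0.827, 0.846),
  f_meas    = c(0.789, 0.722, 0.743, 0.743, 0.788, 0.743, 0.789, 0.789, 0.769, 0.789),
  precision = c(0.789, 0.765, 0.812, 0.812, 0.929, 0.812, 0.789, 0.789, 0.750, 0.789),
  recall    = c(0.789, 0.684, 0.684, 0.684, 0.684, 0.684, 0.789, 0.789, 0.789, 0.789)
)

# Sanity check: the F-measure is the harmonic mean of precision and recall
results$f_check <- with(results,
                        round(2 * precision * recall / (precision + recall), 3))

# Sort from highest to lowest AUC
ranked <- results[order(-results$auc), ]
print(ranked, row.names = FALSE)

# Spread of AUC among the four models that look similar
top4 <- subset(results, algorithm %in% c("lasso", "bag trees", "mlp", "lightgbm"))
diff(range(top4$auc))
```

Among those four models the AUC values differ by at most 0.013 (0.883 for bag trees versus 0.870 for lightgbm), and their accuracy, F-measure, precision, and recall are identical in the table.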
