Hi everyone,

I couldn't find the answer and got so confused by standardization in glmnet...

I have 500 variables (chemicals), and each of them has 3 estimated levels, which means I actually have 1500 variables (X) in the dataset. Now I want to rule out the chemicals that do not play an important role in the outcome (Y), so I'm using glmnet to select them.

I'm fitting glmnet to my training data as follows:

```
# Grid of alpha values for the elastic-net mixing parameter
a <- seq(0.1, 0.9, 0.05)
search <- foreach(i = a, .combine = rbind, .packages = 'glmnet') %dopar% {
  cv <- cv.glmnet(mdlX, mdlY, family = 'binomial', nfolds = 10,
                  type.measure = 'auc', parallel = TRUE,
                  alpha = i, standardize = TRUE)
  data.frame(cvm = cv$cvm[cv$lambda == cv$lambda.min],
             lambda.min = cv$lambda.min, alpha = i)
}
# With type.measure = 'auc', cvm is an AUC, so pick the maximum, not the minimum
cv3 <- search[search$cvm == max(search$cvm), ]
md3 <- glmnet(mdlX, mdlY, family = 'binomial', alpha = cv3$alpha,
              lambda = cv3$lambda.min, standardize = TRUE)
```

I read that the default is `standardize = TRUE` if `family = 'gaussian'`, so I added it to my code explicitly. But then it was indicated that the coefficients would be returned on the original scale.
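For context, here is a small sketch (with simulated data; the variable names are made up, not from my real dataset) of what I understand `standardize = TRUE` to mean: glmnet standardizes internally before applying the penalty, then back-transforms, so the reported coefficients are in each column's own raw units. Re-expressing a column in different units leaves the fit unchanged and only rescales its reported coefficient:

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(0.8 * X[, 1] - 0.5 * X[, 2]))

fit1 <- glmnet(X, y, family = "binomial", alpha = 0.5, lambda = 0.02,
               standardize = TRUE)

# Re-express column 2 in different "units" (e.g. mg instead of g)
X2 <- X
X2[, 2] <- X2[, 2] * 1000
fit2 <- glmnet(X2, y, family = "binomial", alpha = 0.5, lambda = 0.02,
               standardize = TRUE)

# coef() rows: 1 = intercept, 3 = column 2's coefficient
b1 <- coef(fit1)[3, 1]
b2 <- coef(fit2)[3, 1]
b1 / b2  # ≈ 1000: same fit, coefficient just reported in the column's raw units
```

So the internal standardization already handles the different units for the penalty; the coefficients just come back on the original scale.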

So my question is:

Should I still apply `scale(X)` in addition to `standardize = TRUE` in both `cv.glmnet` and `glmnet` if the variables (chemicals) have different units? I ask because, in the end, I'm using the fits to select the variables I need by the Variable Inclusion Probability:

```
# % of the 5 fits in which each coefficient is nonzero
Result <- apply(coeff_df, 2, function(x) sum(x != 0) / 5 * 100)
```
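To make the question concrete, here is a hypothetical reconstruction of how my `coeff_df` is built (simulated stand-ins for `mdlX`/`mdlY`, and I'm assuming 5 bootstrap refits; the resampling scheme and all names here are illustrative, not my exact code):

```r
library(glmnet)

set.seed(3)
mdlX <- matrix(rnorm(200 * 10), 200, 10)     # stand-in for the 1500 predictors
mdlY <- rbinom(200, 1, plogis(mdlX[, 1]))

# One row of coefficients per refit on a bootstrap resample
coeff_df <- do.call(rbind, lapply(1:5, function(k) {
  idx <- sample(nrow(mdlX), replace = TRUE)
  fit <- glmnet(mdlX[idx, ], mdlY[idx], family = "binomial",
                alpha = 0.5, lambda = 0.05, standardize = TRUE)
  as.vector(coef(fit))[-1]                   # drop the intercept
}))

# Variable Inclusion Probability: % of the 5 fits where each coefficient is nonzero
Result <- apply(coeff_df, 2, function(x) sum(x != 0) / 5 * 100)
```

My worry is whether the nonzero/zero pattern used here is affected by the units of the columns, which is why I'm asking about `scale(X)`.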

But I also read in another post that I do not need to standardize beforehand if I use `predict()`. I'm not sure which is the best option.
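On the `predict()` point, my current understanding (sketched below with simulated data, so the names are placeholders) is that because the stored coefficients are already on the original scale, `predict()` expects `newx` in the same raw units as the training matrix, with no `scale()` call in between:

```r
library(glmnet)

set.seed(2)
X <- matrix(rnorm(300), 100, 3)
X[, 3] <- X[, 3] * 50          # a column with much larger "units"
y <- rbinom(100, 1, plogis(X[, 1]))

fit <- glmnet(X, y, family = "binomial", alpha = 0.5, lambda = 0.05,
              standardize = TRUE)

# New observations in the same raw units as the training data
Xnew <- matrix(rnorm(9), 3, 3)
Xnew[, 3] <- Xnew[, 3] * 50
predict(fit, newx = Xnew, type = "response")  # probabilities; no scale() needed
```

Please correct me if that reading is wrong.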

Thank you for reading!