Hi everyone,

I couldn't find the answer and got so confused by standardization in glmnet...

I have 500 variables (chemicals), and each of them has 3 estimated levels, which means I actually have 1500 variables (X) in the dataset. Now I want to rule out the chemicals that do not play an important role in the outcome (Y), so I'm using glmnet to select them.

I'm fitting glmnet to my training data as follows:

```
# Grid of alpha values for the elastic-net mixing parameter
a <- seq(0.1, 0.9, 0.05)
search <- foreach(i = a, .combine = rbind, .packages = 'glmnet') %dopar% {
  cv <- cv.glmnet(mdlX, mdlY, family = 'binomial', nfolds = 10,
                  type.measure = 'auc', parallel = TRUE,
                  alpha = i, standardize = TRUE)
  data.frame(cvm = cv$cvm[cv$lambda == cv$lambda.min],
             lambda.min = cv$lambda.min, alpha = i)
}
# With type.measure = 'auc', cvm is an AUC, so pick the maximum, not the minimum
cv3 <- search[search$cvm == max(search$cvm), ]
md3 <- glmnet(mdlX, mdlY, family = 'binomial', alpha = cv3$alpha,
              lambda = cv3$lambda.min, standardize = TRUE)
```

I read that the default is `standardize = TRUE` if `family = 'gaussian'`, so I added it to my code explicitly. But then it was indicated that the coefficients would be returned on the original scale.
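For context, here is a small sketch (with simulated data; the variable names are made up, not from my real dataset) of what I understand `standardize = TRUE` to mean: glmnet standardizes internally before applying the penalty, then back-transforms, so the reported coefficients are in each column's own raw units. Re-expressing a column in different units leaves the fit unchanged and only rescales its reported coefficient:

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(0.8 * X[, 1] - 0.5 * X[, 2]))

fit1 <- glmnet(X, y, family = "binomial", alpha = 0.5, lambda = 0.02,
               standardize = TRUE)

# Re-express column 2 in different "units" (e.g. mg instead of g)
X2 <- X
X2[, 2] <- X2[, 2] * 1000
fit2 <- glmnet(X2, y, family = "binomial", alpha = 0.5, lambda = 0.02,
               standardize = TRUE)

# coef() rows: 1 = intercept, 3 = column 2's coefficient
b1 <- coef(fit1)[3, 1]
b2 <- coef(fit2)[3, 1]
b1 / b2  # ≈ 1000: same fit, coefficient just reported in the column's raw units
```

So the internal standardization already handles the different units for the penalty; the coefficients just come back on the original scale.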

So my question is:

Should I still apply `scale(X)` in addition to `standardize = TRUE` in both `cv.glmnet` and `glmnet` if the variables (chemicals) have different units? I ask because, in the end, I'm using the fits to select the variables I need by the Variable Inclusion Probability:

```
# % of the 5 fits in which each coefficient is nonzero
Result <- apply(coeff_df, 2, function(x) sum(x != 0) / 5 * 100)
```
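To make the question concrete, here is a hypothetical reconstruction of how my `coeff_df` is built (simulated stand-ins for `mdlX`/`mdlY`, and I'm assuming 5 bootstrap refits; the resampling scheme and all names here are illustrative, not my exact code):

```r
library(glmnet)

set.seed(3)
mdlX <- matrix(rnorm(200 * 10), 200, 10)     # stand-in for the 1500 predictors
mdlY <- rbinom(200, 1, plogis(mdlX[, 1]))

# One row of coefficients per refit on a bootstrap resample
coeff_df <- do.call(rbind, lapply(1:5, function(k) {
  idx <- sample(nrow(mdlX), replace = TRUE)
  fit <- glmnet(mdlX[idx, ], mdlY[idx], family = "binomial",
                alpha = 0.5, lambda = 0.05, standardize = TRUE)
  as.vector(coef(fit))[-1]                   # drop the intercept
}))

# Variable Inclusion Probability: % of the 5 fits where each coefficient is nonzero
Result <- apply(coeff_df, 2, function(x) sum(x != 0) / 5 * 100)
```

My worry is whether the nonzero/zero pattern used here is affected by the units of the columns, which is why I'm asking about `scale(X)`.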

But I also read in another post that I do not need to standardize beforehand if I use `predict()`. I'm not sure which is the best option.
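On the `predict()` point, my current understanding (sketched below with simulated data, so the names are placeholders) is that because the stored coefficients are already on the original scale, `predict()` expects `newx` in the same raw units as the training matrix, with no `scale()` call in between:

```r
library(glmnet)

set.seed(2)
X <- matrix(rnorm(300), 100, 3)
X[, 3] <- X[, 3] * 50          # a column with much larger "units"
y <- rbinom(100, 1, plogis(X[, 1]))

fit <- glmnet(X, y, family = "binomial", alpha = 0.5, lambda = 0.05,
              standardize = TRUE)

# New observations in the same raw units as the training data
Xnew <- matrix(rnorm(9), 3, 3)
Xnew[, 3] <- Xnew[, 3] * 50
predict(fit, newx = Xnew, type = "response")  # probabilities; no scale() needed
```

Please correct me if that reading is wrong.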

Thank you for reading!