Analyzing importance of continuous and categorical variables in linear regression in R

blueblueblue · July 22, 2023, 5:52pm

Hi! I have a data set with a binary (0,1) response and both continuous and categorical predictors. I would like to test the overall importance of these predictors one by one, and I am looking for suggestions on how to do this. I am not trying to select an overall model. I am just trying to see which variables are individually significantly associated with the response.

Here is an example concerning categorical variables... Let's call the response "y". Let's say my categorical predictor is called "xCat" and has levels A, B, C, and D. I would like to test if xCat has a statistically significant association with y. I want to test for overall significance, not just significant differences from a single reference group. Here is what I have tried so far...

Option A using LRT:

fitCat <- glm(y~xCat, family = binomial(link = "logit"), data = data)
fit0 <- glm(y~1, family = binomial(link = "logit"), data = data)
anova(fit0, fitCat, test="LRT")

Option B using drop1:

fitCat <- glm(y~xCat, family = binomial(link = "logit"), data = data)
drop1(fitCat, .~., test = "Chisq")

I would then look at the p-values from either of these outputs. If the p-value is >0.05, then I would say there is no significant association. Is this actually testing what I think it is testing? I am concerned about violating assumptions of normality and equal variance for the ANOVA. Any comments or suggestions?

Here is an example of the continuous predictors... Let's call the continuous predictor "xCon". Here is what I've tried...

fitCon <- glm(y~xCon, family = binomial(link = "logit"), data = data)
summary(fitCon)

I would then look at the p-value from the output. If the p-value is >0.05, then I would say there is no significant association. Is there anything I'm missing here? Are there assumptions I need to check, or common pitfalls? Let me know if you have suggestions! Thanks!

startz · July 22, 2023, 6:37pm

I believe that fit0$deviance-fitCat$deviance is distributed \chi^2(3). See Logistic.

blueblueblue · July 23, 2023, 12:34am

Could you elaborate on what you mean here?

startz · July 23, 2023, 2:01am

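The formula given is a test statistic for whether xCat is statistically significant (assuming a large sample). Under the null hypothesis that xCat has no association with y, the statistic is distributed \chi^2(3), with 3 degrees of freedom because xCat has four levels and so adds three parameters to the intercept-only model. From that distribution you can find a p-value.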
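
To make that concrete, here is a minimal sketch of the calculation done by hand. The data are simulated purely for illustration: the data frame `dat`, the sample size, and the effect sizes are all made up, and only the model formulas mirror the ones in the question.

```r
# Simulated data for illustration only; names mirror the question.
set.seed(1)
n <- 400
dat <- data.frame(
  xCat = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
# Hypothetical true effects on the log-odds scale, one per level
eta <- c(A = -0.5, B = 0, C = 0.5, D = 1)[as.character(dat$xCat)]
dat$y <- rbinom(n, size = 1, prob = plogis(eta))

fit0   <- glm(y ~ 1,    family = binomial(link = "logit"), data = dat)
fitCat <- glm(y ~ xCat, family = binomial(link = "logit"), data = dat)

# Likelihood-ratio statistic: drop in deviance from the null to the full model
lr_stat <- fit0$deviance - fitCat$deviance
# Degrees of freedom: number of extra parameters (4 levels - 1 = 3)
lr_df <- fit0$df.residual - fitCat$df.residual
# Upper-tail chi-squared p-value
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The built-in comparison should report the same statistic and p-value
tab <- anova(fit0, fitCat, test = "LRT")
print(c(statistic = lr_stat, df = lr_df, p = p_value))
print(tab)
```

The hand-computed statistic and p-value should match the Deviance and Pr(>Chi) columns of the anova() table, which suggests that Option A in the question is running this same likelihood-ratio test. The test compares deviances of binomial models, so it relies on a large-sample chi-squared approximation rather than on normality or equal variance of the response.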
