I am running a linear regression and scaling the X variables to see which coefficients "matter" most in terms of their impact on the Y variable. When I put all the variables into the regression (there are many competitors' sales variables), a lot of the coefficients come back as NA, even though there is data in every column. Any idea why this happens? There are quite a lot of X variables in the regression.
Any tips would be appreciated. Is it defensible to throw out variables, or would that bias the regression? Should I perhaps drop the highly correlated variables (multicollinearity)?
This returns a lot of coefficients as NA:
lm(formula = scale(CD_Sales) ~
    scale(consumer_spending) +
    scale(quantity_of_reviews) +
    scale(population) +
    scale(income) +
    scale(competitor1) +
    scale(competitor2) +
    scale(competitor3) +
    scale(competitor4) +
    scale(competitor5) +
    scale(city_affordability) +
    scale(city_safety_rating) +
    scale(price_difference) +
    scale(sales_offered_or_not) +
    scale(competitor6) +
    scale(competitor7),
    data = data,
    na.action = na.omit)
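For context, a minimal sketch of how NA coefficients can arise from exact multicollinearity. It uses made-up toy data (the names `toy`, `x1`, `x2`, `x3` are hypothetical and not from my data set), and the response is left unscaled to keep the example simple. Here `x3` is an exact linear combination of `x1` and `x2`, so `lm()` cannot estimate it separately, and `alias()` reports the dependency. Whether this is what is happening with the competitor columns above is only an assumption.

```r
# Toy data (hypothetical): x3 is an exact linear combination of x1 and x2,
# so the design matrix does not have full column rank.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ scale(x1) + scale(x2) + scale(x3), data = toy)

# The aliased term is kept in the coefficient vector but reported as NA
print(coef(fit))

# alias() lists each aliased term as a combination of the remaining terms
print(alias(fit)$Complete)

# The fitted rank is smaller than the number of coefficients requested
print(c(rank = fit$rank, n_coefficients = length(coef(fit))))
```

Scaling does not remove this kind of dependency, because `scale()` only shifts and rescales each column.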
The following returns maybe one coefficient as NA:
lm(formula = scale(CD_Sales) ~
    scale(consumer_spending) +
    scale(quantity_of_reviews) +
    scale(population) +
    scale(income) +
    scale(city_affordability) +
    scale(city_safety_rating) +
    scale(price_difference) +
    scale(sales_offered_or_not) +
    scale(competitor6),
    data = data,
    na.action = na.omit)
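A rough sketch of some checks that could be run before fitting. It assumes the problem is one of the usual causes: too few complete rows left after `na.omit`, a column that is constant among the remaining rows, or columns that are exact linear combinations of each other. The helper `check_design()` and the toy data frame are hypothetical names made up for illustration.

```r
# Hypothetical helper: summarises common reasons for a rank-deficient design.
check_design <- function(df, response, predictors) {
  # Keep only the rows that na.omit would keep for this model
  keep <- stats::complete.cases(df[, c(response, predictors), drop = FALSE])
  X    <- df[keep, predictors, drop = FALSE]
  mm   <- cbind(Intercept = 1, as.matrix(X))
  is_constant <- vapply(X, function(v) length(unique(v)) < 2, logical(1))
  list(
    complete_rows  = nrow(X),             # rows actually used in the fit
    n_coefficients = ncol(mm),            # intercept plus predictors
    rank           = qr(mm)$rank,         # number of estimable coefficients
    constant_cols  = names(X)[is_constant]
  )
}

# Toy data (hypothetical): ab = a + b, and k is constant once the row with
# the missing value is dropped.
toy <- data.frame(
  y  = c(1.2, 2.3, 2.9, 4.1, 5.0, 6.2),
  a  = c(1, 2, 3, 4, 5, 6),
  b  = c(2, 1, 4, 3, 6, 5),
  ab = c(3, 3, 7, 7, 11, 11),
  k  = c(1, 1, 1, 1, 1, NA)
)

res <- check_design(toy, "y", c("a", "b", "ab", "k"))
print(res)
```

If `rank` is smaller than `n_coefficients`, `lm()` would report the difference as NA coefficients. If `complete_rows` is close to or below `n_coefficients`, the missing values in the extra columns are a likely cause.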
Thank you!