Hello, first I would like to apologize if I am asking this in the wrong category or if this question has been posted before. I'm a college student and we need to do our econometrics term paper using R. However, our professor never taught us how to run a regression with categorical variables...
The trouble I am having is that, to avoid perfect multicollinearity, you need to include only n-1 dummy variables for n categories. How would I do this? For example, I have 5 categories and I want R to include only 4 in the regression and use the excluded one as the base group. This is how my data is set up
I am analyzing the impact of the height of NBA players on their salary while controlling for position. I want shooting guard (SG) to be my reference group.
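For reference, here is a minimal sketch of one common way to get n-1 dummies with a chosen base group in R: store the category as a single factor and set its reference level with relevel(). The data frame `nba`, the column `Position`, and all values below are made up for illustration; they are assumptions, not the actual data set.

```r
# Toy data for illustration only; names and values are hypothetical.
set.seed(1)
nba <- data.frame(
  Height   = round(rnorm(40, mean = 79, sd = 3)),
  Position = rep(c("PG", "SG", "SF", "PF", "C"), each = 8)
)
nba$SALARY <- 5e6 + 1e5 * (nba$Height - 79) + rnorm(40, sd = 1e6)

# Store position as a factor and make SG the reference (base) level.
nba$Position <- relevel(factor(nba$Position), ref = "SG")

# lm() expands a factor into n - 1 dummy variables automatically,
# leaving out the reference level to avoid perfect multicollinearity.
fit <- lm(SALARY ~ Height + Position, data = nba)
print(names(coef(fit)))
```

With this setup, each Position coefficient would be read as the difference relative to SG, holding Height fixed.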
So my regression formula is:
Reg2 <- lm(SALARY ~ Height + PG + PF + SF + Center)
summary(Reg2)
Call:
lm(formula = SALARY ~ Height + PG + PF + SF + Center)

Residuals:
     Min       1Q   Median       3Q      Max
-4845982 -2412751  -346673  2224250  7143761

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 82417024   27550269   2.992  0.00514 **
Height       -767197     351353  -2.184  0.03599 *
PG           6282111    2304350   2.726  0.01005 *
PF           5353895    2482486   2.157  0.03819 *
SF           5428714    2335014   2.325  0.02618 *
Center       6404938    3096554   2.068  0.04628 *
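If PG, PF, SF, and Center are 0/1 indicator columns (an assumption, since the data layout is not shown here), then leaving the SG dummy out of the formula already makes SG the base group. The sketch below checks this on simulated toy data, where every name and value is hypothetical, by comparing the hand-built dummies with a factor whose reference level is SG.

```r
# Simulated toy data; not the real salaries or heights.
set.seed(2)
pos <- rep(c("PG", "SG", "SF", "PF", "C"), each = 8)
toy <- data.frame(
  Height = rnorm(40, mean = 79, sd = 3),
  PG     = as.integer(pos == "PG"),
  PF     = as.integer(pos == "PF"),
  SF     = as.integer(pos == "SF"),
  Center = as.integer(pos == "C")
)
toy$SALARY <- 5e6 + 2e5 * toy$PG + rnorm(40, sd = 1e6)

# Model 1: four hand-built dummies; SG is omitted, so it is the base group.
fit_dummy <- lm(SALARY ~ Height + PG + PF + SF + Center, data = toy)

# Model 2: one factor with SG set as the reference level.
toy$Position <- relevel(factor(pos), ref = "SG")
fit_factor <- lm(SALARY ~ Height + Position, data = toy)

# The PG coefficient (difference from SG) should agree across both models.
same_pg <- all.equal(unname(coef(fit_dummy)["PG"]),
                     unname(coef(fit_factor)["PositionPG"]))
print(same_pg)
```

Under that assumption, each position coefficient in the output above would be the estimated salary difference from a shooting guard of the same height.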