You don't need to drop any groups from the model. R creates the dummy variables for categorical independent variables automatically. Here's an example, using the built-in iris data frame:
# Null model (model with intercept only)
m1 = lm(Petal.Width ~ 1, data=iris)
# Model with predictor variables
m2 = lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data=iris)
There are various ways to summarize the model. For example,
summary(m2)
Call:
lm(formula = Petal.Width ~ Species + Sepal.Length + Sepal.Width,
data = iris)
Residuals:
     Min       1Q   Median       3Q      Max
-0.50805 -0.10042 -0.01221  0.11416  0.46455

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.86897    0.16985  -5.116 9.73e-07 ***
Speciesversicolor  1.17375    0.06758  17.367  < 2e-16 ***
Speciesvirginica   1.78487    0.07779  22.944  < 2e-16 ***
Sepal.Length       0.06360    0.03395   1.873    0.063 .
Sepal.Width        0.23237    0.05145   4.516 1.29e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1797 on 145 degrees of freedom
Multiple R-squared:  0.9459, Adjusted R-squared:  0.9444
F-statistic: 634.3 on 4 and 145 DF,  p-value: < 2.2e-16
summary provides the coefficient values (in the Estimate column) and the p-values for the individual predictors (in the Pr(>|t|) column). Also, note that setosa is the excluded category for Species, meaning that you get the model prediction for Species="setosa" by setting the dummies equal to zero for versicolor and virginica.
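If it helps to see the dummy coding directly, here is a minimal sketch that inspects the design matrix lm() builds and checks a setosa prediction by hand. The object names X, new_flower, pred and by_hand, and the flower measurements (5 and 3.5), are made up for illustration.

```r
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)

# The first factor level is the reference (excluded) category
levels(iris$Species)

# Design matrix used by lm(): two dummy columns stand in for three species
X <- model.matrix(m2)
colnames(X)
head(X, 3)

# Hypothetical setosa flower: both dummies are zero, so only the
# intercept and the two sepal terms contribute to the prediction
new_flower <- data.frame(
  Species      = factor("setosa", levels = levels(iris$Species)),
  Sepal.Length = 5,
  Sepal.Width  = 3.5
)
pred <- predict(m2, newdata = new_flower)

b <- coef(m2)
by_hand <- b["(Intercept)"] + b["Sepal.Length"] * 5 + b["Sepal.Width"] * 3.5

# The two numbers should agree
c(predict = unname(pred), by_hand = unname(by_hand))
```

If you would rather have a different reference category, relevel(iris$Species, ref = "versicolor") is one way to change it before fitting.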
In addition, the last row of the summary reports the overall F-test, which is a joint test that all of the coefficients (other than the intercept) are zero. It gives the F-statistic, the degrees of freedom for the test, and the p-value.
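If you need those F-test numbers programmatically rather than reading them off the printout, a small sketch like the following should work. The names fs and p_value are just illustrative.

```r
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)

# summary.lm stores the F-statistic as a named vector: value, numdf, dendf
fs <- summary(m2)$fstatistic
fs

# The p-value is not stored, but it can be recomputed from the F distribution
p_value <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)
p_value
```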
Other ways to perform the F-test include:
anova(m1, m2, test="F") # F-test results
# F-test on any combination of independent variables (we include all four here)
car::linearHypothesis(m2, c("Sepal.Length=0",
"Sepal.Width=0",
"Speciesversicolor=0",
"Speciesvirginica=0"))
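The same nested-model idea also gives a joint test for the categorical variable on its own, which is often what you want when a factor is spread over several dummies. This is only a sketch; m_no_species and cmp are names I made up.

```r
m2 <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)

# Refit without Species, keeping the other predictors
m_no_species <- update(m2, . ~ . - Species)

# Partial F-test: are both Species dummies jointly zero?
cmp <- anova(m_no_species, m2)
cmp

# The test uses 2 numerator degrees of freedom, one per dummy
cmp$Df[2]
```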
By "beta weight" do you mean the standardized regression coefficients? If so, you can scale the continuous variables so that the regression coefficients will be in units of standard deviations. For example:
m2s = lm(scale(Petal.Width) ~ Species + scale(Sepal.Length) + scale(Sepal.Width), data=iris)
where the scale function takes the original data values, subtracts the mean, and divides by the standard deviation. For example:
x = 0:10
scale(x)
# Compare with
(x - mean(x))/sd(x)
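As a sanity check, the standardized slopes from the scaled model should match the unscaled slopes multiplied by sd(x)/sd(y). Here is a rough sketch of that comparison; sd_y, by_hand and from_scaled are illustrative names.

```r
m2  <- lm(Petal.Width ~ Species + Sepal.Length + Sepal.Width, data = iris)
m2s <- lm(scale(Petal.Width) ~ Species + scale(Sepal.Length) + scale(Sepal.Width),
          data = iris)

sd_y <- sd(iris$Petal.Width)

# Convert the raw slopes to standard-deviation units by hand
by_hand <- c(
  Sepal.Length = unname(coef(m2)["Sepal.Length"]) * sd(iris$Sepal.Length) / sd_y,
  Sepal.Width  = unname(coef(m2)["Sepal.Width"])  * sd(iris$Sepal.Width)  / sd_y
)

# Slopes estimated directly from the scaled model
from_scaled <- coef(m2s)[c("scale(Sepal.Length)", "scale(Sepal.Width)")]

# The two rows should be identical up to floating-point error
rbind(by_hand, from_scaled = unname(from_scaled))
```

Note that the Species dummies are left unscaled here, so their coefficients in m2s are differences from setosa measured in standard deviations of Petal.Width.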