I have a question that is somewhat off-topic.
I want to fit a logit regression with a multicategorical independent variable that has N categories.
Am I forced to create N-1 dummy variables, or can I keep the original multicategorical independent variable?
If I have to create the dummy variables, is there any code available?
Thank you for your help.
Hi,
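Most regression functions in R do that automatically for you, as long as the variable is a factor (or character) rather than numeric. Here is an example (it uses glm() with its default Gaussian family, but factors are handled the same way with family = binomial for a logit model):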
levels(iris$Species)
#> [1] "setosa" "versicolor" "virginica"
glm("Petal.Width ~ .", data = iris)
#>
#> Call: glm(formula = "Petal.Width ~ .", data = iris)
#>
#> Coefficients:
#> (Intercept) Sepal.Length Sepal.Width Petal.Length
#> -0.47314 -0.09293 0.24220 0.24220
#> Speciesversicolor Speciesvirginica
#> 0.64811 1.04637
#>
#> Degrees of Freedom: 149 Total (i.e. Null); 144 Residual
#> Null Deviance: 86.57
#> Residual Deviance: 3.998 AIC: -104.1
Created on 2021-12-09 by the reprex package (v2.0.1)
You can see that two dummy variables were created for the Species column, which has three levels. The first level, setosa, is the reference category and is absorbed into the intercept.
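For the logit case in the question, here is a minimal sketch. The binary outcome `wide_sepal` and its cut-off of 3 are invented purely for illustration and are not part of the original question. The factor is expanded the same way with `family = binomial`, and `model.matrix()` returns the dummy columns explicitly if you ever need to build them yourself:

```r
# Hypothetical binary outcome, made up only for this illustration
dat <- iris
dat$wide_sepal <- as.integer(dat$Sepal.Width > 3)

# Species is a factor with 3 levels, so glm() creates N - 1 = 2 dummies
fit <- glm(wide_sepal ~ Species, data = dat, family = binomial)
names(coef(fit))

# Build the same dummy columns by hand; [, -1] drops the intercept column
dummies <- model.matrix(~ Species, data = dat)[, -1]
colnames(dummies)
head(dummies)
```

To change the reference category, `relevel(dat$Species, ref = "virginica")` can be applied before fitting.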
Hope this helps,
PJ
startz
December 9, 2021, 5:27pm
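You generally want the dummies. The exception is when the categories are ordered and you believe the "distance" between adjacent categories is equal, that is, going from level A to level B has the same effect as going from B to C. In that case you can enter the variable as a single numeric score and estimate one slope instead of N-1 coefficients.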
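One way to check that equal-spacing assumption is to fit both versions and compare them. The sketch below is only an illustration: it uses the built-in `mtcars` data, which is not from this thread, with `am` as the binary outcome and `cyl` (4, 6, 8) as the ordered category:

```r
# Illustration only: mtcars is an assumed stand-in dataset
# Equal-spacing version: cyl enters as one numeric score, one slope
fit_num <- glm(am ~ cyl, data = mtcars, family = binomial)

# Dummy version: cyl as a factor, N - 1 = 2 coefficients
fit_fac <- glm(am ~ factor(cyl), data = mtcars, family = binomial)

length(coef(fit_num))  # intercept + 1 slope
length(coef(fit_fac))  # intercept + 2 dummies

# The numeric model is nested in the dummy model, so a likelihood-ratio
# test indicates whether the extra flexibility of the dummies is needed
anova(fit_num, fit_fac, test = "Chisq")
```

A small p-value in the comparison would suggest the equal-spacing restriction does not hold and the dummies are the safer choice.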
Thank you for your answer, @startz.