I would like some help with the following problem.
Say I have the following features, where MainCat is the main category, SubCat1 is a subcategory of MainCat, and SubCat2 is a subcategory of SubCat1 (i.e. the predictors are nested / hierarchical).
MainCat | SubCat1 | SubCat2 | y (outcome) |
---|---|---|---|
A | A1 | A11 | 2 |
A | A1 | A12 | 1 |
A | A2 | A21 | 3 |
A | A2 | A22 | 3 |
B | B1 | B11 | 9 |
B | B1 | B12 | 17 |
B | B2 | B21 | 3 |
B | B2 | B22 | 35 |
How should I go about using these three predictors to estimate the value of y?
Using them independently does not seem right, as the hierarchy between them would not be taken into account. Are there models that can handle this type of relationship out of the box (e.g. tree-based models)?
Another option would be to use only the third column, which carries the most fine-grained information, but wouldn't this miss the hierarchical nature of the relationship?
A third option would be to encode them using a hierarchical model, such as the one described in chapter 17, "Encoding Categorical Data", of Tidy Modeling with R (TMWR). I am not sure whether such models can handle hierarchies two or three levels deep, or whether the available recipe step can do so.
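
To make the third option concrete, here is a minimal sketch of the kind of nested encoding I have in mind, written in plain Python rather than with the TMWR recipe step. It is only a crude stand-in for the partial pooling a real mixed model would estimate: each group mean is pulled toward the estimate of its parent group, starting from the grand mean. The names `shrink`, `group_by`, and `encoding`, and the prior strength `K = 1.0`, are my own assumptions for illustration and do not come from any library.

```python
from collections import defaultdict

# Rows from the table above: (MainCat, SubCat1, SubCat2, y)
rows = [
    ("A", "A1", "A11", 2),
    ("A", "A1", "A12", 1),
    ("A", "A2", "A21", 3),
    ("A", "A2", "A22", 3),
    ("B", "B1", "B11", 9),
    ("B", "B1", "B12", 17),
    ("B", "B2", "B21", 3),
    ("B", "B2", "B22", 35),
]

K = 1.0  # hypothetical prior strength: pseudo-observations lent by the parent


def shrink(values, parent_estimate, k=K):
    """Pull the group mean toward its parent's estimate; small groups move more."""
    return (sum(values) + k * parent_estimate) / (len(values) + k)


def group_by(depth):
    """Collect y values keyed by the first `depth` category columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[:depth]].append(row[3])
    return groups


grand_mean = sum(r[3] for r in rows) / len(rows)

# Level 0 is the grand mean; each deeper level shrinks toward the level above.
encoding = {(): grand_mean}
for depth in (1, 2, 3):
    for key, ys in group_by(depth).items():
        encoding[key] = shrink(ys, encoding[key[:-1]])

# Print the numeric encoding for each leaf category (SubCat2).
for key in sorted(k for k in encoding if len(k) == 3):
    print(key[2], round(encoding[key], 2))
```

The resulting number for each SubCat2 level depends on its SubCat1 and MainCat ancestors, which is the property I am after. What I do not know is whether a proper hierarchical model or an existing encoding step would do this across all three levels for me.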
Any help would be much appreciated, thanks!