I have a classification tree, and I have a question regarding the order of the variable importance.
For example, these are my variables with their importance scores:
SDL - 40
LCT - 16
CIS - 10
ML - 9
OCS - 8
IC - 3
Hour - 3
Looking at the scores, ML is more important than OCS. However, when I built the classification tree, only SDL, LCT, CIS, and OCS were used as split variables. Why does ML not appear in the tree when it has a higher score than OCS?
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:
As @nirgrahamuk says, a reprex is essential to provide any useful help, especially since there is more than one R package that includes classification tree functionality.
I am wondering why ML was not used in my classification tree when it has a higher importance score than OCS. Is there a reason why a classification tree sometimes does not follow the order of the variable importance scores?
Does your data have any missing values?
Variable importance can be influenced by surrogate splits, which are used when the primary split variable has missing data (as I understand it).
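To make the surrogate idea concrete, here is a minimal sketch in plain Python (not R, and not the code of any tree package). It assumes an rpart-style importance measure, as I read the rpart documentation: a variable is credited with the gain of every split where it is the primary variable, plus gain times adjusted agreement for every split where it is only a surrogate. The variables x1 and x2, the toy labels, and the helper names are all made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def improvement(labels, goes_left):
    """n * (parent impurity - weighted child impurity) for a binary split."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return n * (gini(labels) - children)

def adjusted_agreement(primary, surrogate):
    """How much better the surrogate mimics the primary split than the
    'send everything to the larger child' rule (assumed rpart-style formula)."""
    n = len(primary)
    agree = sum(p == s for p, s in zip(primary, surrogate))
    majority = max(sum(primary), n - sum(primary))
    if n == majority:
        return 0.0
    return max(0.0, (agree - majority) / (n - majority))

# Toy node with 10 observations (hypothetical data).
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# True means the observation goes to the left child under that variable's best split.
x1_left = [True, True, True, True, True, False, False, False, False, False]
x2_left = [True, True, True, True, False, False, False, False, False, True]

primary_gain = improvement(y, x1_left)   # x1 separates the classes perfectly
x2_own_gain = improvement(y, x2_left)    # x2 alone is a weaker split, so x1 wins

# x2 never appears in the tree, yet it is credited as a surrogate for x1.
x2_surrogate_credit = primary_gain * adjusted_agreement(x1_left, x2_left)

print("x1 gain as primary split:  ", round(primary_gain, 3))
print("x2 gain if it were chosen: ", round(x2_own_gain, 3))
print("x2 importance as surrogate:", round(x2_surrogate_credit, 3))
```

In this toy node x2 loses the competition for the split, so it is never drawn in the tree, but it still collects a sizeable importance score because it closely mimics x1. If the package behind the original tree computes importance this way, a variable like ML could outrank OCS without ever being chosen as a split, even when no data are missing.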
Hello, I don't have missing values, and when I used that code, the output looked like this
There is still no ML factor. I don't fully understand classification trees, since this is my first time analyzing data with this method. That is why I wanted to know what happens during the process of selecting and splitting on the important factors.
I'm limited in how proactive I can be in investigating this because it's not reproducible. I don't have your data.
I suppose one experiment I might do is omit ML from a run of the script and compare the final objects between the two runs to see what is the same and what differs.