Classification tree

I have built a classification tree, and I have a question about the order of the variable importance scores.
For example, here are my variables with their importance scores:

SDL - 40
LCT - 16
CIS - 10
ML - 9
OCS - 8
IC - 3
Hour - 3

Judging by these scores, ML is more important than OCS. However, when I build the classification tree, only SDL, LCT, CIS, and OCS are used as splits. Why does ML not appear in the tree when it has a higher importance score than OCS?


To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:


As @nirgrahamuk says, a reprex is essential to provide any useful help, especially since there is more than one R package that includes classification tree functionality.

Hello, sorry for taking so long to reply; I'm new to this community.

Here is my code, using the rpart and rpart.plot packages. I removed some variables, which is why the variable importance scores differ from the ones above.

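library(rpart)
library(rpart.plot)
data <- read.csv("tree2.csv")
# Fit a classification tree predicting readyb from all other columns
tree1 <- rpart(readyb ~ ., data = data, method = "class")
rpart.plot(tree1, digits = 3)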


And here is my classification tree:

I am wondering why ML does not appear as a split in my classification tree when it has a higher importance score than OCS. Is there a reason why a classification tree sometimes does not follow the order of the variable importance scores?
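A quick way to see this kind of mismatch is to compare the variables the tree actually splits on with the names in rpart's `variable.importance` vector. The sketch below uses the built-in `iris` data as a stand-in for `tree2.csv` (an assumption, since the original data isn't available). With rpart's defaults, `Sepal.Length` should receive an importance score even though it is never used as a split, which mirrors the ML situation.

```r
library(rpart)

# Stand-in for tree2.csv: the built-in iris data (assumption, not the poster's data)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Variables used as primary splits (frame$var is "<leaf>" for terminal nodes)
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used)

# Importance rescaled to sum to 100, the way summary() reports it
imp <- round(100 * fit$variable.importance / sum(fit$variable.importance))
print(imp)

# Variables that earn importance but never split the tree
print(setdiff(names(imp), used))
```

`printcp(fit)` also prints a "Variables actually used in tree construction" line, which is a convenient cross-check against the importance list.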

What is shown when you run


Does your data have any missing values?
Variable importance can be influenced by surrogate splits, which are relevant when the variable used in a primary split has missing data (as I understand it).
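For what it's worth, rpart's introductory vignette describes overall variable importance as the sum of the goodness-of-split measures for each split where the variable is the primary variable, plus goodness × adjusted agreement for each split where it is a surrogate. rpart computes surrogates at every node (up to `maxsurrogate = 5` by default) even when there are no missing values, so a variable that closely mimics a primary split can collect importance without ever splitting; whether that is what happens with ML is a guess without the data. A minimal sketch, again on `iris` as a stand-in, that turns surrogates off with `rpart.control(maxsurrogate = 0)`:

```r
library(rpart)

# Same iris stand-in; maxsurrogate = 0 disables surrogate splits
fit_surr <- rpart(Species ~ ., data = iris, method = "class")
fit_nosurr <- rpart(Species ~ ., data = iris, method = "class",
                    control = rpart.control(maxsurrogate = 0))

# With no missing values the primary splits are the same in both fits
same_splits <- identical(as.character(fit_surr$frame$var),
                         as.character(fit_nosurr$frame$var))
print(same_splits)

# Only the default fit credits surrogate variables with importance
print(names(fit_surr$variable.importance))
print(names(fit_nosurr$variable.importance))
```

On the original data, the equivalent would be adding `control = rpart.control(maxsurrogate = 0)` to the `rpart(readyb ~ ., ...)` call and checking whether ML drops out of the importance list.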

Hello, I don't have any missing values, and when I ran that code, this is what was shown:

There is still no ML split. I don't fully understand classification trees, since this is my first time analyzing data with this method, which is why I want to know what happens when the tree selects variables and splits on them.

I'm limited in how proactively I can investigate this because it's not reproducible. I don't have your data.

One experiment I might try is to omit ML from a run of the script and compare the resulting objects from the two runs to see what stays the same and what differs.
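That experiment can be sketched as follows, with `iris` standing in for the real data and `Sepal.Length` playing the role of ML (both assumptions); `split_vars` is a hypothetical helper written for this sketch, not part of rpart. On the original data the dropped fit would be something like `rpart(readyb ~ . - ML, data = data, method = "class")`, untested here since the data isn't available.

```r
library(rpart)

# Hypothetical helper: variables used as primary splits in an rpart fit
split_vars <- function(fit) setdiff(unique(as.character(fit$frame$var)), "<leaf>")

# Fit with all predictors, then again with one predictor removed
fit_full <- rpart(Species ~ ., data = iris, method = "class")
fit_drop <- rpart(Species ~ . - Sepal.Length, data = iris, method = "class")

# Compare which variables split each tree
print(split_vars(fit_full))
print(split_vars(fit_drop))

# Cross-tabulate the fitted classes of the two trees
print(table(full = predict(fit_full, type = "class"),
            dropped = predict(fit_drop, type = "class")))

# Importance of the remaining variables in the reduced fit
print(fit_drop$variable.importance)
```

If dropping a variable leaves the splits and predictions unchanged while only the importance scores shift, that points to the variable having earned its score as a surrogate rather than as a split.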

Thank you for taking the time to help me with my problem. I will do what you suggested. Thank you again.
