Classification tree

Gini · June 8, 2023, 8:25am

I have a classification tree, and I have a question about the ordering of the variable importance scores.
For example, these are my variables with their importance scores:

SDL - 40
LCT - 16
CIS - 10
ML - 9
OCS - 8
IC - 3
Hour - 3

Judging by these scores, ML is more important than OCS. However, when I build the classification tree, only SDL, LCT, CIS, and OCS are used as splits. Why does ML not appear in the tree when it has a higher score than OCS?

nirgrahamuk · June 8, 2023, 8:54am

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners

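A minimal sketch of one common way to share data in a post without attaching a file: run `dput()` on a small subset and paste the printed code. Here `iris` is only a stand-in for your own data (for example, whatever `read.csv("tree2.csv")` returns).

```r
# Take a small subset; iris stands in for your own data frame
small <- head(iris, 10)

# dput() prints R code that recreates the object; capture it as text to paste
txt <- paste(capture.output(dput(small)), collapse = "\n")
cat(txt, "\n")

# Anyone reading the post can rebuild the data frame from that text
rebuilt <- eval(parse(text = txt))
```

Pasting that text together with the model code lets helpers run exactly what you ran.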

technocrat · June 8, 2023, 9:09am

As @nirgrahamuk says, a reprex is essential to provide any useful help, especially since there is more than one R package that includes classification tree functionality.

Gini · June 8, 2023, 10:20am

Hello, sorry for taking so long to reply; I'm new to this community.

Here is my code, using the rpart and rpart.plot packages. I removed some variables here, which is why the variable importance scores differ from the ones above.

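library(rpart)       # classification and regression trees
library(rpart.plot)  # plotting for rpart trees

data <- read.csv("tree2.csv")
str(data)
# method = "class" fits a classification tree for the outcome readyb
tree1 <- rpart(readyb ~ ., data = data, method = "class")
rpart.plot(tree1, digits = 3)
summary(tree1)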

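One way to see the mismatch directly is to compare the variables rpart actually uses as primary splits (the non-leaf rows of `fit$frame`) with the names in `fit$variable.importance`. Since `tree2.csv` isn't available, this sketch uses the `kyphosis` data shipped with rpart as a stand-in; the same lines would apply to `tree1`. `printcp()` also prints "Variables actually used in tree construction", and `summary()` lists the surrogate splits at each node.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart (tree2.csv is not available here)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Variables used as primary splits: every non-leaf row of the frame
used <- unique(as.character(fit$frame$var[fit$frame$var != "<leaf>"]))

# Raw (unscaled) importance scores stored on the fitted object
imp <- fit$variable.importance

print(used)
print(round(imp, 2))

# Variables that have an importance score but are never a primary split
print(setdiff(names(imp), used))
```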

Gini · June 8, 2023, 10:21am

And here is my classification tree:

I am wondering why ML does not appear as a split in my classification tree when it has a higher importance score than OCS. Or is there a reason why a classification tree sometimes does not follow the order of the variable importance scores?

nirgrahamuk · June 8, 2023, 11:53am

What is shown when you run

print(tree1)

Does your data have any missing values?
Variable importance can be influenced by surrogate splits, which are relevant when the variable used for a primary split has missing data (as I understand it).
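For reference, the rpart vignette defines a variable's importance as the sum of the goodness-of-split for every split where it is the primary variable, plus goodness times adjusted agreement for every split where it is a surrogate. Surrogates are computed by default (`rpart.control(maxsurrogate = 5)`) even when no values are missing, so a variable like ML could earn a high score purely as a surrogate. A minimal sketch on the stand-in `kyphosis` data: with `maxsurrogate = 0` no surrogates are computed, and the importance vector should shrink to the primary-split variables. `primary_vars()` is a small helper defined here for illustration, not part of rpart.

```r
library(rpart)

# Default fit: up to 5 surrogate splits are computed at every node
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")

# Same model with surrogate splits switched off
fit_nosurr <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class",
                    control = rpart.control(maxsurrogate = 0))

# Illustrative helper: variables that appear as primary splits in a tree
primary_vars <- function(fit) {
  v <- as.character(fit$frame$var)
  unique(v[v != "<leaf>"])
}

print(round(fit_default$variable.importance, 2))
print(round(fit_nosurr$variable.importance, 2))

# Without surrogates, only primary-split variables receive importance
print(setequal(names(fit_nosurr$variable.importance), primary_vars(fit_nosurr)))
```

With no missing data, switching surrogates off should leave the splits themselves unchanged; only the importance credit moves.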

Gini · June 8, 2023, 12:15pm

Hello, I don't have any missing values, and when I ran that code, this is what was shown:

ML still does not appear. I don't fully understand classification trees yet, since this is my first time analyzing data with this method, which is why I want to know what happens when the tree selects variables and splits on them.

nirgrahamuk · June 8, 2023, 12:21pm

I'm limited in how proactive I can be in investigating this because it's not reproducible. I don't have your data.

I suppose one experiment would be to omit ML from a run of the script and compare the final objects from the two runs to see what stays the same and what differs.
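That experiment could be sketched as follows, assuming the stand-in `kyphosis` data where `Number` plays the role of ML; with the real data the second formula would be `readyb ~ . - ML`. If ML only contributes through surrogate splits, the split structure should stay the same and only the importance scores should change.

```r
library(rpart)

# Full model, and the same model with one variable dropped
# (Number stands in for ML; with the real data: readyb ~ . - ML)
full    <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")
dropped <- rpart(Kyphosis ~ . - Number, data = kyphosis, method = "class")

# Compare the split variables, node by node
same_splits <- identical(as.character(full$frame$var),
                         as.character(dropped$frame$var))
print(same_splits)

# Compare the importance scores of the two runs
print(round(full$variable.importance, 2))
print(round(dropped$variable.importance, 2))
```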

Gini · June 8, 2023, 12:45pm

Thank you for taking the time to help me with my problem. I will do what you suggested. Thank you again.

system · July 20, 2023, 12:45pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.