Sorry, my English is not good, but I need to solve this for my assignment. I am new to data mining.
I am using rpart to build this decision tree.
The dataset I use:
https://drive.google.com/open?id=1qry-4PBmiOH56y9LTGMbeCFkiHj_tmUp
The code I use:
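library(rpart)       # provides rpart() and rpart.control()
library(rpart.plot)  # provides rpart.plot()
datatrans16 <- read.csv("C:/Users/yamzh/Desktop/datamining/SBA/datatrans16.csv")
# Classification tree for MISStatus with complexity parameter cp = 0.005
myDataAnalys <- rpart(MISStatus ~ State + RevLineCr + GrAppv, data = datatrans16,
                      method = "class", control = rpart.control(cp = 0.005))
# extra = 4 shows the per-class probabilities in each node
rpart.plot(myDataAnalys, extra = 4, digits = -3)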
My decision tree plot:
https://drive.google.com/open?id=1-CX-TYOXoWHZta5v8gdNDsSk1ofPzyN1
Here is the question:
I need to predict who will pay in full (P I F) and who will charge off (CHGOFF). In my CSV, 73.1% of the rows are PIF and 26.9% are CHGOFF, which means 73.1% will pay in full and 26.9% will charge off. But the plot shows [PIF .269 .731] in the first node.
- Does the .269 on the left of [PIF .269 .731] mean that only 26.9% will pay in full and 73.1% will charge off?
- That is the reverse of what I expect.
- If something is wrong with my code, how can I reverse the decision tree from [PIF .269 .731] to [PIF .731 .269]?
- Can anyone help me code the decision tree for my dataset, or tell me how I can improve my code?
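
One way to check how the two numbers in the root node map to the classes is to compare the class proportions with the order of the factor levels. The sketch below uses a toy vector as a stand-in for datatrans16$MISStatus (the values "P I F" and "CHGOFF" and the 731/269 split are assumptions taken from the figures quoted above, not read from the real CSV). It also assumes that rpart.plot with extra = 4 lists the per-class probabilities in factor-level order, which should be verified against the rpart.plot documentation.

```r
# Toy stand-in for datatrans16$MISStatus (hypothetical data, not the real CSV):
# 731 "P I F" rows and 269 "CHGOFF" rows.
status <- factor(c(rep("P I F", 731), rep("CHGOFF", 269)))

# Factor levels are sorted alphabetically by default, so CHGOFF comes first.
class_levels <- levels(status)
print(class_levels)

# Class proportions, reported in the same level order.
class_props <- prop.table(table(status))
print(round(class_props, 3))

# relevel() moves "P I F" to the front, which changes the listing order.
status_reordered <- relevel(status, ref = "P I F")
print(levels(status_reordered))
```

If the levels come out with CHGOFF first, then .269 would be the CHGOFF share and .731 the PIF share, and the label PIF would be the predicted (majority) class of the node. Under that reading the tree is not reversed. Releveling the response before fitting would only change the order in which the two probabilities are printed.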