[Urgent] Decision tree problem, I can't find any solution in the forum

Sorry for my broken English, but I need to solve this for my assignment. I am a newbie at data mining.
I am using rpart to build this decision tree.

Dataset that I use:

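Code that I use:
library(rpart)       # provides rpart()
library(rpart.plot)  # provides rpart.plot()
datatrans16 <- read.csv("C:/Users/yamzh/Desktop/datamining/SBA/datatrans16.csv")
myDataAnalys <- rpart(MISStatus ~ State + RevLineCr + GrAppv, data = datatrans16, method = "class", control = rpart.control(cp = 0.005))
# extra = 4 prints the probability of each class in every node
rpart.plot(myDataAnalys, extra = 4, digits = -3)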

My decision tree plot:

Here is the question:
I need to predict who will pay in full (PIF) and who will charge off (CHGOFF). My CSV contains 73.1% PIF and 26.9% CHGOFF, which means 73.1% will pay in full and 26.9% will charge off. But the plot shows [PIF .269 .731] in the first node.

  1. Does the .269 on the left in [PIF .269 .731] mean that only 26.9% will pay in full and 73.1% will charge off?

  2. Is that the reverse of what I expect?

  3. If there is something wrong with my code, how can I reverse the decision tree from [PIF .269 .731] to [PIF .731 .269]?

  4. Can anyone help me code the decision tree for my dataset, or tell me how I can improve my code?

Hello there,

Your tree is fine. [PIF .269 .731] means that the predicted class is "PIF", that the probability of "CHGOFF" is 0.269, and that the probability of "not CHGOFF" (that is, PIF) is 1 - 0.269 = 0.731. The two numbers follow the order of the class levels, CHGOFF first and PIF second. You can see this in the nodes in the lower left corner of the plot, where the first number becomes larger than the second and the predicted class switches from "PIF" to "CHGOFF".
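To make the ordering concrete, here is a minimal sketch on made-up data. The `toy` data frame is a hypothetical stand-in for datatrans16 (269 CHGOFF rows and 731 PIF rows, chosen only to mirror the percentages in the question), and `cp = 1` is used just to keep the tree at its root node.

```r
library(rpart)

# Hypothetical toy data, NOT the real datatrans16:
# 269 charged-off loans and 731 paid-in-full loans.
set.seed(1)
toy <- data.frame(
  MISStatus = factor(rep(c("CHGOFF", "PIF"), times = c(269, 731))),
  GrAppv    = runif(1000, min = 1e4, max = 1e6)
)

# Factor levels are sorted alphabetically, so "CHGOFF" precedes "PIF".
lvls <- levels(toy$MISStatus)
print(lvls)

# A very large cp stops rpart from splitting, leaving only the root node.
fit <- rpart(MISStatus ~ GrAppv, data = toy, method = "class",
             control = rpart.control(cp = 1))

# Class probabilities at the root, one column per level, in level order.
root_probs <- predict(fit, type = "prob")[1, ]
print(root_probs)

# Predicted class at the root is the majority class.
root_class <- as.character(predict(fit, type = "class")[1])
print(root_class)
```

The probabilities come out in the same order as `levels(toy$MISStatus)`, which is why the smaller CHGOFF share is printed first even though the node is labelled with the majority class PIF.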


  1. No, it means the opposite, just as you expected from your frequency counts.

  2. See 1.

  3. You don't need to.

  4. Your code is fine. Read some more about rpart and decision trees in general to see how to optimise your results.
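Regarding point 3: reordering is not necessary, but if you would simply prefer the PIF probability to be listed first, one possible approach is to reorder the factor levels before fitting. This is only a sketch on hypothetical toy data (the `toy` data frame is invented, not the real datatrans16), and it changes only the display order, not the model.

```r
library(rpart)

# Hypothetical toy data, NOT the real datatrans16.
set.seed(1)
toy <- data.frame(
  MISStatus = factor(rep(c("CHGOFF", "PIF"), times = c(269, 731))),
  GrAppv    = runif(1000, min = 1e4, max = 1e6)
)

# Make "PIF" the first level; the data themselves are unchanged.
toy$MISStatus <- relevel(toy$MISStatus, ref = "PIF")
new_levels <- levels(toy$MISStatus)
print(new_levels)

# Root-only tree (large cp) so we can inspect the root probabilities.
fit <- rpart(MISStatus ~ GrAppv, data = toy, method = "class",
             control = rpart.control(cp = 1))

# Columns now follow the new level order: PIF first, CHGOFF second.
root_probs <- predict(fit, type = "prob")[1, ]
print(root_probs)
```

With the levels in this order, a plot made with extra = 4 should list the PIF probability first in each node.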

Hope that helps.

