I am sorry to ask, but I have tried my best to understand the caret package documentation.

I have the following questions about the document:

First,

On page 12, it gives an example to explain variable importance. It says that the agreement is 126/146 = 0.863 and the adjusted agreement is (126-85)/(146-85).
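For reference, the arithmetic quoted above can be reproduced with a short Python sketch. The function and argument names are illustrative only. Reading 126 as the number of observations the surrogate sends the same way as the primary split, 146 as the observations at the node, and 85 as the number a plain "go with the majority" rule would get right is an assumption about the document's example, not something confirmed here.

```python
def agreement(n_agree, n_total):
    """Share of observations on which the surrogate matches the primary split."""
    return n_agree / n_total


def adjusted_agreement(n_agree, n_baseline, n_total):
    """Agreement rescaled against a baseline rule (assumed: the majority rule)."""
    return (n_agree - n_baseline) / (n_total - n_baseline)


agree = agreement(126, 146)
adj = adjusted_agreement(126, 85, 146)

print(round(agree, 3))  # 0.863
print(round(adj, 3))    # 0.672
```

Under that reading, an adjusted agreement of 0 would mean the surrogate does no better than the baseline, and 1 would mean it reproduces the primary split exactly.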

In addition, it says that "An overall measure of variable importance is the sum of the goodness of split measures for each split for which it was the primary variable, plus goodness * (adjusted agreement) for all splits in which it was a surrogate."
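The quoted rule can be turned into a small Python sketch. The split records below are hypothetical numbers made up for illustration, not output from any real tree, and the dictionary layout is my own; the sketch only follows the sentence as quoted and does not claim to match how the package computes or scales its reported values.

```python
from collections import defaultdict

# Hypothetical splits: each has a primary variable, a goodness value,
# and surrogate variables with their adjusted agreement.
splits = [
    {"primary": "a", "goodness": 10.0, "surrogates": {"b": 0.5}},
    {"primary": "b", "goodness": 4.0, "surrogates": {"a": 0.25}},
]


def variable_importance(splits):
    """Sum goodness for primary splits plus goodness * adjusted agreement for surrogates."""
    importance = defaultdict(float)
    for split in splits:
        importance[split["primary"]] += split["goodness"]
        for var, adj_agree in split["surrogates"].items():
            importance[var] += split["goodness"] * adj_agree
    return dict(importance)


importance = variable_importance(splits)
print(importance)  # {'a': 11.0, 'b': 9.0}
```

Here variable a gets 10.0 as a primary variable plus 4.0 * 0.25 as a surrogate, and b gets 4.0 as a primary variable plus 10.0 * 0.5 as a surrogate.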

Question1: Where do the "126" and "85" come from?

Question2: What is the goodness of split measure? How can I calculate it?

Second,

On page 24, it gives the formula for calculating cp and mentions that R is the risk. It says that, for a regression tree, cp can be viewed as the difference in R-squared.

Question3: What is the risk?

Question4: If my tree is a classification tree, how should I interpret cp? (As I recall, logistic regression does not have an R-squared.)

Question5: Please look at example1 below. I found that the cp of node 1 is (1-0.6851852)/(3-0)=0.10493827 and that of node 2 is (0.6851852-0.6296296)/(4-3)=0.05555556.

But why does example2 below show the cp of node 1 as 0.009070295 instead of (1-0.7823129)/(20-0)=0.01088436?
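The three calculations in Question5 can be checked with a short Python sketch. It assumes the inputs are relative errors and split counts taken from consecutive rows of a cp table; the function name is illustrative. The last line is purely an arithmetic observation: the same numerator divided by 24 instead of 20 lands on the reported value, which may mean the split count to use is not 20, but that cannot be confirmed without the example2 output.

```python
def cp_between_rows(rel_error_before, rel_error_after, nsplit_before, nsplit_after):
    """Drop in relative error per additional split between two table rows."""
    return (rel_error_before - rel_error_after) / (nsplit_after - nsplit_before)


# example1, as computed in the question
cp1 = cp_between_rows(1.0, 0.6851852, 0, 3)        # approximately 0.104938
cp2 = cp_between_rows(0.6851852, 0.6296296, 3, 4)  # approximately 0.055556

# example2, with the split count of 20 used in the question
cp3 = cp_between_rows(1.0, 0.7823129, 0, 20)       # approximately 0.010884

# Same numerator with a hypothetical split count of 24
cp3_alt = cp_between_rows(1.0, 0.7823129, 0, 24)   # approximately 0.009070

for value in (cp1, cp2, cp3, cp3_alt):
    print(f"{value:.6f}")
```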

Question6:

If my data has no missing values, is it true that I don't need the surrogate variables?

If so, why does R still print the surrogate variables and the related information?

Actually, my goal is to calculate cp, improve, the (adjusted) agreement, and variable importance by hand rather than relying on the software.

Sorry for asking so many questions. I really want to understand the rpart package better.

I hope you can help me. Thank you very much!