Decision tree using rpart

randomforest

#1
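library(rpart)

# caution: the response must remain in `data`; if "target" itself is excluded,
# `target ~ .` cannot find it, so in practice this would list columns to drop
excluded_variables <- c("target")

tree1 <- rpart(target ~ .,
               data = dataset[, !(names(dataset) %in% excluded_variables)])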
  1. How do you interpret the code in the "data" argument?
  2. What do the ! and %in% mean?
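For anyone who wants to run this pattern, here is a minimal self-contained sketch. It assumes the built-in `iris` data in place of `dataset` and `Species` in place of `target`; dropping `Petal.Width` is only a hypothetical choice of column to exclude. The response column stays in `data`, since `Species ~ .` needs to find it there.

```r
library(rpart)  # rpart ships with standard R installations

# Hypothetical setup: iris stands in for `dataset`, Species for `target`
dataset <- iris
excluded_variables <- c("Petal.Width")  # columns to leave out of the model

# Keep every column whose name is NOT in excluded_variables
model_data <- dataset[, !(names(dataset) %in% excluded_variables)]

# `.` expands to all remaining columns except the response (Species)
tree1 <- rpart(Species ~ ., data = model_data)
print(tree1)
```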

#2
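  • ! is the logical negation operator: it turns TRUE into FALSE and FALSE into TRUE, element by element when applied to a vector.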
!TRUE
#> [1] FALSE
!FALSE
#> [1] TRUE
(x <- c(TRUE, FALSE))
#> [1]  TRUE FALSE
!x
#> [1] FALSE  TRUE
  • %in% checks whether values are present in a vector and returns a logical result. Combined with !, you get the negation of that result. If the left-hand side is a vector, each element is tested, giving one TRUE or FALSE per element.
x <- c("A", "B")
"A" %in% x
#> [1] TRUE
"C" %in% x
#> [1] FALSE
!("B" %in% x)
#> [1] FALSE
c("A", "C") %in% x
#> [1]  TRUE FALSE
  • names(dataset) %in% excluded_variables gives you a logical vector, with TRUE for every column name of dataset that is in excluded_variables.
    Putting ! in front gives TRUE for every name that is NOT in excluded_variables. Used in the column position of dataset[, ...], this logical vector selects all the columns of dataset except the ones in excluded_variables.
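To see each step of that expression separately, here is a small sketch on a hypothetical toy data frame (the columns `id`, `age` and `target` are made up for illustration). It also shows that selecting by name with `setdiff()` gives the same result.

```r
# Hypothetical toy data frame standing in for `dataset`
dataset <- data.frame(id = 1:3, age = c(25, 40, 33), target = c(0, 1, 0))
excluded_variables <- c("id")

# Step 1: which column names are in the exclusion list?
in_excluded <- names(dataset) %in% excluded_variables
in_excluded
#> [1]  TRUE FALSE FALSE

# Step 2: negate, so TRUE now marks the columns to keep
keep <- !in_excluded
keep
#> [1] FALSE  TRUE  TRUE

# Step 3: a logical vector in the column position keeps the TRUE columns
subset_data <- dataset[, keep]
names(subset_data)
#> [1] "age"    "target"

# Equivalent selection by name
identical(subset_data, dataset[, setdiff(names(dataset), excluded_variables)])
#> [1] TRUE
```

One caveat with this style of indexing: if only one column is left, dataset[, keep] returns a plain vector rather than a data frame; adding drop = FALSE inside the brackets keeps it a data frame.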

Hope this helps you understand your code.


#3

Thank you for the great response!


#4

You're welcome. I think these are great questions for getting a better understanding of what we are doing.

Is it OK now?
