Originally posted on Stack Overflow. Trying my luck here with a few modifications.
I want to fit a random forest on this data, predicting y = "happy" from x = "ate". Some of these people were lucky and got two free meals, while others got only one, so some ids appear in two rows. Could I use rsample to make sure that the same id (for example, id 2) never appears in both the train and the test split? If not, how should I do it?
library(tibble)
library(rsample)

set.seed(123)

dframe <- tibble(
  id = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
  ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
  happy = sample(c("yes", "no"), size = 10, replace = TRUE)
)

dframe_split <- initial_split(dframe, prop = 3/4, strata = "ate")
dframe_train <- training(dframe_split)
dframe_test <- testing(dframe_split)
Created on 2018-10-14 by the reprex package (v0.2.0).
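The only fallback I can think of is to sample whole ids instead of rows and then filter the data by id. Below is a minimal sketch of that idea, not an rsample solution. It assumes that 3/4 of the unique ids (rounded down) go to training, and it drops the stratification on "ate"; the name train_ids is just for illustration.

```r
library(tibble)

set.seed(123)

dframe <- tibble(
  id = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
  ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
  happy = sample(c("yes", "no"), size = 10, replace = TRUE)
)

# Sample whole ids rather than rows, so that every row belonging to
# one person ends up on the same side of the split.
ids <- unique(dframe$id)
train_ids <- sample(ids, size = floor(length(ids) * 3 / 4))

# Assumption: no stratification on "ate" here, unlike initial_split() above.
dframe_train <- dframe[dframe$id %in% train_ids, ]
dframe_test <- dframe[!dframe$id %in% train_ids, ]

# Sanity check: no id is shared between the two sets.
stopifnot(length(intersect(dframe_train$id, dframe_test$id)) == 0)
```

Because the split is made on ids, the row proportion is only approximately 3/4: it depends on how many of the sampled ids have two rows.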