Originally posted on Stack Overflow. Trying my luck here with a few modifications.
I want to fit a random forest on this data, predicting y = "happy" from x = "ate". Some of these people were lucky and got two free meals, while others only got one, so some ids appear in more than one row. Could I use rsample to make sure that the same id (in this case 2) does not appear in both the training and test sets? If not, how should I do it?
library(tibble)
library(rsample)
set.seed(123)
dframe <- tibble(id = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
                 ate = sample(c("cookie", "slug"), size = 10, replace = TRUE),
                 happy = sample(c("yes", "no"), size = 10, replace = TRUE))
dframe_split <- initial_split(dframe, prop = 3/4, strata = "ate")
dframe_train <- training(dframe_split)
dframe_test <- testing(dframe_split)
Created on 2018-10-14 by the reprex package (v0.2.0).
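For illustration, here is a minimal sketch of the kind of split I am after, done by hand in base R rather than with rsample. It assumes it is acceptable to sample 3/4 of the unique ids (not 3/4 of the rows) and to drop the stratification on ate; the names ids, n_train and train_ids are just placeholders I made up. Newer rsample releases also appear to provide group_initial_split() for this purpose, but the sketch below does not rely on it.

```r
library(tibble)

set.seed(123)
dframe <- tibble(
  id    = c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7),
  ate   = sample(c("cookie", "slug"), size = 10, replace = TRUE),
  happy = sample(c("yes", "no"), size = 10, replace = TRUE)
)

# Sample at the id level, so every row of an id lands on the same side.
# Assumption: prop = 3/4 is applied to the number of ids, not rows.
ids       <- unique(dframe$id)
n_train   <- floor(length(ids) * 3 / 4)
train_ids <- sample(ids, size = n_train)

dframe_train <- dframe[dframe$id %in% train_ids, ]
dframe_test  <- dframe[!dframe$id %in% train_ids, ]

# Check: no id appears in both sets.
stopifnot(length(intersect(dframe_train$id, dframe_test$id)) == 0)
```

Because whole ids are assigned to one side, the row proportion will only be roughly 3/4, and the balance of ate between the two sets is not controlled.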