In multilevel modeling, observations are nested within grouping variables. For example, the lme4::sleepstudy dataset has 10 observations from each of 18 subjects. When bootstrapping such data for modeling, it makes sense to resample whole subjects. The best workflow for this procedure using rsample, as far as I know, is the following:

    library(rsample)
    library(tidyverse)

    lme4::sleepstudy |>
      # resample unique ids
      distinct(Subject) |>
      bootstraps(times = 10) |>
      # attach the original data to the ids
      mutate(
        analysis = lapply(
          splits,
          function(x) left_join(analysis(x), lme4::sleepstudy, by = "Subject")
        )
      )

Note that this is wasteful: every resample stores its own joined copy of the original data.
I have tried to make a function that does low-level manipulation of the rset object (replacing the
in_id fields) but this feels like cheating.
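
To make the low-level idea concrete, here is a minimal sketch of what such a function could look like. It is not necessarily the exact function described above: `expand_split()` is a hypothetical helper name, and instead of editing `in_id` in place it rebuilds each split from row indices with `rsample::make_splits()`, assuming the list method that takes `analysis` and `assessment` index vectors.

```r
library(rsample)

sleep <- lme4::sleepstudy

# Hypothetical helper: turn a split over subject ids into a split over rows.
expand_split <- function(split, data, id_col = "Subject") {
  # Subjects drawn in this resample, duplicates kept
  sampled <- as.character(analysis(split)[[id_col]])
  ids <- as.character(data[[id_col]])
  # Row indices of every sampled subject, repeated once per draw
  in_id <- unlist(lapply(sampled, function(s) which(ids == s)))
  # Rows of subjects that were never drawn form the assessment set
  out_id <- which(!ids %in% sampled)
  make_splits(list(analysis = in_id, assessment = out_id), data = data)
}

set.seed(1)
# Bootstrap the 18 unique subject ids, then expand each split to row level
id_boots <- bootstraps(data.frame(Subject = unique(sleep$Subject)), times = 10)
row_splits <- lapply(id_boots$splits, expand_split, data = sleep)

# Each analysis set has 18 draws x 10 rows per subject
nrow(analysis(row_splits[[1]]))
```

Each rebuilt split stores row indices alongside the data rather than a joined data frame, which is what avoids the repeated joins, but it still bypasses `bootstraps()` for the row-level step.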
Is there a better way to use bootstraps() to bootstrap chunks of data where the units being resampled may represent multiple rows of data?