In multilevel modeling, observations are nested within grouping variables. For example, the lme4::sleepstudy dataset has 10 observations from each of 18 subjects. When bootstrapping such data for modeling, it makes sense to resample whole subjects. As far as I know, the best workflow for this using rsample is the following:
library(rsample)
library(tidyverse)

lme4::sleepstudy |>
  # resample unique ids
  distinct(Subject) |>
  bootstraps(times = 10) |>
  # attach the original data to the ids
  mutate(
    analysis = lapply(
      splits,
      function(x) left_join(analysis(x), lme4::sleepstudy, by = "Subject")
    )
  )
Note that this is wasteful: the join builds a separate copy of the original data for every resample.
I have tried to make a function that does low-level manipulation of the rset object (replacing the data and in_id fields), but this feels like cheating.
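For concreteness, here is a minimal sketch of what that kind of low-level approach might look like. It is not my exact function: instead of overwriting the data and in_id fields directly, it builds each split from row indices with make_splits() and collects them with manual_rset(). The names rows_by_subject, expand_split and cluster_boots are made up for illustration, and the sketch assumes the lme4 package is installed.

```r
library(rsample)
library(dplyr)

data(sleepstudy, package = "lme4")

# Row indices of the full data, grouped by subject (list names are subject ids)
rows_by_subject <- split(seq_len(nrow(sleepstudy)), sleepstudy$Subject)

# Resample subject ids only, as in the workflow above
id_boots <- bootstraps(distinct(sleepstudy, Subject), times = 10)

# Hypothetical helper: turn one id-level split into a row-level split
expand_split <- function(id_split) {
  sampled <- as.character(analysis(id_split)$Subject)
  # Subjects drawn more than once contribute their rows more than once
  in_rows <- unlist(rows_by_subject[sampled], use.names = FALSE)
  # Rows of subjects that were never drawn form the assessment set
  out_rows <- setdiff(seq_len(nrow(sleepstudy)), in_rows)
  make_splits(list(analysis = in_rows, assessment = out_rows), data = sleepstudy)
}

# Assemble the row-level splits into an rset, reusing the bootstrap ids
cluster_boots <- manual_rset(lapply(id_boots$splits, expand_split), id_boots$id)
```

With this, analysis(cluster_boots$splits[[1]]) returns the rows for the sampled subjects, and each split stores only row indices plus a reference to the one sleepstudy data frame.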
Is there a better way to use bootstraps() to bootstrap chunks of data where the units being resampled may represent multiple rows of data?