I noticed some interesting behavior and I'm not sure whether I'm working around it the right way, or whether it is in fact intended:
When I modify the contrasts of a factor column in a tibble, the contrasts attribute seems to disappear (or otherwise be reset) if I nest and then unnest the data.
First, the contrasts behave normally without any nesting:
```r
library(tidyverse)
library(magrittr) # for the double-sided pipe %<>%

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)))

mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  )

contrasts(mydata$x)
#> [1]  0.5 -0.5
```
If I then nest the data, the contrasts are preserved in the nested sub-dataframes (at least in the first one, which is the one checked here):
```r
# Initialize the data as in the first chunk
mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group)

contrasts(mydata$data[[1]]$x)
#> [1]  0.5 -0.5
```
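Since the check above only looks at the first sub-dataframe, the sketch below inspects all of them at once. It assumes only dplyr and tidyr are loaded (instead of the full tidyverse plus magrittr), adds `set.seed(1)` for reproducibility, and uses the hypothetical names `nested` and `kept`. Each element of `kept` is `TRUE` if that group's `x` column still carries the custom `c(0.5, -0.5)` contrasts.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # added only to make the sketch reproducible

# Same toy data as in the first chunk, with the contrasts set before nesting
nested <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(
    y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)),
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group)

# One TRUE/FALSE per group: does this sub-dataframe's x keep the contrasts?
kept <- vapply(
  nested$data,
  function(d) identical(attr(d$x, "contrasts"), c(0.5, -0.5)),
  logical(1)
)
kept
```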
But if I then unnest the data, the contrasts appear to revert to the default (treatment coding):
```r
# Initialize the data as in the first chunk
mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group) %>%
  unnest(data)

contrasts(mydata$x)
#>   b
#> a 0
#> b 1
```
So the contrasts seem to be lost only when unnesting, not when nesting.
I am bootstrap-resampling subgroups of the data before modeling, so I need to nest and then unnest my data before fitting regressions with `lm()`. So far I have worked around the dropped contrasts by never setting contrasts on any factor variable until after I call `unnest()`, but I don't know whether that's the smoothest approach.
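The workaround of attaching contrasts only after `unnest()` might look roughly like the sketch below. The bootstrap-resampling step is left out (marked by a comment), only dplyr and tidyr are loaded, and `set.seed(1)` and the name `fit` are illustrative additions rather than part of the original code.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # added only to make the sketch reproducible

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(
    y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)),
    x = factor(x)  # plain factor: no custom contrasts yet
  ) %>%
  nest(data = -group) %>%
  # (bootstrap resampling of the groups would go here; omitted in this sketch)
  unnest(data) %>%
  # attach the contrasts only after unnest(), so unnest() never sees them
  mutate(x = structure(x, contrasts = c(0.5, -0.5)))

contrasts(mydata$x)

# lm() picks up the contrasts attribute stored on the factor column
fit <- lm(y ~ x, data = mydata)
coef(fit)
```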
- Is there a different way to set column contrasts so that they persist through unnesting?
- Is this intended or unintended behavior of `unnest()`?
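One alternative that may be worth comparing, sketched here under the assumption that the contrasts only matter at model-fitting time: `lm()` has a `contrasts` argument (a named list passed on to `model.matrix()` as `contrasts.arg`), so the coding can be supplied per fit instead of being stored as an attribute on the column. This sidesteps the attribute-persistence question rather than answering it. As before, `set.seed(1)`, `fit`, and `mm` are illustrative additions.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # added only to make the sketch reproducible

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(
    y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)),
    x = factor(x)
  ) %>%
  nest(data = -group) %>%
  unnest(data)

# The column carries no contrasts attribute; the coding is handed to lm() instead
fit <- lm(y ~ x, data = mydata, contrasts = list(x = c(0.5, -0.5)))

# Design matrix: the second column codes level "a" as 0.5 and "b" as -0.5
mm <- model.matrix(fit)
head(mm)
```

If the same coding is reused across many models, the list could be defined once (e.g. a hypothetical `my_contrasts <- list(x = c(0.5, -0.5))`) and passed to every `lm()` call.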
PS: I'm not sure whether using the recipes package to preprocess my predictor variables before running the regressions would solve this more cleanly. I'd like to move some of my code to the tidymodels framework eventually, but I haven't gotten around to it yet.