unnest() drops factor contrasts from tibble columns

I noticed some surprising behavior and am not sure whether I am working around it the right way, or whether it is in fact intended:

When I modify the contrasts of a factor column in a tibble, the contrasts attribute seems to disappear (or otherwise be reset) if I nest and then unnest the data.

First, the contrasts behave normally without any nesting:

library(tidyverse)
library(magrittr) # for the %<>% assignment pipe

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)))

mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  )

contrasts(mydata$x)

[1]  0.5 -0.5

Then, if I nest the data, the contrasts are preserved in (at least the first) nested sub-dataframe.

# Initialize the data as in the first chunk
mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group)

contrasts(mydata$data[[1]]$x)

[1]  0.5 -0.5

But if I then unnest the data, the contrasts appear to have reverted to their default setting.

# Initialize the data as in the first chunk
mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group) %>%
  unnest(data)

contrasts(mydata$x)

  b
a 0
b 1

So it seems only to happen with unnesting, not with nesting.

I am bootstrap-resampling subgroups of data before modeling, so I need to nest and then unnest my data before fitting regressions with lm(). I've managed to get around the contrast dropping in my own work by just never setting the contrasts on any factor variables until after I call unnest(), but I don't know if that's the smoothest way to get around this.

  1. Is there a different way I can be setting column contrasts so that they will persist through unnesting?
  2. Is this intended or unintended behavior of unnest()?
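
On question 1, one possibility that avoids column attributes altogether is the `contrasts` argument of `lm()`, which takes a named list of contrast specifications at fitting time. A minimal base-R sketch, using a made-up data frame `d` rather than the data above:

```r
# Toy data (hypothetical, for illustration only): a two-level factor and a response
set.seed(1)
d <- data.frame(x = factor(rep(c("a", "b"), each = 10)))
d$y <- rnorm(20, mean = ifelse(d$x == "a", 1, 0))

# Supply the coding when fitting instead of storing it on the column,
# so nothing is lost if an earlier step strips attributes
fit <- lm(y ~ x, data = d,
          contrasts = list(x = matrix(c(0.5, -0.5), ncol = 1)))

# With +0.5 / -0.5 coding the slope is the difference in group means (a minus b)
coef(fit)
```

Because the coding travels with the model call rather than with the column, it should not matter whether the data were nested and unnested beforehand.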

PS: I am not sure if using the recipes package to preprocess my predictor variables before running my regressions would solve this in a cleaner way. I would like to transition some of my code to use the tidymodels framework in the future, but I haven't gotten around to it yet.

I'm unsure of the question. Different arguments to contrasts do, and should, produce different values. What am I missing?

suppressPackageStartupMessages({
  library(dplyr)
  library(magrittr)
  library(tidyr)
})

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)))

str(mydata)
#> tibble [20 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ group: int [1:20] 1 1 2 2 3 3 4 4 5 5 ...
#>  $ x    : chr [1:20] "a" "b" "a" "b" ...
#>  $ y    : num [1:20] 1.824 -0.536 2.984 1.516 1.515 ...

mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  )

str(mydata)
#> tibble [20 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ group: int [1:20] 1 1 2 2 3 3 4 4 5 5 ...
#>  $ x    : Factor w/ 2 levels "a","b": 1 2 1 2 1 2 1 2 1 2 ...
#>   ..- attr(*, "contrasts")= num [1:2] 0.5 -0.5
#>  $ y    : num [1:20] 1.824 -0.536 2.984 1.516 1.515 ...

md1 <- mydata

str(contrasts(mydata$x))
#>  num [1:2] 0.5 -0.5


# Initialize the data as in the first chunk

mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group)

str(mydata$data[[1]]$x)
#>  Factor w/ 2 levels "a","b": 1 2
#>  - attr(*, "contrasts")= num [1:2] 0.5 -0.5

md2 <- mydata

str(contrasts(mydata$data[[1]]$x))
#>  num [1:2] 0.5 -0.5

mydata
#> # A tibble: 10 x 2
#>    group data            
#>    <int> <list>          
#>  1     1 <tibble [2 × 2]>
#>  2     2 <tibble [2 × 2]>
#>  3     3 <tibble [2 × 2]>
#>  4     4 <tibble [2 × 2]>
#>  5     5 <tibble [2 × 2]>
#>  6     6 <tibble [2 × 2]>
#>  7     7 <tibble [2 × 2]>
#>  8     8 <tibble [2 × 2]>
#>  9     9 <tibble [2 × 2]>
#> 10    10 <tibble [2 × 2]>

# retrace with unnested

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)))

mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group) %>%
  unnest(data)

str(mydata)
#> tibble [20 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ group: int [1:20] 1 1 2 2 3 3 4 4 5 5 ...
#>  $ x    : Factor w/ 2 levels "a","b": 1 2 1 2 1 2 1 2 1 2 ...
#>  $ y    : num [1:20] -0.0616 0.5433 1.3568 -0.9118 2.1598 ...

md3 <- mydata

contrasts(mydata$x)
#>   b
#> a 0
#> b 1

# differences in the arguments to `contrasts`

md1$x
#>  [1] a b a b a b a b a b a b a b a b a b a b
#> attr(,"contrasts")
#> [1]  0.5 -0.5
#> Levels: a b
md2$x
#> Warning: Unknown or uninitialised column: `x`.
#> NULL
md3$x
#>  [1] a b a b a b a b a b a b a b a b a b a b
#> Levels: a b

contrasts(md1$x)
#> [1]  0.5 -0.5
contrasts(md2$x)
#> Warning: Unknown or uninitialised column: `x`.
#> Error in contrasts(md2$x): contrasts apply only to factors
contrasts(md3$x)
#>   b
#> a 0
#> b 1

Thank you for taking the time on this! Maybe I can clarify.

In your reworking of my example, my confusion comes from md3. Its contrasts were set in the same way as for md1 and md2, but md3 appears to have reverted to the default contrasts, while md1 still has them set (md2 does too, but inside the nested sub-dataframes).

md1 and md3 should be identical, but they aren't: md3 has dropped the contrasts attribute from the x column. That is the part I found unexpected.

(md2 is there to show that I believe unnesting in particular is what drops the contrasts. The column x in the nested sub-dataframes still has non-default contrasts set.)

unnest() does drop attributes.
If you want to keep the information around, I think you need to find a way to store it and then reapply it intelligently.

A sketch:

library(tidyverse)
library(magrittr) # for the %<>% assignment pipe

mydata <- crossing(group = 1:10, x = rep(c("a", "b"), 20)) %>%
  mutate(y = if_else(x == "a", rnorm(n(), 1, 1), rnorm(n(), 0, 1)))

mydata %<>%
  mutate(
    x = factor(x),
    x = structure(x, contrasts = c(0.5, -0.5))
  ) %>%
  nest(data = -group)

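# Store each group's contrasts in a list column while the data are still nested
mydata$contraststored <- map(mydata$data, ~ contrasts(.x$x))

# After unnesting, the stored copy is still available alongside the data
(m2 <- unnest(mydata, data))
m2$contraststored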
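
If every group shares the same coding, a variation on this sketch is to save the contrasts once before nesting and put them back after `unnest()`. The names `stored` and `flat` below are only illustrative, and this assumes the factor levels are unchanged by whatever happens between nesting and unnesting:

```r
library(dplyr)
library(tidyr)

set.seed(1)
mydata <- crossing(group = 1:10, x = c("a", "b")) %>%
  mutate(y = rnorm(n()), x = factor(x))
contrasts(mydata$x) <- c(0.5, -0.5)

# Save the contrasts of every factor column as a named list
stored <- lapply(Filter(is.factor, mydata), contrasts)

flat <- mydata %>%
  nest(data = -group) %>%
  unnest(data)

# Reapply the saved contrasts column by column
for (nm in names(stored)) {
  contrasts(flat[[nm]]) <- stored[[nm]]
}

contrasts(flat$x)
```

This keeps the stored copy outside the data frame, so no extra list column is carried through the unnested result.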

Ahhh gotcha, thanks for confirming that unnest() does indeed drop attributes, and for the suggestion on extracting the contrasts in the nested version of the dataframe to then have later when the data are unnested. I may use this when it fits, and other times if I can help it I may just modify the contrasts attribute of the column after the data has already been unnested to avoid the attribute dropping.
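
For the bootstrap use case, the set-it-afterwards approach might look roughly like this. The resampling step with `slice_sample()` and the column `obs` are assumptions about the workflow, not something shown in the thread:

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)
# Hypothetical data: 10 groups, 5 observations per level of x in each group
mydata <- crossing(group = 1:10, obs = 1:5, x = c("a", "b")) %>%
  mutate(
    y = rnorm(n(), mean = if_else(x == "a", 1, 0)),
    x = factor(x)
  )

# Resample rows with replacement within each group, then unnest
boot <- mydata %>%
  nest(data = -group) %>%
  mutate(data = map(data, ~ slice_sample(.x, prop = 1, replace = TRUE))) %>%
  unnest(data)

# Set the contrasts only now, after unnest(), so there is nothing left to drop
contrasts(boot$x) <- c(0.5, -0.5)

fit <- lm(y ~ x, data = boot)
coef(fit)
```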