Logic behind dplyr's keeping/stripping of data frame classes

dplyr

#1

Is there a reason some dplyr functions strip classes from data frames and others preserve them? I could see stripping for functions that are more “create a new data frame” than “change an existing data frame”, but it’s tough to nail down exactly where that line is, and it really doesn’t seem to be between select and mutate.

library(dplyr)

d <- data.frame(x = 1:5) %>% structure(., class = c("my_df", class(.)))
select(d, x) %>% class()
#> [1] "my_df"      "data.frame"
mutate(d, x = x) %>% class()
#> [1] "data.frame"

#2

Same question for attributes. This seems crazy to me:

  • select: preserves class and attributes
  • mutate: preserves neither
  • slice: strips class, coerces to tbl_df, preserves attributes

d <- data.frame(x = 1:5) %>% structure(., class = c("my_df", class(.)), my_attr = "does it persist?")
select(d, x) %>% attributes()
#> $row.names
#> [1] 1 2 3 4 5
#> 
#> $class
#> [1] "my_df"      "data.frame"
#> 
#> $my_attr
#> [1] "does it persist?"
#> 
#> $names
#> [1] "x"
mutate(d, x = x) %>% attributes()
#> $class
#> [1] "data.frame"
#> 
#> $names
#> [1] "x"
#> 
#> $row.names
#> [1] 1 2 3 4 5
slice(d, 1:2) %>% attributes()
#> $row.names
#> [1] 1 2
#> 
#> $class
#> [1] "tbl_df"     "tbl"        "data.frame"
#> 
#> $my_attr
#> [1] "does it persist?"
#> 
#> $names
#> [1] "x"
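
To run the same comparison against whatever dplyr version is installed, the three checks above can be folded into one small report. This is only a sketch: the `verbs` list and the `report` column names are made up for illustration, and no expected output is shown because the result depends on the dplyr version.

```r
library(dplyr)

d <- structure(
  data.frame(x = 1:5),
  class = c("my_df", "data.frame"),
  my_attr = "does it persist?"
)

# Illustrative list of verbs to compare; each takes the data frame
# and returns the verb's result.
verbs <- list(
  select = function(df) select(df, x),
  mutate = function(df) mutate(df, x = x),
  slice  = function(df) slice(df, 1:2)
)

# One row per verb: was the subclass kept, was the custom attribute kept,
# and what is the full class vector of the result?
report <- do.call(rbind, lapply(names(verbs), function(nm) {
  out <- verbs[[nm]](d)
  data.frame(
    verb        = nm,
    keeps_class = inherits(out, "my_df"),
    keeps_attr  = !is.null(attr(out, "my_attr")),
    classes     = paste(class(out), collapse = ", ")
  )
}))

report
```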

#3

Re. attributes: this is a known issue that is being worked on right now. There are a few issues/threads in the dplyr GitHub repo that you can peruse for more detail. And there’s a loosely related discussion in this thread here:

Re. classes, I don’t have a general answer, but the discussion between Hadley and Kirill re. S4 (linked below) might be useful:


#4

Additional reference for those who want to formally extend tibbles. sloop::reconstruct() should be especially useful for retaining custom attributes and classes.
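
The general idea of putting a class and attributes back after a verb has dropped them can be sketched in base R. Note that `restore_my_df()` below is a hypothetical helper written for this example. It is not the sloop or dplyr API, and the "stripping" step is simulated with a plain `data.frame()` call rather than a real dplyr verb.

```r
# Hypothetical helper (not part of dplyr or sloop): copy the class vector and
# any custom attributes from `old` onto `new`, leaving the names and
# row.names of `new` untouched.
restore_my_df <- function(new, old) {
  structural <- c("names", "row.names", "class")
  custom <- attributes(old)[setdiff(names(attributes(old)), structural)]
  for (nm in names(custom)) {
    attr(new, nm) <- custom[[nm]]
  }
  class(new) <- class(old)
  new
}

d <- structure(
  data.frame(x = 1:5),
  class = c("my_df", "data.frame"),
  my_attr = "does it persist?"
)

# Stand-in for a verb that returns a bare data frame
stripped <- data.frame(x = d$x * 2L)

restored <- restore_my_df(stripped, d)
class(restored)
#> [1] "my_df"      "data.frame"
attr(restored, "my_attr")
#> [1] "does it persist?"
```

One caveat with this approach: it overwrites the class vector wholesale, so if the verb returned a tibble, the tbl_df class would be lost unless the original object already had it.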