tidyr::expand(tidyr::nesting(...)) vs dplyr::distinct(...)

Hi all,

I would like to know if there is any fundamental difference (other than the row ordering) between using a combination of tidyr::expand() and tidyr::nesting() versus using dplyr::distinct(). The following example can be found in the help file for tidyr::expand():

library(dplyr)
library(tidyr)

fruits <- tibble(
  type   = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year   = c(2010, 2010, 2012, 2010, 2010, 2012),
  size  =  factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

fruits %>% expand(nesting(type, size))

# A tibble: 4 x 2
  type   size 
  <chr>  <fct>
1 apple  XS   
2 apple  M    
3 orange S    
4 orange M 

fruits %>% distinct(type, size)

# A tibble: 4 x 2
  type   size 
  <chr>  <fct>
1 apple  XS   
2 orange S    
3 apple  M    
4 orange M   

Hi @gueyenono,

A slight modification to your example might make things clearer.

library(tidyverse)

fruits <- tibble(
  type   = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year   = c(2010, 2010, 2012, 2010, 2010, 2012),
  size  =  factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

nesting(type, size), by itself, essentially returns the same rows as distinct(type, size), only sorted rather than in order of first appearance:

fruits %>% distinct(type, size)
#> # A tibble: 4 x 2
#>   type   size 
#>   <chr>  <fct>
#> 1 apple  XS   
#> 2 orange S    
#> 3 apple  M    
#> 4 orange M
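
If you want to convince yourself of that, here is a minimal sketch that compares the two results with anti_join() in both directions. It assumes dplyr and tidyr are installed; the weights column is left out because it plays no role here, and the names via_nesting, via_distinct, only_in_nesting and only_in_distinct are just placeholders I made up.

```r
library(dplyr)
library(tidyr)

# Same data as above, minus the random weights column (not needed here)
fruits <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2010, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  )
)

via_nesting  <- fruits %>% expand(nesting(type, size))
via_distinct <- fruits %>% distinct(type, size)

# Rows present in one result but not in the other
only_in_nesting  <- anti_join(via_nesting, via_distinct, by = c("type", "size"))
only_in_distinct <- anti_join(via_distinct, via_nesting, by = c("type", "size"))

nrow(only_in_nesting)   # 0
nrow(only_in_distinct)  # 0
```

Both anti-joins come back empty, so with only nested variables the two approaches hold the same set of rows and differ only in how they are ordered.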

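When combined with tidyr::expand(), it will return every combination of the nested (type, size) pairs with each value of any additional variable given outside nesting(), in this case year. That includes combinations that never occur in the data, such as apple / XS in 2012.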

fruits %>% expand(nesting(type, size), year)
#> # A tibble: 8 x 3
#>   type   size   year
#>   <chr>  <fct> <dbl>
#> 1 apple  XS     2010
#> 2 apple  XS     2012
#> 3 apple  M      2010
#> 4 apple  M      2012
#> 5 orange S      2010
#> 6 orange S      2012
#> 7 orange M      2010
#> 8 orange M      2012

dplyr::distinct() will just return the unique combinations that actually occur in the data:

fruits %>% distinct(type, size, year)
#> # A tibble: 4 x 3
#>   type    year size 
#>   <chr>  <dbl> <fct>
#> 1 apple   2010 XS   
#> 2 orange  2010 S    
#> 3 apple   2012 M    
#> 4 orange  2012 M
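
One way to see exactly what expand() adds is to anti-join the expanded grid against the observed combinations. This is only a sketch under the same assumptions as the example data above (weights dropped since it is not used); expanded, observed and missing_combos are names I picked for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as above, minus the random weights column (not needed here)
fruits <- tibble(
  type = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year = c(2010, 2010, 2012, 2010, 2010, 2012),
  size = factor(
    c("XS", "S", "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  )
)

# Every observed (type, size) pair crossed with every year: 4 pairs x 2 years
expanded <- fruits %>% expand(nesting(type, size), year)

# Only the (type, size, year) triples that are actually in the data
observed <- fruits %>% distinct(type, size, year)

# Triples that expand() generated but that never occur in fruits
missing_combos <- anti_join(expanded, observed, by = c("type", "size", "year"))

nrow(expanded)        # 8
nrow(observed)        # 4
nrow(missing_combos)  # 4
```

So the practical difference shows up as soon as something sits outside nesting(): expand() builds the full grid, while distinct() never invents a row.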

HTH

Created on 2021-03-23 by the reprex package (v1.0.0)
