Creating dataframe of permutations

torvaney · April 3, 2018, 9:26am

I want to create a dataframe containing the possible permutations of a set of items, where each item can either be included or not.

I was thinking that the finished dataframe ought to look something like:

| group | item | included |
|-------|------|----------|
| 1     | a    | T        |
| 1     | b    | F        |
| 1     | c    | F        |
| ...   | ...  | ...      |

Therefore, for n items we would have 2^n unique groups (including the group in which no items are included).
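One way to see where the 2^n comes from: each group corresponds to an n-bit number, with bit i saying whether item i is included. Below is a minimal base R sketch of that idea, assuming three items; the names `group_ids` and `included` are made up for illustration and are not from the code later in this thread.

```r
items <- c("a", "b", "c")
n <- length(items)

# Group ids 0 .. 2^n - 1; bit i of the id says whether item i is included
group_ids <- 0:(2^n - 1)

# Logical matrix with one row per group and one column per item
included <- sapply(seq_len(n), function(i) {
  bitwAnd(group_ids, as.integer(2^(i - 1))) != 0
})
colnames(included) <- items

nrow(included)  # number of groups
```

The row for group id 0 has every flag set to FALSE, which is the "nothing included" group.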

My initial thought was to use tidyr::crossing to create the different combinations; however, since it does not supply a group identifier on the resulting dataframe, I have been struggling.

suppressPackageStartupMessages(library(tidyverse))
                                                  
n_items <- 5                                      
items <- letters[1:n_items]                       
                                                  
n_included <- 1:n_items                           
generate_split <- function(x, len) {              
  tibble(split = x,                                 
         included = c(rep(TRUE, x), rep(FALSE, len - x)),  
         row = 1:len)                                      
}                                                 
                                                  
map(n_included, generate_split, len = n_items) %>%
  map(~ crossing(.x, item = items))                 
#> [[1]]
#> # A tibble: 25 x 4
#>    split included   row item 
#>    <int> <lgl>    <int> <chr>
#>  1     1 TRUE         1 a    
#>  2     1 TRUE         1 b    
#>  3     1 TRUE         1 c    
#>  4     1 TRUE         1 d    
#>  5     1 TRUE         1 e    
#>  6     1 FALSE        2 a    
#>  7     1 FALSE        2 b    
#>  8     1 FALSE        2 c    
#>  9     1 FALSE        2 d    
#> 10     1 FALSE        2 e    
#> # ... with 15 more rows
#> # ... (rest of the list items)

Any help in reaching the solution would be greatly appreciated.

torvaney · April 3, 2018, 10:18am

I appear to have come up with a solution that works okay:

suppressPackageStartupMessages(library(tidyverse))         
                                                           
n_items <- 5                                               
items <- letters[1:n_items]                                
                                                           
n_included <- 1:n_items                                    

# Find combinations of items, taken m at a time                                          
map(1:n_items, ~ combn(items, .x, simplify = FALSE)) %>%                      
  flatten() %>%                                              
  map(~ tibble(item = .x, included = TRUE)) %>%
  map(~ full_join(.x, tibble(item = items), by = "item")) %>%
  map(~ replace_na(.x, list(included = FALSE))) %>%          
  imap_dfr(~ mutate(.x, group = .y))                         
#> # A tibble: 155 x 3
#>    item  included group
#>    <chr> <lgl>    <int>
#>  1 a     TRUE         1
#>  2 b     FALSE        1
#>  3 c     FALSE        1
#>  4 d     FALSE        1
#>  5 e     FALSE        1
#>  6 b     TRUE         2
#>  7 a     FALSE        2
#>  8 c     FALSE        2
#>  9 d     FALSE        2
#> 10 e     FALSE        2
#> # ... with 145 more rows
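
Note that 155 rows is 31 groups of 5 items, i.e. 2^5 - 1: because the combinations start at size 1, the group with no items included is not generated. If that group is wanted too, a possible alternative is to enumerate every TRUE/FALSE flag combination and reshape to long format. This is only a sketch: it assumes tidyr 1.0.0 or later (for `pivot_longer()`), and `flags` and `long` are names chosen here for illustration.

```r
library(dplyr)
library(tidyr)

n_items <- 5
items <- letters[1:n_items]

# One FALSE/TRUE column per item; expand.grid() returns every combination
flags <- expand.grid(rep(list(c(FALSE, TRUE)), n_items))
names(flags) <- items

# Number the groups, then reshape to one row per (group, item)
long <- flags %>%
  as_tibble() %>%
  mutate(group = row_number()) %>%
  pivot_longer(-group, names_to = "item", values_to = "included")
```

With this ordering, group 1 is the all-FALSE group, and `long` has 2^5 * 5 rows with the columns `group`, `item` and `included`.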