group_by and separate into rows

ldecicco · May 5, 2020, 2:33pm

Here's what I have as some raw data:

df <- tibble(x = c("a","b","c"),
             id = c("x|y", "x", "x|y|z"),
             name = c("i|j","i",'i|j|k'))

Here's what I want;

df_want <- tibble(x = c("a","a","b","c","c","c"),
                  id = c("x", "y", "x", "x", "y", "z"),
                  name = c("i","j", "i", "i", "j", "k"))

I've done it using a bunch of pivots...but I'd be surprised if this is the best way:

not_all_na <- function(x) {!all(is.na(x))}

df %>% 
  separate(id, into = paste0("ID_",letters),
           sep = "\\|") %>% 
  separate(name, into = paste0("Name_",letters),
           sep = "\\|") %>% 
  select_if(not_all_na) %>% 
  pivot_longer(cols = -x, 
               names_to = c("element", ".value"),
               names_pattern = "(.+)_(.+)") %>% 
  pivot_longer(cols = c(-x, -element), 
               names_to = "letter",
               values_to = "value") %>% 
  filter(!is.na(value)) %>% 
  pivot_wider(id_cols = c(x, letter),
              names_from = element,
              values_from = value) %>% 
  select(-letter)

I was trying to figure out if there's a way to use group_by and have the summarize output be a bind_row of the separated characters, but I haven't gotten that to work yet. I'm not fluent in purrr...so I bet that is the key. Anyway, if anyone has any suggestions, I'd love to simplify this task.

siddharthprabhu · May 5, 2020, 2:41pm

You're looking for separate_rows() from tidyr.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(x = c("a", "b", "c"),
             id = c("x|y", "x", "x|y|z"),
             name = c("i|j", "i", "i|j|k"))

df %>% separate_rows(id, name)
#> # A tibble: 6 x 3
#>   x     id    name 
#>   <chr> <chr> <chr>
#> 1 a     x     i    
#> 2 a     y     j    
#> 3 b     x     i    
#> 4 c     x     i    
#> 5 c     y     j    
#> 6 c     z     k

^{Created on 2020-05-05 by the reprex package (v0.3.0)}

ldecicco · May 5, 2020, 3:36pm

HA! Wow, awesome....wonder what other functions I've just totally missed?

siddharthprabhu · May 5, 2020, 6:08pm

I'd recommend these two lectures. Taught me a couple of interesting tricks.

ldecicco · May 6, 2020, 12:29pm

Ha, the funny thing is I just found a script of mine where I used separate_rows()...so really the question is how to remember it all?

system · May 13, 2020, 12:29pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.