Tidy way to split a column

I have a data frame like so:

pathway  condition  enriched  genes
p1       x          yes       foo,bar,baz

I want to massage it into something like this:

pathway  condition  enriched  genes
p1       x          yes       foo
p1       x          yes       bar
p1       x          yes       baz

I was looking at the separate function, but it's not quite what I want. Do I need to split with stringr first? I can't quite connect the dots, so I would appreciate any pointers in the right direction.

Look up tidyr::separate_rows.
If the number of genes is not fixed or equal across rows, you might have to resort to stringr::str_split.
I would start with stringr::str_split. For instance, stringr::str_split("foo,bar,baz", pattern = ",", simplify = TRUE) will give you

     [,1]  [,2]  [,3]
[1,] "foo" "bar" "baz"

Now the rest you can figure out, I take it.
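
For anyone who wants the remaining steps spelled out, here is a minimal sketch of one way to finish the str_split route. It assumes the column is called genes, as in the question, and that dplyr and tidyr are acceptable alongside stringr. The object names df and long are just for illustration. Leaving simplify at its default makes str_split return a list with one character vector per row, which tidyr::unnest can then expand.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Example data copied from the question
df <- tibble::tibble(
  pathway = "p1", condition = "x", enriched = "yes",
  genes = "foo,bar,baz"
)

long <- df %>%
  # genes becomes a list-column: one character vector per original row
  mutate(genes = str_split(genes, ",")) %>%
  # expand the list-column so that each gene gets its own row
  unnest(genes)

long
```

Because each row carries its own vector, this should not care whether rows have the same number of genes.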

That's interesting, @infominer! Does separate_rows have a similar effect to separate %>% gather?
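
One way to check is to run both routes on the same data and compare. The sketch below is only an illustration: the helper column names g1 to g3 and idx are made up, and the separate + gather route assumes you know the maximum number of genes per row in advance.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~pathway, ~genes,
  "p1",     "foo,bar,baz",
  "p2",     "zoo,zar"
)

# Route 1: separate_rows() goes straight to one gene per row
via_rows <- df %>% separate_rows(genes, sep = ",")

# Route 2: separate() into a fixed set of columns, then gather() them
via_gather <- df %>%
  # fill = "right" pads rows with fewer genes with NA
  separate(genes, into = c("g1", "g2", "g3"), sep = ",", fill = "right") %>%
  # stack g1..g3 into a single genes column, dropping the NA padding
  gather(key = "idx", value = "genes", g1:g3, na.rm = TRUE) %>%
  select(-idx)
```

Both should contain the same pathway/gene pairs, though the row order can differ because gather stacks one column at a time. The main practical difference is that separate_rows does not need the number of pieces up front.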

Just to add to the previous comments, here's a quick reprex which might help. I added an extra row to the example data for demo purposes.

library(tidyverse)
df1 <- tibble::tribble(
  ~pathway, ~condition, ~enriched,        ~genes,
      "p1",        "x",     "yes", "foo,bar,baz",
      "p2",        "y",     "yes", "zoo,zar"
)
df1 %>% separate_rows(genes)
#> # A tibble: 5 x 4
#>   pathway condition enriched genes
#>   <chr>   <chr>     <chr>    <chr>
#> 1 p1      x         yes      foo  
#> 2 p1      x         yes      bar  
#> 3 p1      x         yes      baz  
#> 4 p2      y         yes      zoo  
#> 5 p2      y         yes      zar

Created on 2018-06-21 by the reprex package (v0.2.0).
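
One caveat worth sketching: when sep is not given, separate_rows uses the default pattern "[^[:alnum:].]+", which splits on any run of characters that are neither alphanumeric nor a dot, not just on commas. That is fine for the example above, but it could bite with identifiers that contain hyphens. The gene names below are hypothetical and only there to show the difference.

```r
library(tidyr)

df <- tibble::tibble(
  pathway = "p1",
  genes   = "HLA-A,HLA-B"  # hypothetical names containing a hyphen
)

# Default sep: the hyphen is treated as a separator too
default_split <- separate_rows(df, genes)
default_split$genes
# "HLA" "A" "HLA" "B"

# Explicit comma: the hyphenated names stay intact
comma_split <- separate_rows(df, genes, sep = ",")
comma_split$genes
# "HLA-A" "HLA-B"
```

Passing sep = "," explicitly is probably the safer habit when the delimiter is known.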

Thanks everyone, the separate_rows function did the trick.