Transform Single Column w/Comma Separated Values to Multi Columns 1 and 0

I am trying to prepare survey data for analysis. I have a question with "check all that apply" answers which are sitting in a single column in my data frame with the responses comma separated, ex. below:
|123|Yes|Happy, Sad|
|12346|Yes|Mad, Sad|

This is what I want it to look like:
|col1|col2|col3 = Happy|col4 = Sad|col5 = Mad|

I have found multiple postings and guidance on subsetting and separating the comma separated values but not on how to pivot the options to be the name of the columns and assign the 1 or 0 based on whether that selection was made by the respondent. Would love pointers. Thank you in advance!

FYI - I just received the instructions for reprex and will be using that in future posts to ensure I follow guidelines!

1 Like

This could work for you


dat <- tribble(~col1, ~col2 ,~col3,
        123, "Yes", "Happy, Sad",
        1234, "Yes", "Sad",
        12345, "Yes", NA,
        12346, "Yes", "Mad, Sad")

dat %>% 
  mutate(col3 = str_split(col3, ", ")) %>% 
  unnest() %>% 
  count(col1, col2, col3) %>% 
  spread(key = col3, value = n) %>% 
  replace(, 0)
#> # A tibble: 4 x 6
#>    col1 col2  Happy   Mad   Sad `<NA>`
#>   <dbl> <chr> <dbl> <dbl> <dbl>  <dbl>
#> 1   123 Yes       1     0     1      0
#> 2  1234 Yes       0     0     1      0
#> 3 12345 Yes       0     0     0      1
#> 4 12346 Yes       0     1     1      0

Created on 2019-01-16 by the reprex package (v0.2.1)


You're asking for what's commonly known as a model matrix or design matrix. Type ?model.matrix into your R command line to get some instructions about how to create them.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.