How to force `spread()` to return the original wide data frame?

I have a tibble (built with `tribble()`) and I'd like to add a `flag_var` column that marks whether any of the row's `cd` columns starts with "231". I think there is a `rowwise()` approach to this problem, but frankly I don't want to deal with that. So I use `gather()` followed by `case_when()`, and then I need to `spread()` the result back. As you can see below, it doesn't return the same tibble as before. How can I force it to spread back to the original shape?

```r
library(tidyverse)

dat <- tribble(~ID,   ~cd1,     ~cd2,     ~cd3,     ~cd4,
               1,     "231",    "219",    "1321",   "2314",
               2,     "222",    "",       "",       "",
               3,     "123",    "142",    "",       "",
               4,     "145",    "521",    "2313",   "122")

dat %>% 
  gather("cd_type", "code", 2:5) %>% 
  mutate(flag_var = case_when(
    startsWith(code, "231") ~ 1,
    TRUE ~ 0
  )) %>% 
  arrange(ID, cd_type) %>%
  spread(cd_type, code) %>% 
  select(ID, cd1:cd4, flag_var)
#> # A tibble: 6 x 6
#>      ID cd1   cd2   cd3   cd4   flag_var
#>   <dbl> <chr> <chr> <chr> <chr>    <dbl>
#> 1     1 <NA>  219   1321  <NA>         0
#> 2     1 231   <NA>  <NA>  2314         1
#> 3     2 222   ""    ""    ""           0
#> 4     3 123   142   ""    ""           0
#> 5     4 145   521   <NA>  122          0
#> 6     4 <NA>  <NA>  2313  <NA>         1
```

Created on 2019-01-20 by the reprex package (v0.2.1)
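To see where the extra rows come from, it helps to look at the long data just before `spread()`. `spread()` uses every column other than the key and value as a row identifier, so here each distinct (`ID`, `flag_var`) pair becomes its own output row. A minimal sketch, assuming the tibble, tidyr and dplyr packages are installed (`long` and `row_ids` are illustrative names, not from the original code):

```r
# Assumes the tibble, tidyr and dplyr packages are installed
library(tibble)
library(tidyr)
library(dplyr)

dat <- tribble(~ID, ~cd1,  ~cd2,  ~cd3,   ~cd4,
               1,   "231", "219", "1321", "2314",
               2,   "222", "",    "",     "",
               3,   "123", "142", "",     "",
               4,   "145", "521", "2313", "122")

# Long form with a per-code flag, as in the question
long <- dat %>%
  gather("cd_type", "code", cd1:cd4) %>%
  mutate(flag_var = case_when(
    startsWith(code, "231") ~ 1,
    TRUE ~ 0
  ))

# spread() keeps one row per distinct combination of the non-key,
# non-value columns (ID and flag_var), so list those combinations
row_ids <- long %>%
  distinct(ID, flag_var) %>%
  arrange(ID, flag_var)
print(row_ids)
```

This should list six pairs, one per row of the spread output above: IDs 1 and 4 each appear twice because some of their codes are flagged 1 and others 0.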

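As you can see, ID 1 has some rows where `flag_var` is 1 and others where it is 0. `spread()` treats every column other than the key and value (here `ID` and `flag_var`) as a row identifier, so ID 1 ends up split across 2 rows instead of one (and the same happens to ID 4). In this case you can summarise the flag per ID and join it back onto the original data instead: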

```r
library(tidyverse)

dat <- tribble(~ID,   ~cd1,      ~cd2,      ~cd3,      ~cd4,
               1,    "231",    "219",    "1321",   "2314",
               2,    "222",    ""      , ""    ,   "",
               3,    "123",    "142",    ""    ,   "",
               4,    "145",    "521",    "2313",   "122")

dat %>% 
  gather("cd_type", "code", 2:5) %>% 
  mutate(flag_var = case_when(
    startsWith(code, "231") ~ 1,
    TRUE ~ 0
  )) %>% 
  group_by(ID) %>%
  summarise(flag_var = sum(flag_var)) %>%
  left_join(dat, by = "ID")
#> # A tibble: 4 x 6
#>      ID flag_var cd1   cd2   cd3   cd4  
#>   <dbl>    <dbl> <chr> <chr> <chr> <chr>
#> 1     1        2 231   219   1321  2314 
#> 2     2        0 222   ""    ""    ""   
#> 3     3        0 123   142   ""    ""   
#> 4     4        1 145   521   2313  122
```

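Created on 2019-01-20 by the reprex package (v0.2.1)

Note that `sum()` counts the matching codes, which is why ID 1 gets 2. Of course, instead of `sum()` you can use another summary (for example `max()` for a 0/1 flag) or convert the result to TRUE/FALSE as needed.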
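If you would rather keep the `gather()`/`spread()` round trip from the question, one option is to make `flag_var` constant within each ID *before* spreading, so that `spread()` sees only one identifier combination per ID. A minimal sketch, assuming the tibble, tidyr and dplyr packages are installed and using `any()` to get a TRUE/FALSE flag (`wide` is an illustrative name):

```r
# Assumes the tibble, tidyr and dplyr packages are installed
library(tibble)
library(tidyr)
library(dplyr)

dat <- tribble(~ID, ~cd1,  ~cd2,  ~cd3,   ~cd4,
               1,   "231", "219", "1321", "2314",
               2,   "222", "",    "",     "",
               3,   "123", "142", "",     "",
               4,   "145", "521", "2313", "122")

wide <- dat %>%
  gather("cd_type", "code", cd1:cd4) %>%
  group_by(ID) %>%
  # One flag per ID: TRUE if any of that ID's codes starts with "231",
  # so every long row of an ID carries the same flag_var value
  mutate(flag_var = any(startsWith(code, "231"))) %>%
  ungroup() %>%
  # ID and flag_var now identify exactly one row per ID
  spread(cd_type, code) %>%
  select(ID, cd1:cd4, flag_var)
print(wide)
```

Wrap the `any()` call in `as.integer()` if you prefer a 0/1 flag. In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, but they still work.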