How to collapse/merge 3 string variables into one?

Hi all,
I have three independent variables I need to collapse into a single variable--but this needs to be done conditionally. Example tibble here:

ex_data=tibble::tribble(
      ~subject_id,  ~treat_1,        ~treat_1a,   ~treat_1b,
      "1001",     "Perphenazine",   "Quetiapine", NA,
      "1002",     "Quetiapine",     NA,         NA,
      "1004",     "Olanzapine",     NA,         NA,
      "1005",     "Quetiapine",     NA,         NA,
      "1006",     "Perphenazine",   NA,         "Quetiapine",
      "1007",     "Olanzapine",     NA,        NA,
      "1008",     "Risperidone",    NA,        "Quetiapine",
      "1009",     "Risperidone",    NA,        NA,
      "1011",     "Quetiapine",     "Olanzapine",        NA,
      "1012",      "Quetiapine",     NA,        NA,
      "1013",      NA,              "Olanzapine",   NA)

The current trials have each person assigned at least one medication, with some people being re-assigned to a different medication at a later time, based on study design reasons. What I need to do is collapse these three columns into a new variable I want to call "final_medication" that has only the latest assigned medication. So for example, if treat_1 is not blank but treat_1a and treat_1b are NA, then final_medication should be the string from treat_1; but if treat_1a is not NA, than final_medication should equal the string in that column; etc.

Any ideas how to do this? I'm thinking it's probably a good spot to use dplyr::case_when(), but I'm not sure how to write the code

Like this?

library(dplyr)
ex_data=tibble::tribble(
  ~subject_id,  ~treat_1,        ~treat_1a,   ~treat_1b,
  "1001",     "Perphenazine",   "Quetiapine", NA,
  "1002",     "Quetiapine",     NA,         NA,
  "1004",     "Olanzapine",     NA,         NA,
  "1005",     "Quetiapine",     NA,         NA,
  "1006",     "Perphenazine",   NA,         "Quetiapine",
  "1007",     "Olanzapine",     NA,        NA,
  "1008",     "Risperidone",    NA,        "Quetiapine",
  "1009",     "Risperidone",    NA,        NA,
  "1011",     "Quetiapine",     "Olanzapine",        NA,
  "1012",      "Quetiapine",     NA,        NA,
  "1013",      NA,              "Olanzapine",   NA)
ex_data |> rowwise() |> 
  mutate(Final=coalesce(treat_1b,treat_1a,treat_1))
#> # A tibble: 11 x 5
#> # Rowwise: 
#>    subject_id treat_1      treat_1a   treat_1b   Final      
#>    <chr>      <chr>        <chr>      <chr>      <chr>      
#>  1 1001       Perphenazine Quetiapine <NA>       Quetiapine 
#>  2 1002       Quetiapine   <NA>       <NA>       Quetiapine 
#>  3 1004       Olanzapine   <NA>       <NA>       Olanzapine 
#>  4 1005       Quetiapine   <NA>       <NA>       Quetiapine 
#>  5 1006       Perphenazine <NA>       Quetiapine Quetiapine 
#>  6 1007       Olanzapine   <NA>       <NA>       Olanzapine 
#>  7 1008       Risperidone  <NA>       Quetiapine Quetiapine 
#>  8 1009       Risperidone  <NA>       <NA>       Risperidone
#>  9 1011       Quetiapine   Olanzapine <NA>       Olanzapine 
#> 10 1012       Quetiapine   <NA>       <NA>       Quetiapine 
#> 11 1013       <NA>         Olanzapine <NA>       Olanzapine

Created on 2022-02-18 by the reprex package (v2.0.1)

2 Likes

Pretty much! That example works just like I want it to! Only problem is I'm having trouble getting it to work with my real data set. I think it's because the data were read in from SPSS and there's something off with how they are being stored...pic for reference below.

It looks like there are quotation marks and spaces in the string names...is there any way, perhaps with {stringr}, to remove all special characters from these columns?

image

It looks like you have fixed width data where you get 12 spaces where you would like to have NA. You can convert those to NA with code like the following.

library(dplyr)
ex_data <- ex_data |> mutate(across(where(is.character), ~na_if(., "            ")))

There are 12 spaces in the area between the double quotes in the na_if() function.

2 Likes

Brilliant, that worked! Thanks so much!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.