Condensing five binary variables into one categorical variable

Hi All,

I have a large dataset that outlines 5 different medications given to a patient at a single time point, each medication has it's own column and is binary with 0 indicating no medication and 1 indication they had the medication, there are a small number of patients who will have had more than 1 medication. I want to condense these into a single column that outlines which medication the patient got or if they got more than one medication then code this as "multiple". Below is a reprex of a simplified version of what I am trying to do, the column "induction agent" is what I am trying to get. The code below works partially but doesn't allow for creation of a "multiple medications" category and gave me incorrect quantities when I summed the total values in the larger datasheet.

Thanks for your help

id<- c(1,2,3)
cd_25 <- c(0,0,1)
ivig <- c(1,0,0)
b_cell <- c(0,1,0)


df <- data.frame(id, cd_25, ivig, b_cell)

df$induction_agent <- names(df[2:4])[max.col(df[2:4])]

print (df)
#>   id cd_25 ivig b_cell induction_agent
#> 1  1     0    1      0            ivig
#> 2  2     0    0      1          b_cell
#> 3  3     1    0      0           cd_25

How about this?

library(dplyr)
library(tidyr)

sample_df <- data.frame(
          id = c(1, 2, 3),
       cd_25 = c(0, 0, 1),
        ivig = c(1, 0, 0),
      b_cell = c(1, 1, 0)
)

sample_df %>% 
    gather(-id, key = induction_agent, value = presence) %>% 
    filter(presence == 1) %>% 
    group_by(id) %>% 
    summarise(induction_agent = paste(induction_agent, collapse = ", ")) %>% 
    right_join(sample_df)
#> Joining, by = "id"
#> # A tibble: 3 x 5
#>      id induction_agent cd_25  ivig b_cell
#>   <dbl> <chr>           <dbl> <dbl>  <dbl>
#> 1     1 ivig, b_cell        0     1      1
#> 2     2 b_cell              0     0      1
#> 3     3 cd_25               1     0      0

# If you prefer the "multiple" recode
sample_df %>% 
    gather(-id, key = induction_agent, value = presence) %>% 
    filter(presence == 1) %>%
    add_count(id) %>% 
    mutate(induction_agent = if_else(n == 2, "multiple", induction_agent)) %>% 
    distinct(id, induction_agent) %>% 
    right_join(sample_df)
#> Joining, by = "id"
#>   id induction_agent cd_25 ivig b_cell
#> 1  3           cd_25     1    0      0
#> 2  1        multiple     0    1      1
#> 3  2          b_cell     0    0      1

Created on 2021-02-21 by the reprex package (v1.0.0)

1 Like

Amazing stuff, that works perfectly thanks you very much! Option 2 that creates the "multiple" character is exactly what I wanted, option 1 works as well and may actually be more useful if people want more info on what those multiple medications actually are.

Again, many thanks!

Hey just wondering would anyone have any idea why I'm getting the following error - the code above worked on the sample dataframe but when I applied to my actual data I got

"attributes are not identical across measure variables;
they will be droppedJoining, by = c("id", "induction_agent")"

And then the observations of the new column were filled with the headings of other random variables within the dataframe, strange!

Thanks!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.