Mutate dummy variable based on other variables

id  age  city  cancer  tb  malaria
1   14   A     Yes     No  No
2   15   B     No      No  Yes
3   16   C     No      No  No

In the example above, cancer, tb, and malaria are diseases. How can I mutate a variable disease that takes the value 1 when an observation has at least one disease and 0 otherwise? My real data set has many more diseases. Is there a better alternative to writing cancer == "Yes" | tb == "Yes" | malaria == "Yes"?

library(dplyr)
df <- tibble(
  id = c(1:3),
  age = c(14, 15, 16),
  city = c("A", "B", "C"),
  cancer = c("Yes", "No", "No"),
  tb = c("No", "No", "No"),
  malaria = c("No", "Yes", "No"),
)

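You can use this trick: combine the disease columns into a single string with unite(), then check whether that string contains "Yes" with str_detect().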

library(dplyr)
library(tidyr)
library(stringr)

df <- tibble(
    id = c(1:3),
    age = c(14, 15, 16),
    city = c("A", "B", "C"),
    cancer = c("Yes", "No", "No"),
    tb = c("No", "No", "No"),
    malaria = c("No", "Yes", "No"),
)

df %>% 
    unite(disease, cancer:malaria, remove = FALSE) %>% 
    mutate(disease = as.numeric(str_detect(disease, "Yes")))
#> # A tibble: 3 x 7
#>      id   age city  disease cancer tb    malaria
#>   <int> <dbl> <chr>   <dbl> <chr>  <chr> <chr>  
#> 1     1    14 A           1 Yes    No    No     
#> 2     2    15 B           1 No     No    Yes    
#> 3     3    16 C           0 No     No    No

Created on 2021-01-31 by the reprex package (v1.0.0)
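
One caveat with the string trick: it matches "Yes" anywhere in the combined text, and unite() by default writes missing values as the text "NA". If that matters for your data, here is a sketch of two alternatives that compare each column directly. It assumes dplyr 1.0.4 or later for if_any(); the names out_if_any, out_row_sums, and disease_cols are just placeholders for this example.

```r
library(dplyr)

df <- tibble(
  id = 1:3,
  age = c(14, 15, 16),
  city = c("A", "B", "C"),
  cancer = c("Yes", "No", "No"),
  tb = c("No", "No", "No"),
  malaria = c("No", "Yes", "No")
)

# Option 1: if_any() is TRUE for a row when any selected column equals "Yes"
# (assumes dplyr >= 1.0.4, where if_any() was introduced)
out_if_any <- df %>%
  mutate(disease = as.numeric(if_any(cancer:malaria, ~ .x == "Yes")))

# Option 2: base R, count the "Yes" cells in each row and test for at least one
disease_cols <- c("cancer", "tb", "malaria")
out_row_sums <- df
out_row_sums$disease <- as.numeric(rowSums(df[disease_cols] == "Yes") > 0)
```

Both add disease as the last column, with 1 for ids 1 and 2 and 0 for id 3. With many disease columns you can swap cancer:malaria for any tidyselect expression, or build disease_cols from names(df).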


Thanks a lot. It is indeed a nice trick!
