How to include NA in new dichotomous variable

Hello, I am very new to R and so I am still figuring out how to create new variables.

I am recoding answers to a survey question (shown below) to 1, 0, and NA. I want to recode "Yes" to 1, "No" to 0, and "I don't know", "Prefer not to answer", and "Skip" to NA. The data table is formatted as shown below. Some ids did not answer the survey, so they already have NA for answer. So far I have been coding one binary variable for Yes and another one for NA. However, my new dataset needs a single variable with the values 0, 1, and NA. Any suggestions on how to code this?

Any help is much appreciated, thank you!


Survey answers:

  • Yes
  • No
  • I don't know
  • Prefer not to answer
  • Skip

data table:

id  survey_question  answer
01  Do you skate?    No
02  Do you skate?    I don't know
03  NA               NA

Current code:
df[, new_variable := ifelse(survey_question == "Do you skate?" & answer == "Yes", 1, 0)]
df[, NA_variable := ifelse(survey_question == "Do you skate?" & (answer == "I don't know" | answer == "Prefer not to answer" | answer == "Skip"), 1, 0)]

Hi, and welcome!

A reproducible example, called a reprex (see the FAQ: What's a reproducible example (`reprex`) and how do I do one?), is generally a great way to get more and better answers. Usually the problem is communicating the data. Sometimes the data is too large, but a small sample that triggers the same issue is fine. Sometimes it is confidential. That's a bit trickier, but it is possible to take a built-in dataset, such as mtcars, and transform it into a comparable structure.

The following snippet is good to know:

dput(your_data_frame)
It will spit out an R object that can be cut and pasted as a code block and easily used.
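
As a minimal sketch of sharing data this way, assuming base R's dput() is the function meant here: the data frame `survey` below is a hypothetical stand-in, not the poster's real data.

```r
# Hypothetical stand-in for the real survey data
survey <- data.frame(
  id = c("01", "02", "03"),
  answer = c("No", "I don't know", NA),
  stringsAsFactors = FALSE
)

# Print R code that recreates the first rows; paste that output into a post
dput(head(survey, 3))

# Round trip: evaluating the deparsed text rebuilds the same object
rebuilt <- eval(parse(text = paste(deparse(survey), collapse = "\n")))
identical(rebuilt, survey)
```

Wrapping the data in head() keeps the pasted output short when the full dataset is large.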

Fortunately, the structure is very simple and isn't much trouble to reproduce.

structure(list(id = c("01", "02", "03", "04"), q = c("Do you skate?", 
"Do you skate?", "Do you skate?", "Do you skate?"), a = c("No", 
"I don't know", NA, "Yes")), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_character", 
    "collector")), q = structure(list(), class = c("collector_character", 
    "collector")), a = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

The next step is to take the data frame dat and create a new column that encodes a as 1/0/NA. (Why dat? Because data is an R function.)

For this we'll use the dplyr package, its mutate function, and the base ifelse function.

library(dplyr)  # provides mutate() and the %>% pipe

dat <- structure(list(id = c("01", "02", "03", "04"), q = c("Do you skate?", 
"Do you skate?", "Do you skate?", "Do you skate?"), a = c("No", 
"I don't know", NA, "Yes")), class = c("spec_tbl_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(id = structure(list(), class = c("collector_character", 
    "collector")), q = structure(list(), class = c("collector_character", 
    "collector")), a = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))

# First pass: "No" becomes 0 and everything else NA; second pass: "Yes" becomes 1.
# A missing answer makes the comparison NA, so ifelse() leaves that row as NA.
dat <- dat %>%
  mutate(coded = ifelse(a == "No", 0, NA)) %>%
  mutate(coded = ifelse(a == "Yes", 1, coded))
dat
#> # A tibble: 4 x 4
#>   id    q             a            coded
#>   <chr> <chr>         <chr>        <dbl>
#> 1 01    Do you skate? No               0
#> 2 02    Do you skate? I don't know    NA
#> 3 03    Do you skate? <NA>            NA
#> 4 04    Do you skate? Yes              1
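
If you would rather not load dplyr, a named lookup vector in base R is one alternative. This is only a sketch: the data frame `df` is a hypothetical copy of the table in the question, the name `lookup` is made up for illustration, and it assumes answer is stored as character (not factor).

```r
# Hypothetical sample data mirroring the table in the question
df <- data.frame(
  id = c("01", "02", "03", "04"),
  survey_question = c("Do you skate?", "Do you skate?", NA, "Do you skate?"),
  answer = c("No", "I don't know", NA, "Yes"),
  stringsAsFactors = FALSE
)

# Only answers that map to a number are listed; any other answer,
# and any answer that is already missing, comes back as NA
lookup <- c(Yes = 1, No = 0)

df$new_variable <- unname(lookup[df$answer])
df$new_variable
#> [1]  0 NA NA  1
```

Since the question uses data.table syntax, the same lookup should also work there by reference, as df[, new_variable := unname(lookup[answer])].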

