Creating a new variable based on NA in others

brheik · August 18, 2020, 1:31pm

I have a set of 5 questions, each of which can be answered with a score from 1 to 5. Some are not answered, resulting in NA. I want to create a new variable that is "1" when all questions are answered, i.e. no NA in any of the 5 questions, and "0" when one or more questions are not answered (= NA). Could anyone help?

AlexisW · August 18, 2020, 5:12pm

The base R *apply functions and the purrr map_*() functions loop over the columns of a data.frame, making it quite easy to check each column for NA.

library(tidyverse)  # attaches tibble (tribble) and purrr (map_lgl)

ex_df <- tribble(~qu1, ~qu2, ~qu3,
                 4,    5,    1,
                 1,    4,    1,
                 1,    4,    2,
                 1,    2,    2,
                 NA,   3,    2,
                 2,    5,    1,
                 6,    5,    2,
                 2,    5,    3,
                 1,    4,    1,
                 NA,   3,    1,
                 4,    2,    1,
                 2,    2,    2,
                 3,    4,    3,
                 3,    3,    3)

map_lgl(ex_df, ~any(is.na(.)))
#  qu1   qu2   qu3 
# TRUE FALSE FALSE

Or you can force the output to be integer with map_int().
Base R equivalent:

sapply(ex_df, function(x) 1L*any(is.na(x)))

Multiplying by 1L (the integer 1) converts to integer.
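The calls above produce one value per column. To get one flag per respondent, as the question asks, the same check can be run across rows instead. A minimal base R sketch, assuming the question columns are named qu1 to qu3 as in the example; the small data frame and the column name all_answered are made up for illustration:

```r
# Small illustrative data frame with the same layout as ex_df.
df <- data.frame(qu1 = c(4, NA, 2),
                 qu2 = c(5, 3, 5),
                 qu3 = c(1, 2, NA))

question_cols <- c("qu1", "qu2", "qu3")

# apply(..., 1, f) calls f once per row.
# !any(is.na(r)) is TRUE when the row has no NA; as.integer() maps TRUE/FALSE to 1/0.
df$all_answered <- apply(df[question_cols], 1,
                         function(r) as.integer(!any(is.na(r))))

print(df)
```

Because the result is assigned with df$all_answered, it ends up as a new column in the data frame rather than as a separate vector.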

brheik · August 19, 2020, 6:57am

Thx! But this just gives a list, not a new variable in the data frame?

brheik · August 19, 2020, 7:43am

Tried like this:

nirgrahamuk · August 19, 2020, 8:53am


library(tidyverse)

(ex_df <- tribble(~qu1,    ~qu2,   ~qu3,
                  4,        5,      1,
                  1,        4,      1,
                  1,        4,      2,
                  1,        2,      2,
                  NA,        3,      2,
                  2,        5,      1,
                  6 ,       5,      2,
                  2 ,       5,      3,
                  1 ,       4,      1,
                  NA ,       3,      1,
                  4 ,       2,      1,
                  2 ,       2,      2,
                  3 ,       4,      3,
                  3 ,       3   ,   3))

( vars_to_consider <- c("qu1","qu2","qu3"))

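# rowwise() makes sum() run once per row. With na.rm = FALSE the sum is NA
# as soon as any question is NA, so !is.na() is TRUE only for complete rows.
(result_df <- ex_df %>%
  rowwise() %>%
  mutate(qsum = !is.na(sum(!!!syms(vars_to_consider), na.rm = FALSE))) %>%
  ungroup())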
# # A tibble: 14 x 4
#      qu1   qu2   qu3 qsum 
#    <dbl> <dbl> <dbl> <lgl>
#  1     4     5     1 TRUE 
#  2     1     4     1 TRUE 
#  3     1     4     2 TRUE 
#  4     1     2     2 TRUE 
#  5    NA     3     2 FALSE
#  6     2     5     1 TRUE 
#  7     6     5     2 TRUE 
#  8     2     5     3 TRUE 
#  9     1     4     1 TRUE 
# 10    NA     3     1 FALSE
# 11     4     2     1 TRUE 
# 12     2     2     2 TRUE 
# 13     3     4     3 TRUE 
# 14     3     3     3 TRUE
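
If a 1/0 column is wanted instead of TRUE/FALSE, the flag can also be computed without rowwise(). This is a sketch of a vectorised base R alternative, not the approach used above; the small data frame and the column names all_answered and all_answered2 are made up for illustration:

```r
# Small illustrative data frame with the same layout as ex_df.
ex_small <- data.frame(qu1 = c(4, NA, 2),
                       qu2 = c(5, 3, 5),
                       qu3 = c(1, 2, 1))

vars_to_consider <- c("qu1", "qu2", "qu3")

# complete.cases() is TRUE for rows with no NA in the selected columns;
# as.integer() turns TRUE/FALSE into 1/0.
ex_small$all_answered <- as.integer(complete.cases(ex_small[vars_to_consider]))

# Same result by counting the NAs in each row and testing for zero.
na_per_row <- rowSums(is.na(ex_small[vars_to_consider]))
ex_small$all_answered2 <- as.integer(na_per_row == 0)

print(ex_small)
```

Both versions work on whole columns at once, which tends to be faster than a row-by-row calculation on large data.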
