Average across columns with missing values

Hello! I am new to R and am teaching myself the language for an internship. I am trying to average four different readings of systolic blood pressure. The data set has missing values, and I have been asked to take the average of three valid readings (a fourth BP reading was only taken if one of the first three was invalid).

The logic goes something like this:
If NAs = 1, take the average of the three other readings.
If NAs > 1, code as NA.
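As a rough sketch of that rule for a single participant, here is a small helper. The name `avg_if_three_valid` and the readings are made up for illustration, and a row with no NA at all (not expected, given how the fourth reading was taken) also returns NA because the rule only covers the one-NA case.

```r
# Hypothetical helper: apply the rule to one participant's four readings
avg_if_three_valid <- function(readings) {
  # Exactly one NA among four readings means exactly three valid readings
  if (sum(is.na(readings)) == 1) {
    mean(readings, na.rm = TRUE)
  } else {
    # Two or more NAs (or, unexpectedly, none) are coded as NA
    NA_real_
  }
}

# Made-up readings: one NA in the first row, two NAs in the second
bp <- rbind(
  c(120, NA, 130, 125),
  c(NA, NA, 130, 125)
)

# Apply the helper row by row
apply(bp, 1, avg_if_three_valid)
```

The first row averages 120, 130 and 125 to 125; the second row has only two valid readings and becomes NA.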

Here is code I found online that uses the purrr and dplyr packages:

library(dplyr)
library(purrr)

New_Systolic %>%
  mutate(
    Systolic_Avg = pmap_dbl(list(systolic1, systolic2, systolic3, systolic4), function(...) {
      row_values <- unlist(list(...))
      number_of_NAs <- sum(is.na(row_values))
      map_dbl(number_of_NAs, ~ case_when(
        .x == 1 ~ mean(row_values, na.rm = TRUE),
        TRUE ~ NA_real_
      ))
    })
  )

There was no error message, but the new variable (Systolic_Avg) was not created.
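A likely cause, assuming the pipeline was run exactly as shown: `New_Systolic %>% mutate(...)` returns a new data frame and prints it, but nothing assigns it back, so `New_Systolic` itself is unchanged. A minimal sketch of the fix, using made-up data with the same column names, is to assign with `<-`. The inner `map_dbl()`/`case_when()` is also swapped for a plain `if`/`else`, which gives the same value for a single row; that part is only a simplification, not the fix.

```r
library(dplyr)
library(purrr)

# Hypothetical example data with the column names from the question
New_Systolic <- tibble(
  systolic1 = c(120, NA, NA),
  systolic2 = c(130, 118, NA),
  systolic3 = c(NA, 122, 140),
  systolic4 = c(125, 126, 138)
)

# Assign the result back; without `<-` the new column is only printed
New_Systolic <- New_Systolic %>%
  mutate(
    Systolic_Avg = pmap_dbl(
      list(systolic1, systolic2, systolic3, systolic4),
      function(...) {
        row_values <- c(...)
        # Average only when exactly one of the four readings is NA
        if (sum(is.na(row_values)) == 1) mean(row_values, na.rm = TRUE) else NA_real_
      }
    )
  )

New_Systolic$Systolic_Avg
```

With this data the new column holds 125, 122 and NA (the third row has only two valid readings).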

Thank you so much!!

xbechtel

I don't follow your logic.
If there are three initial readings, any of which may be NA, and a fourth reading exists only when at least one of the first three is NA, what is the problem with always averaging all four readings and ignoring the NAs? Based on the setup you describe, the average will contain at most three contributing readings.
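A quick sketch with made-up readings shows what "average all four and ignore NAs" does in base R: `rowMeans(..., na.rm = TRUE)` never uses more than the valid values, but it also returns an average for rows that have only one or two valid readings.

```r
# Made-up readings: rows have 3, 2 and 1 valid values
bp <- rbind(
  c(120, NA, 130, 125),
  c(NA, NA, 130, 126),
  c(NA, NA, NA, 140)
)

# Every row gets an average, whatever its number of valid readings
rowMeans(bp, na.rm = TRUE)

# Number of valid (non-NA) readings per row
rowSums(!is.na(bp))
```

Here the averages are 125, 128 and 140, from 3, 2 and 1 valid readings respectively.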

Hello!

I should have been clearer. Some participants have more than one NA across the four readings. For example, some have only one or two valid readings, and I would like to code them as NA in the new variable as well. I only want an average for participants who have three valid readings. I do not want to delete any participants yet, because some weights still need to be applied first.
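One way to express that requirement in base R, sketched with hypothetical column names and values: count the valid readings per row, average only where the count is exactly three, and write NA elsewhere. Because this adds a column rather than filtering, every participant stays in the data for the later weighting step.

```r
# Hypothetical participant data (column names and values are made up)
bp <- data.frame(
  id = 1:4,
  systolic1 = c(120, NA, NA, 110),
  systolic2 = c(130, 118, NA, NA),
  systolic3 = c(NA, 122, 140, NA),
  systolic4 = c(125, 126, 138, NA)
)

readings <- as.matrix(bp[, c("systolic1", "systolic2", "systolic3", "systolic4")])

# Number of valid readings per participant
n_valid <- rowSums(!is.na(readings))

# Average only when exactly three readings are valid; other rows get NA but are kept
bp$Systolic_Avg <- ifelse(n_valid == 3, rowMeans(readings, na.rm = TRUE), NA_real_)

nrow(bp)
bp$Systolic_Avg
```

All four rows remain; the averages are 125 and 122 for the first two participants and NA for the two with fewer than three valid readings.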

Thanks!
xbechtel

library(tidyverse)
set.seed(42)

# Fake data: v1-v3 may contain NA; v4 starts fully populated
(fakedata0 <- tibble(
  v1 = sample(c(NA_real_, 1:5), size = 10, replace = TRUE),
  v2 = sample(c(NA_real_, 1:5), size = 10, replace = TRUE),
  v3 = sample(c(NA_real_, 1:5), size = 10, replace = TRUE),
  v4 = sample(c(1:5), size = 10, replace = TRUE)
))

# Keep v4 only where at least one of v1-v3 is NA
# (rowMeans() without na.rm returns NA if any input is NA)
(fakedata <- mutate(fakedata0,
  v4 = ifelse(is.na(rowMeans(cbind(v1, v2, v3))),
              v4,
              NA_real_)
))

# The calculation: more than one NA across v1-v4 gives NA,
# otherwise average the valid readings
(result <- fakedata %>%
  mutate(average_of_3_or_NA = ifelse(rowSums(is.na(cbind(v1, v2, v3, v4))) > 1,
                                     NA,
                                     rowMeans(cbind(v1, v2, v3, v4), na.rm = TRUE))
))

# A tibble: 10 x 5
      v1    v2    v3    v4 average_of_3_or_NA
   <dbl> <dbl> <dbl> <dbl>              <dbl>
 1    NA    NA     3     2              NA   
 2     4     4     4    NA               4   
 3    NA     5     4     2               3.67
 4    NA     3     4     4               3.67
 5     1     1     3    NA               1.67
 6     3     1     1    NA               1.67
 7     1     2     3    NA               2   
 8     1    NA     2     4               2.33
 9    NA    NA     1     5              NA   
10     3     2    NA     4               3
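If a row-by-row style reads more naturally, a possible dplyr alternative (assuming dplyr 1.0 or later, where `rowwise()` and `c_across()` are available) is sketched below on the first three rows of the table above. It computes the same values, though on large data the vectorized `rowSums()`/`rowMeans()` version is usually faster.

```r
library(dplyr)

# First three rows of the fakedata output above
readings <- tibble(
  v1 = c(NA, 4, NA),
  v2 = c(NA, 4, 5),
  v3 = c(3, 4, 4),
  v4 = c(2, NA, 2)
)

result <- readings %>%
  rowwise() %>%
  mutate(
    # Count valid readings in this row
    n_valid = sum(!is.na(c_across(v1:v4))),
    # Average only when exactly three readings are valid
    average_of_3_or_NA = if (n_valid == 3) mean(c_across(v1:v4), na.rm = TRUE) else NA_real_
  ) %>%
  ungroup()

result$average_of_3_or_NA
```

This reproduces NA, 4 and 3.67 (11/3) for the first three rows, matching the table.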

Thank you so much!

xbechtel
