Using conditional na.rm with dplyr::summarise_all

Hi,

When using the sum function (and probably other similar functions) with na.rm = TRUE, there is a surprising behavior: if all the observations are NA, it returns 0. For example:

> x <- c(NA, NA, NA)
> sum(x, na.rm = TRUE)
[1] 0

I would like the function to return NA if and only if all the observations are NA, which I can get with the following condition:

> sum(x, na.rm = any(!is.na(x)))
[1] NA
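
A quick sketch to check that this condition only changes the all-NA case. The helper name sum_or_na is hypothetical, used here only for illustration:

```r
# Hypothetical helper wrapping the condition above:
# drop NAs only when at least one non-NA value exists.
sum_or_na <- function(x) sum(x, na.rm = any(!is.na(x)))

sum_or_na(c(NA, NA, NA))  # all missing: returns NA
sum_or_na(c(1, NA, 3))    # some missing: NAs are dropped, returns 4
sum_or_na(c(1, 2, 3))     # none missing: returns 6
sum_or_na(numeric(0))     # empty input: still returns 0
```

Note that an empty vector still sums to 0, since there is no NA to propagate.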

Any suggestions on how to apply this condition when using dplyr::summarise_all? For example, here is the code I am using:

df1 <- df %>%
  dplyr::select(-TIMESTAMP) %>%
  dplyr::group_by(date, hour) %>%
  dplyr::summarise_all(list(base::sum), na.rm = ???)

Thanks,
Rami

Hi @RamiKrispin. You can pass your own function to summarise_all, as follows:

df1 <- df %>%
  dplyr::select(-TIMESTAMP) %>%
  dplyr::group_by(date, hour) %>%
  dplyr::summarise_all(~ sum(.x, na.rm = any(!is.na(.x))))
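
To try this out without the original data, here is a minimal sketch on made-up data. The toy data frame and its columns a and b are assumptions, not the original df. The second call assumes dplyr 1.0.0 or later, where summarise_all is superseded by across():

```r
library(dplyr)

# Toy data standing in for the original df (all values are made up)
df <- data.frame(
  TIMESTAMP = 1:4,
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  hour = c(0L, 0L, 0L, 0L),
  a = c(1, NA, NA, NA),  # second group is all NA in this column
  b = c(2, 3, NA, 5)     # second group has one NA in this column
)

# Same approach as above: drop NAs only if the group has a non-NA value
df1 <- df %>%
  dplyr::select(-TIMESTAMP) %>%
  dplyr::group_by(date, hour) %>%
  dplyr::summarise_all(~ sum(.x, na.rm = any(!is.na(.x))))

# Equivalent form with across(); grouping columns are skipped automatically
df2 <- df %>%
  dplyr::select(-TIMESTAMP) %>%
  dplyr::group_by(date, hour) %>%
  dplyr::summarise(
    dplyr::across(dplyr::everything(), ~ sum(.x, na.rm = any(!is.na(.x))))
  )

df1
```

In df1, column a should be NA for the group where every value is missing, while column b is summed over its non-missing values in both groups.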

Yes, that works. Thanks!
