Hi,
I know that sum(NA, na.rm = TRUE) results in 0. However, I wouldn't necessarily want this 0 to appear in a summarise() output. Please see the minimal example below.
library(dplyr)
tbl <- tibble(id = c(1, 2), values = c(NA, 5))
tbl %>% group_by(id) %>% summarise(tot = sum(values, na.rm = TRUE))
#> # A tibble: 2 × 2
#>      id   tot
#>   <dbl> <dbl>
#> 1     1     0
#> 2     2     5
Subsequent calculations using the tot column would be wrong (e.g., the mean of tot would treat group 1 as a true 0 rather than as missing). I know this is simply how sum() behaves, but I was wondering whether there is a "safer" way of doing such a summary. I suppose one approach is an initial test, e.g.,
tbl %>% group_by(id) %>% summarise(tot = if_else(all(is.na(values)), NA_real_, sum(values, na.rm = TRUE)))
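The same test could be wrapped in a small helper so it doesn't have to be repeated in every summarise() call. This is only a rough sketch: sum_or_na is a made-up name (not a dplyr or base R function), and it assumes values is numeric.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or base R): sums the non-missing
# values, but returns NA when every value in x is missing (or x is empty).
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

tbl <- tibble(id = c(1, 2), values = c(NA, 5))

totals <- tbl %>%
  group_by(id) %>%
  summarise(tot = sum_or_na(values))

# Group 1 stays NA instead of becoming 0, so it can be dropped downstream
mean(totals$tot, na.rm = TRUE)
#> [1] 5
```

With the plain sum(values, na.rm = TRUE) version, tot would be c(0, 5) and the same mean would come out as 2.5.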
Are there better ways to avoid this potential pitfall? Thank you in advance for any feedback and guidance.