Sum rows with NA values

I have a data frame:

df <- data.frame(
  stringsAsFactors = FALSE,
  schid = c(3645L, 3645L, 3645L, 3645L, 3645L, 3645L,
            3645L, 3645L, 3645L, 3645L, 3645L, 3645L),
  monthsum = c(NA, NA, NA, NA, NA, NA, 7722, 10560, 9104, NA, NA, NA),
  numcase = c(NA, NA, NA, NA, NA, NA, 27L, 33L, 32L, NA, NA, NA),
  df = c(NA, NA, NA, NA, NA, NA, 26, 32, 31, NA, NA, NA),
  ss = c(NA, NA, NA, NA, NA, NA,
         320576.9216, 391301.041952, 232916.0944, NA, NA, NA),
  dr2 = c(1, 1, 1, 1, NA, NA, NA, NA, NA, 1, 1, 1),
  actual = c(NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA),
  expect = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  smp = c(NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA)
)

library(dplyr)

sumdf <- df %>%
  summarise(across(everything(), ~ sum(., is.na(.), 0)))

I want to total every variable. The code works correctly when a variable has no missing values, like expect, but any variable containing NA returns NA. I want to total all rows for all variables regardless of the missing values, i.e. the variable dr2 should be 7, not NA. How can I get that? Any suggestions? Thanks in advance!


Thanks for the nice example!

You could simply ignore the NA values by passing na.rm = TRUE to sum():

sumdf <- df %>%
  summarise(across(everything(), ~ sum(., na.rm = TRUE)))
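To see why the original attempt still returned NA, here is a minimal base R sketch on a hypothetical toy vector x (not taken from the data above). It assumes nothing beyond base R: sum() adds up all of its arguments, so passing is.na(.) and 0 only adds extra terms, and the NA inside the vector still propagates.

```r
# Hypothetical toy vector, for illustration only
x <- c(1, 1, NA, 1)

# sum() propagates NA by default
default_total <- sum(x)

# The original attempt: is.na(x) and 0 are just extra terms to add,
# so the NA in x still makes the whole result NA
attempt_total <- sum(x, is.na(x), 0)

# na.rm = TRUE drops the NA values before summing
clean_total <- sum(x, na.rm = TRUE)

# A vector that is entirely NA sums to 0 with na.rm = TRUE
all_na_total <- sum(c(NA_real_, NA_real_), na.rm = TRUE)
```

Here default_total and attempt_total are both NA, clean_total is 3, and all_na_total is 0. That last case matters if you need to tell "no data" apart from a true total of zero.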


Thank you so much. One more question: the ID variable schid should be excluded from the summary and keep its value of 3645. How can I do that? Thanks again!

Hi @tjcnnl1,
Try grouping by schid, so it is kept as the grouping key and left out of the sums:

sumdf <- df %>%
  group_by(schid) %>% 
  summarise(across(everything(), ~ sum(., na.rm=TRUE), .names="{.col}_sum"))
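
As a rough illustration of what the grouped version does, here is a small sketch on a hypothetical toy data frame. The second school ID, 9999, is made up for the example, and dplyr is assumed to be installed. Inside summarise(), across() does not select grouping columns, so schid is carried through as the key rather than summed.

```r
library(dplyr)

# Hypothetical toy data: 9999 is an invented second school ID
toy <- data.frame(
  schid = c(3645L, 3645L, 3645L, 9999L, 9999L),
  dr2 = c(1, NA, 1, NA, NA),
  expect = c(1, 1, 1, 1, 1)
)

# One row per schid; every other column is summed with NAs ignored
toy_sum <- toy %>%
  group_by(schid) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE), .names = "{.col}_sum"))
```

The result has one row per schid with columns schid, dr2_sum and expect_sum. For 3645 the sums are 2 and 3; for 9999 they are 0 and 2, since dr2 is entirely NA in that group.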
