Counting the number of non-NA values across several columns (also: issue with grouped data frames)

anyway01 · September 16, 2020, 2:46pm

Hello guys!

I need to know how to simply count, row-wise, the number of values across several columns. It's not about counting particular strings or numbers; it's just about the number of non-NA values in specific columns.

I don't know if this is the right way to get the desired outcome. If it is, what would I have to put inside the mutate() brackets?

df <- df %>% 
rowwise() %>% 
mutate(abc01 = sum())

This is the kind of result I need:
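To fill in the mutate() call from the question directly, one option is dplyr's c_across(), which collects the chosen columns of the current row into a single vector (it needs dplyr 1.0.0 or later). This is a minimal sketch that assumes a toy data frame with hypothetical columns n, s and b:

```r
library(dplyr)

# Hypothetical toy data: three numeric columns with some NAs
df <- tibble(
  n = c(1, NA, NA, 9, NA),
  s = c(2, 4, 8, NA, NA),
  b = c(NA, NA, 7, 11, NA)
)

df <- df %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into a vector;
  # !is.na() flags the present values and sum() counts them
  mutate(abc01 = sum(!is.na(c_across(c(n, s, b))))) %>%
  ungroup()  # drop the rowwise grouping again

df
```

rowwise() evaluates the expression once per row, so on large data it is generally slower than a vectorised call such as rowSums().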

GreyMerchant · September 16, 2020, 2:54pm

You can try something like this to count the missing values per row, and then subtract that from the number of columns, or write it slightly differently to count the values per row directly:

df %>%
  rowwise %>%
  summarise(NA_per_row = sum(is.na(.)))
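The "count the missing values, then subtract" idea can also be written without rowwise(), because rowSums() already works row by row on the logical matrix returned by is.na(). This is a minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(
  n = c(1, NA, NA, 9, NA),
  s = c(2, 4, 8, NA, NA),
  b = c(NA, NA, 7, 11, NA)
)

df <- df %>%
  mutate(
    # is.na(.) gives a logical matrix; rowSums() counts the TRUEs in each row
    NA_per_row    = rowSums(is.na(.)),
    # the non-NA count is the number of columns minus the NA count
    valid_per_row = ncol(.) - NA_per_row
  )
```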

anyway01 · September 16, 2020, 3:18pm

Unfortunately, for some reason it doesn't work properly. When I assign the result of this operation to a data frame, this is what I get:
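A likely reason the rowwise()/summarise() snippet misbehaves is that, inside a magrittr pipe, `.` refers to the whole data frame rather than the current row. sum(is.na(.)) therefore counts every NA in the table and repeats that total on each row. This sketch on hypothetical toy data compares the two:

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(
  n = c(1, NA, NA, 9, NA),
  s = c(2, 4, 8, NA, NA),
  b = c(NA, NA, 7, 11, NA)
)

# `.` is the whole data frame here, so every row gets the grand total of NAs
total_each_row <- df %>%
  rowwise() %>%
  summarise(NA_per_row = sum(is.na(.)))

# rowSums() on the logical matrix gives one count per row instead
per_row <- rowSums(is.na(df))
```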

I have already tried this one, which works on a simple toy data frame:

df <- mutate(df, NA_per_row=rowSums(is.na(df)))

But on the data frame I actually want to use it on, an error appears for some reason, even though the code is the same:

df    <- mutate(df,    NA_per_row=rowSums(is.na(df)))
ABC01 <- mutate(ABC01, NA_per_row=rowSums(is.na(ABC01)))

> ABC01 <- mutate(ABC01, NA_per_row=rowSums(is.na(ABC01)))
Error: Column `NA_per_row` must be length 1 (the group size), not 309

I really don't know what difference between the two data frames makes this error appear in the first place.
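This is a likely explanation, hedged because ABC01 itself isn't shown. If a data frame is grouped, mutate() evaluates the expression once per group, and each result must have the group's length (or length 1). rowSums(is.na(ABC01)) always has one value per row of the whole table (309 here), so it cannot fit into a group. Here is a small reproduction with a hypothetical grouping column g:

```r
library(dplyr)

# Hypothetical toy data with a grouping column g (groups of size 2 and 3)
df <- tibble(
  g = c("a", "a", "b", "b", "b"),
  n = c(1, NA, NA, 9, NA),
  b = c(NA, NA, 7, 11, NA)
)
grouped <- group_by(df, g)

# Referring to the whole data frame inside a grouped mutate():
# the vector has 5 elements, but each group has only 2 or 3 rows
res <- tryCatch(
  mutate(grouped, NA_per_row = rowSums(is.na(grouped))),
  error = function(e) e
)
inherits(res, "error")  # TRUE: the length doesn't match the group size
```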

nirgrahamuk · September 16, 2020, 3:24pm

You probably have groups set on the data frame (groups() will show them). Try ungroup() on it first, before mutating it.

anyway01 · September 16, 2020, 3:29pm

The error remains.

> ungroup(ABC01)
> ABC01 <- mutate(ABC01, NA_per_row=rowSums(is.na(ABC01)))
Error: Column `NA_per_row` must be length 1 (the group size), not 309

This data frame is the result of a select() on a data frame that I had run group_by() on. Does that make a difference compared to groups()?
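On the select() question: select() keeps the grouping of its input. dplyr also adds the grouping columns back (with a message) if you don't select them, so a data frame taken from a grouped one is still grouped. This is a sketch with a hypothetical grouping column g:

```r
library(dplyr)

# Hypothetical toy data with a grouping column g
df <- tibble(
  g = c("a", "a", "b"),
  n = c(1, NA, 3),
  b = c(NA, 5, 6)
)

# Select only n and b from a grouped data frame;
# dplyr prints a message about adding the missing grouping variable
sub <- df %>%
  group_by(g) %>%
  select(n, b)

is_grouped_df(sub)    # TRUE: still grouped
group_vars(sub)       # "g"
"g" %in% names(sub)   # TRUE: g was added back
```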

smichal · September 16, 2020, 3:29pm

Row-wise operations are always a little tricky in R, since its basic (atomic) types are vectors and data frames are stored column by column.

How about this combined purrr trick?

library(tidyverse)

df <- tribble(~n, ~s, ~b, 1, 2, NA, NA, 4, NA, NA, 8, 7, 9, NA, 11, NA, NA, NA) 
df %>% 
    mutate(valid_in_row = map(., ~(!is.na(.x))) %>% pmap_int(sum))

# A tibble: 5 x 4
      n     s     b valid_in_row
  <dbl> <dbl> <dbl>        <int>
1     1     2    NA            2
2    NA     4    NA            1
3    NA     8     7            2
4     9    NA    11            2
5    NA    NA    NA            0

The intermediate step is:

> map(df, ~(!is.na(.x)))
$n
[1]  TRUE FALSE FALSE  TRUE FALSE

$s
[1]  TRUE  TRUE  TRUE FALSE FALSE

$b
[1] FALSE FALSE  TRUE  TRUE FALSE
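You can also collapse the intermediate list without pmap. Adding the logical vectors element-wise with base R's Reduce() gives the same per-row counts, and so does rowSums() on the negated is.na() matrix. This sketch uses the same toy data:

```r
library(purrr)
library(tibble)

df <- tribble(~n, ~s, ~b, 1, 2, NA, NA, 4, NA, NA, 8, 7, 9, NA, 11, NA, NA, NA)

flags <- map(df, ~ !is.na(.x))     # one logical vector per column
via_pmap   <- pmap_int(flags, sum) # the purrr approach above
via_reduce <- Reduce(`+`, flags)   # element-wise sum of the vectors
via_rowsum <- rowSums(!is.na(df))  # base R, no purrr needed
```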

anyway01 · September 16, 2020, 3:46pm

The grouped-data error remains even with this approach, which works just fine on the toy data frame.

But that seems to be a separate question that doesn't belong in this thread, I suppose?

Your post is nonetheless the solution to the question I asked above.

nirgrahamuk · September 16, 2020, 4:32pm

As is typical for R and the tidyverse, calling ungroup() on a data frame without assigning the result anywhere means the result is ephemeral: ABC01 stays grouped.
Best practice for assignment is to use <-, e.g. ABC01 <- ungroup(ABC01).
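In code, the fix is to assign the ungrouped result back before calling mutate(). This is a sketch in which a small, hypothetical grouped toy data frame stands in for the real ABC01:

```r
library(dplyr)

# Hypothetical stand-in for ABC01, grouped by g
ABC01 <- tibble(
  g = c("a", "a", "b", "b", "b"),
  n = c(1, NA, NA, 9, NA),
  b = c(NA, NA, 7, 11, NA)
) %>%
  group_by(g)

ungroup(ABC01)          # returns an ungrouped copy; ABC01 itself is unchanged
is_grouped_df(ABC01)    # TRUE

ABC01 <- ungroup(ABC01) # assign the result back
ABC01 <- mutate(ABC01, NA_per_row = rowSums(is.na(ABC01)))
ABC01$NA_per_row
```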

anyway01 · September 16, 2020, 4:50pm

Dear nirgrahamuk,

You just solved my grouped-data problem. I am as thankful to you as I am to smichal. Unfortunately I can't mark two posts as the solution here; I really would love to.

I have also changed the title of the thread, because the grouped-data issue came up as a second question partway through.

Scoco · September 23, 2020, 7:46am

This is a simple one-liner in base R for the original question:

# count the non-NA values in each row (MARGIN = 1)
df$SUM <- apply(df, 1, function(rr) sum(!is.na(rr)))

and it works for data.frames with different data types as well as matrices.
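Here is a quick check of the apply() approach on a data frame with mixed column types and on a plain matrix, using hypothetical toy data. apply() first converts a data frame to a matrix (character here, because of the text column), but NAs survive that conversion, so is.na() still finds them. The sketch counts the present values with !is.na(); use is.na() instead to count the missing ones:

```r
# Hypothetical mixed-type data frame: numeric, character and logical columns
df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", NA, NA),
  z = c(TRUE, FALSE, NA)
)

# MARGIN = 1 applies the function to each row; !is.na() counts present values
df$SUM <- apply(df, 1, function(rr) sum(!is.na(rr)))

# The same call works on a matrix
m <- matrix(c(1, NA, 3, NA), nrow = 2)
m_counts <- apply(m, 1, function(rr) sum(!is.na(rr)))
```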
