Counting the number of non-NA values across several columns (also: an issue with grouped data frames)

Hello guys!

I need to know how to count the number of values across several columns, row-wise. It's not about counting certain strings or numbers, just the number of values that are not NA in specific columns.

I don't know if this would be the right way to get the desired outcome. If so, what would I have to put inside the mutate() call?

df <- df %>% 
rowwise() %>% 
mutate(abc01 = sum())

This is the kind of result I need:
[Image: example of the desired result]

You can try something like this to count the missing values per row and then subtract that from the number of columns, or write it slightly differently to count the non-missing values per row directly:

df %>%
  rowwise %>%
  summarise(NA_per_row = sum(is.na(.)))

Unfortunately, for some reason it doesn't work properly. When I assign the result of this operation to a data frame, this is what I get:
[Image: screenshot of the result]
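
A likely reason the rowwise() attempt above misbehaves: inside a magrittr pipe, `.` refers to the whole data frame, so sum(is.na(.)) counts every NA in the table instead of the NAs in the current row. Here is a minimal sketch of a per-row count, assuming dplyr 1.0.0 or later (for c_across()). The toy tibble and the column name valid_in_row are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the poster's real data frame
df <- tibble(n = c(1, NA, NA), s = c(2, 4, NA), b = c(NA, NA, 7))

res <- df %>%
  rowwise() %>%
  # c_across() collects the current row's values into one vector
  mutate(valid_in_row = sum(!is.na(c_across(everything())))) %>%
  ungroup()  # drop the rowwise grouping again

res$valid_in_row
```

To count only specific columns, everything() could be swapped for a selection such as c(n, s).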

I have already tried this one, which works on a simple toy data frame:

df <- mutate(df, NA_per_row=rowSums(is.na(df)))

In the data frame I actually want to use this on, an error appears for some reason, even though the code is the same:

df    <- mutate(df,    NA_per_row=rowSums(is.na(df)))
ABC01 <- mutate(ABC01, NA_per_row=rowSums(is.na(ABC01)))

"> ABC01 <- mutate(ABC01, NA_per_row=rowSums(is.na(ABC01)))
Error: Column NA_per_row must be length 1 (the group size), not 309"

I really don't know what the difference between the two data frames is that causes this error to appear in the first place.

You probably have groups set on the data frame. Try calling ungroup() on it before mutating it.

The error remains. :frowning:

> ungroup(ABC01)
> ABC01 <- mutate(ABC01, NA_per_row=rowSums(is.na(ABC01)))
Error: Column `NA_per_row` must be length 1 (the group size), not 309

This data frame is the result of a select() on a data frame that a group_by() operation had been applied to. Does that make a difference compared to groups()?
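
On the question of whether select() matters: as far as I know, select() keeps the grouping of its input, so a data frame derived from a grouped one is still grouped. A small sketch for checking this with dplyr's is_grouped_df() and group_vars(). The toy tibble and its column names are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for the real data frame
toy <- tibble(id = c("a", "b", "c"), x = c(1, NA, 3), y = c(NA, NA, 6))

grouped  <- toy %>% group_by(id)
selected <- grouped %>% select(id, x, y)

is_grouped_df(selected)  # TRUE: the grouping survived select()
group_vars(selected)     # "id"
```

With one row per group, as here, a column computed from the whole data frame has the wrong length for each group, which matches the "must be length 1 (the group size)" message.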

Row-wise operations are always a little tricky in R, since its basic data type is the vector.

How about this combined purrr trick?

library(tidyverse)

df <- tribble(
  ~n, ~s, ~b,
   1,  2, NA,
  NA,  4, NA,
  NA,  8,  7,
   9, NA, 11,
  NA, NA, NA
)
df %>% 
    mutate(valid_in_row = map(., ~(!is.na(.x))) %>% pmap_int(sum))
# A tibble: 5 x 4
      n     s     b valid_in_row
  <dbl> <dbl> <dbl>        <int>
1     1     2    NA            2
2    NA     4    NA            1
3    NA     8     7            2
4     9    NA    11            2
5    NA    NA    NA            0

The intermediate step is:

> map(df, ~(!is.na(.x)))
$n
[1]  TRUE FALSE FALSE  TRUE FALSE

$s
[1]  TRUE  TRUE  TRUE FALSE FALSE

$b
[1] FALSE FALSE  TRUE  TRUE FALSE

The grouped-data error remains even with this operation, which works just fine on the toy data frame. :frowning:

But this seems to be a separate question that doesn't belong in this thread, I suppose?

Your post is nonetheless the solution to the question I asked above. :slight_smile: :+1:

As is typical for R and the tidyverse, calling ungroup() on a data frame without assigning the result anywhere means the result is ephemeral; ABC01 remains grouped.
Assign the result back, e.g. ABC01 <- ungroup(ABC01). Best practice for assignment is to use <-
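
To make the difference concrete, here is a minimal sketch with a made-up stand-in for ABC01 (the real data isn't shown in the thread, so the columns here are assumptions):

```r
library(dplyr)

# Hypothetical stand-in for ABC01: one row per group
ABC01 <- tibble(id = c("a", "b", "c"), x = c(1, NA, 3), y = c(NA, NA, 6)) %>%
  group_by(id)

ungroup(ABC01)                         # returns an ungrouped copy; ABC01 itself is unchanged
still_grouped <- is_grouped_df(ABC01)  # TRUE

ABC01 <- ungroup(ABC01)                # assign the result back to drop the groups
ABC01 <- mutate(ABC01, NA_per_row = rowSums(is.na(ABC01)))
```

After the reassignment, mutate() sees a single ungrouped table, so the vector returned by rowSums() has the expected length.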

Dear nirgrahamuk,

you just solved my grouped data problem. I am thankful to you just as I am to smichal. Unfortunately I can't choose two solutions here - I really would love to. :disappointed:

I have also changed the title of the thread because with the grouped data issue another question has come in at the midway point.

This is a simple one-liner in base R for the original question:

df$SUM <- apply(df, 1, function(rr) sum(!is.na(rr)))

and it works for data.frames with different data types as well as matrices.
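
Since the original question asks about specific columns, a vectorised base R variant using rowSums() on a column subset may also be worth a look. This is only a sketch; the data frame, the cols vector and the valid_in_row name are assumptions for illustration.

```r
# Hypothetical toy data with mixed column types
df <- data.frame(n = c(1, NA, NA), s = c(2, 4, NA), b = c("x", NA, NA))

cols <- c("n", "s")                           # the columns to inspect
df$valid_in_row <- rowSums(!is.na(df[cols]))  # non-NA count per row

df$valid_in_row
```

Compared with apply(), this avoids converting every row to a common type and leaves columns outside cols out of the count.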
