how to calculate number of missing values in R

So I am trying to get count of 'Na' in each column of data set 'sql_db', my idea was to ask R if they are Na values in 'sql_db' using is.na and then it returns true and false for each cell, and then converted trues to 0 and false to '1', so I sum for each column to get the total Na's. However this sum do not work for a character variable. How do I go about this? or are they better approach to fetch "Na's" for each column of sql_db?

sql_db <- tribble(~age, ~sex, ~credit_turnover, ~acct_bal,
23, "f", NaN, 100,
21, "m", 28 , 23,
NaN, "f", 64, 46,
28, "f", NaN, 87,
41, "m", 35 , NaN)

I don't think your example tibble covers this possibility

NA and NaN aren't the same thing. NA is a missing value while NaN is 'Not a Number' (usually the result of a computation involving division by zero).

You can compute is.na() on character vectors (as shown below). is.na() will also return TRUE for NaN values. If you want to exclusively count NaN values, you can use is.nan() instead.

library(tidyverse)

sql_db <- tribble(~age, ~sex, ~credit_turnover, ~acct_bal,
                  23, "f", NaN, 100,
                  21, "m", 28 , 23,
                  NaN, "f", 64, 46,
                  28, "f", NaN, 87,
                  41, "m", 35 , NaN)

summarise_all(sql_db, ~ sum(is.na(.x)))
#> # A tibble: 1 x 4
#>     age   sex credit_turnover acct_bal
#>   <int> <int>           <int>    <int>
#> 1     1     0               2        1

# Replacing first row values with NA.
sql_db[1, ] <- NA

summarise_all(sql_db, ~ sum(is.nan(.x)))
#> # A tibble: 1 x 4
#>     age   sex credit_turnover acct_bal
#>   <int> <int>           <int>    <int>
#> 1     1     0               1        1

Created on 2020-07-01 by the reprex package (v0.3.0)

1 Like

This worked perfectly. Thank you very much.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.