 # how to calculate number of missing values in R

So I am trying to get count of 'Na' in each column of data set 'sql_db', my idea was to ask R if they are Na values in 'sql_db' using is.na and then it returns true and false for each cell, and then converted trues to 0 and false to '1', so I sum for each column to get the total Na's. However this sum do not work for a character variable. How do I go about this? or are they better approach to fetch "Na's" for each column of sql_db?

sql_db <- tribble(~age, ~sex, ~credit_turnover, ~acct_bal,
23, "f", NaN, 100,
21, "m", 28 , 23,
NaN, "f", 64, 46,
28, "f", NaN, 87,
41, "m", 35 , NaN)

I don't think your example tibble covers this possibility

`NA` and `NaN` aren't the same thing. `NA` is a missing value while `NaN` is 'Not a Number' (usually the result of a computation involving division by zero).

You can compute `is.na()` on character vectors (as shown below). `is.na()` will also return TRUE for `NaN` values. If you want to exclusively count `NaN` values, you can use `is.nan()` instead.

``````library(tidyverse)

sql_db <- tribble(~age, ~sex, ~credit_turnover, ~acct_bal,
23, "f", NaN, 100,
21, "m", 28 , 23,
NaN, "f", 64, 46,
28, "f", NaN, 87,
41, "m", 35 , NaN)

summarise_all(sql_db, ~ sum(is.na(.x)))
#> # A tibble: 1 x 4
#>     age   sex credit_turnover acct_bal
#>   <int> <int>           <int>    <int>
#> 1     1     0               2        1

# Replacing first row values with NA.
sql_db[1, ] <- NA

summarise_all(sql_db, ~ sum(is.nan(.x)))
#> # A tibble: 1 x 4
#>     age   sex credit_turnover acct_bal
#>   <int> <int>           <int>    <int>
#> 1     1     0               1        1
``````

Created on 2020-07-01 by the reprex package (v0.3.0)

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

This worked perfectly. Thank you very much.