I'm not sure why it doesn't work for you for the character columns. I'd do the same thing, using sum(is.na())
.
To showcase this, lets' recreate the sql_db
table. This time I'm also loading the purrr
package, because I'm in the habit of using the map
family of functions.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
sql_db <- tribble(~age, ~sex, ~credit_turnover, ~acct_bal,
23, "f", NaN, 100,
21, "m", 28 , 23,
NaN, "f", 64, 46,
28, "f", NaN, 87,
41, "m", 35 , NaN)
So here we use a map
type function for summing the 1
s / TRUE
s together for each column in the dataframe. Specifically we're using the map_dbl
function because we want the function to return a double / numeric vector.
sql_db %>%
map_dbl(~sum(is.na(.)))
#> age sex credit_turnover acct_bal
#> 1 0 2 1
If we were to add an NA to first two values in the the sex column. You can see that this also works, even though sex
is a character variable.
sql_db$sex[1:2] <- NA
sql_db %>%
map_dbl(~sum(is.na(.)))
#> age sex credit_turnover acct_bal
#> 1 2 2 1
What I'm guessing could be the problem is that perhaps the "NA"s in your character variables aren't actually NA
values, but rather character strings containing the characters "NA".
sql_db$sex[1:2] <- "NA"
sql_db
#> # A tibble: 5 x 4
#> age sex credit_turnover acct_bal
#> <dbl> <chr> <dbl> <dbl>
#> 1 23 NA NaN 100
#> 2 21 NA 28 23
#> 3 NaN f 64 46
#> 4 28 f NaN 87
#> 5 41 m 35 NaN
sql_db %>%
map_dbl(~sum(is.na(.)))
#> age sex credit_turnover acct_bal
#> 1 0 2 1
If this is the case then the is.na()
function won't count them, as they're not actual NA
s as far as R can tell. So we need to tell R that strings consisting of the characters "NA" should be NA
.
So here we use a mutate_if
function to say that if a column is a character column we will change that character column in a way that makes values NA
if they simply consist of the characters "NA".
sql_db$sex[1:2] <- "NA" # Again, this isn't an actual NA
sql_db %>%
mutate_if(is.character,
~na_if(., "NA")) %>% # Here we make them actual NA values.
map_dbl(~sum(is.na(.)))
#> age sex credit_turnover acct_bal
#> 1 2 2 1
Here's hoping that this actually helps you. Best of luck and let me know if this is not what's causing the problem.