How to filter by missing data

Hello all,
I want to sort through a data set, identify all people who are missing data in my Last_name variable, and create a new set of data called "noncompleters" that stores all of these people. I'm essentially looking for a command that is the inverse of na.omit or complete.cases, so that I can quickly pull all of these people out of my current data frame and re-assign them to a separate one. Any suggestions?

My code below has not been working; it stores "0 observations" and I can't figure out what is wrong. I've tried this three different ways:

noncompleters=sonadata%>%filter(is.na(Last_name))
noncompleters=sonadata%>%filter(complete.cases(Last_name)==FALSE)
noncompleters=sonadata%>%filter(!complete.cases(Last_name))

Add a ! in front of is.na

if sonadata is a data.table then...
noncompleters <- sonadata[!is.na(Last_name), ]

This should work using tidyverse
noncompleters=sonadata%>%filter(!is.na(Last_name))

Welcome to RStudio Community!

Can you give us an example of your dataset sonadata? If you have NA values in Last_name, your first code attempt should return a new set of data containing only the rows with missing values for that variable. If that's not working and you know there are missing values in that variable then I'm guessing the missing values aren't being recognized as NA by R.
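One quick way to test that guess without sharing real data is to count how many entries R treats as NA versus how many are empty or whitespace-only strings. The sketch below uses a made-up data frame (`demo` and its values are hypothetical stand-ins for sonadata) and base R only:

```r
# Hypothetical stand-in for the real data; names and values are made up.
demo <- data.frame(
  id = 1:4,
  Last_name = c("Smith", "", NA, "  "),
  stringsAsFactors = FALSE
)

# Entries that R actually recognizes as missing
n_na <- sum(is.na(demo$Last_name))

# Entries that look blank but are really empty or whitespace-only strings
n_blank <- sum(trimws(demo$Last_name) == "", na.rm = TRUE)

n_na     # 1
n_blank  # 2
```

If the second count is large and the first is zero, is.na() has nothing to match, which would explain a filter that returns 0 observations.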

It will help to give us a reproducible example (aka reprex). There's some good information on how to get started with making a reproducible example here:


Unfortunately I cannot give an example of the data, as this is from a study that is currently in the collection phase; I don't want to run into IRB issues by posting it or pieces of it.

The gist of it is that I have a text-entry box for participants to enter their name on the final page of my study in order to receive credit; thus, if that name box is blank, I know the person hasn't finished the survey. The goal with this code was to have it pull every participant with a missing last name out of the main data frame and put them in a separate data frame.

Thanks for the suggestion, though unfortunately this did not work; this retained all participants. Both data sets read "301 observations of 124 variables"

Sorry, I gave you the result for the opposite. The first example should work if missing values are coded as NA.

Aha, I think that might be the problem! Scrolling through the data in the tab view I now see that some of my data (the ones stored as vectors, I guess) have NA written in them. The name columns, however, are just blank. How do I tell R to code the missing values in that column as NA, so that the code you gave works?

How did you read your data in? You may be able to address this at that point.

Since you can't post your actual data, the next best thing is to post a small dataset that shows the same problem.

For example, here is a very small dataset that has missing values in two variables. The third row is missing a value for the categorical "b" variable. However, by default a blank in a categorical variable is read in as an empty string rather than NA (unlike blanks in numeric variables, which become NA).

dat = read.csv(text = "a, b, c
1, b, 1
, c, 2
2,, 3")

dat
#>    a  b c
#> 1  1  b 1
#> 2 NA  c 2
#> 3  2    3

When using functions from the read.table() family, na.strings can be useful for defining what should be interpreted as NA. In this example, blanks should be treated as NA, so I could use na.strings = "".

dat = read.csv(text = "a, b, c
1, b, 1
, c, 2
2,, 3", na.strings = "")

dat
#>    a    b c
#> 1  1    b 1
#> 2 NA    c 2
#> 3  2 <NA> 3
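One caveat worth checking: the default for na.strings is "NA", so passing na.strings = "" replaces that default instead of adding to it. If a file contains both blanks and literal NA text, listing both values may be safer. A minimal sketch with made-up data (`csv_text` and its contents are hypothetical):

```r
# Made-up CSV text: column a has a literal "NA", column b has a blank.
csv_text <- "a,b\n1,x\nNA,\n3,z"

# Only blanks count as missing; the literal "NA" in column a is kept as text,
# so column a is no longer read as numeric.
only_blank <- read.csv(text = csv_text, na.strings = "")
is.numeric(only_blank$a)  # FALSE

# Both blanks and literal "NA" count as missing.
both <- read.csv(text = csv_text, na.strings = c("", "NA"))
is.numeric(both$a)        # TRUE
is.na(both$b)             # FALSE  TRUE FALSE
```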

If the blanks-as-NA can't be addressed when reading in the dataset, another option is to manually change blanks to NA in R. For example, dplyr::na_if() can be used to replace blanks with NA in a variable.

library(dplyr)

dat = read.csv(text = "a, b, c
1, b, 1
, c, 2
2,, 3")

dat
#>    a  b c
#> 1  1  b 1
#> 2 NA  c 2
#> 3  2    3

# No NA in b so returns no rows
filter(dat, is.na(b) )
#> [1] a b c
#> <0 rows> (or 0-length row.names)

# Manually replace blanks with NA
dat %>%
     mutate(b = na_if(b, "") ) %>%
     filter( is.na(b) )
#>   a    b c
#> 1 2 <NA> 3

Created on 2019-11-26 by the reprex package (v0.3.0)
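Tying this back to the original question, here is a hedged base R sketch of the full split into noncompleters and completers. The data frame `demo` and its values are made up, and treating whitespace-only names as missing is an assumption about how a text-entry box might behave, not something confirmed in this thread:

```r
# Hypothetical stand-in for sonadata; names and values are made up.
demo <- data.frame(
  id = 1:5,
  Last_name = c("Smith", "", "Jones", "   ", NA),
  stringsAsFactors = FALSE
)

cleaned <- demo

# Trim surrounding whitespace so "   " becomes "" (assumption: such
# entries should count as missing).
cleaned$Last_name <- trimws(cleaned$Last_name)

# Recode empty strings as NA, leaving existing NA values alone.
is_blank <- !is.na(cleaned$Last_name) & cleaned$Last_name == ""
cleaned$Last_name[is_blank] <- NA

# Split on missingness: the inverse of keeping only complete cases.
noncompleters <- cleaned[is.na(cleaned$Last_name), ]
completers <- cleaned[!is.na(cleaned$Last_name), ]

noncompleters$id  # 2 4 5
completers$id     # 1 3
```

With dplyr loaded, the last two steps correspond to filter(is.na(Last_name)) and filter(!is.na(Last_name)) from earlier in the thread.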


Brilliant! na.strings worked!!! Thanks so much, you just solved a huge headache!!!
