Sometimes I want to view every row in a data frame that would be dropped if I removed all rows with a missing value in any variable. In this case, I'm specifically interested in how to do this with dplyr 1.0's across() function used inside of the filter() verb.
Here is an example data frame:
library(dplyr) # also provides tribble() and %>%

df <- tribble(
  ~id, ~x, ~y,
  1, 1, 0,
  2, 1, 1,
  3, NA, 1,
  4, 0, 0,
  5, 1, NA
)
Code for keeping rows that DO NOT include any missing values is provided on the tidyverse website. Specifically, I can use:
df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ !is.na(.x)
    )
  )
Which returns:
# A tibble: 3 x 3
     id     x     y
  <dbl> <dbl> <dbl>
1     1     1     0
2     2     1     1
3     4     0     0
However, I can't figure out how to return the opposite -- rows with a missing value in any variable. The result I'm looking for is:
# A tibble: 2 x 3
     id     x     y
  <dbl> <dbl> <dbl>
1     3    NA     1
2     5     1    NA
My first thought was just to remove the !:
df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ is.na(.x)
    )
  )
But, that returns zero rows.
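My best guess at why: as I read the colwise vignette, across() inside filter() keeps a row only when the condition is TRUE for every selected column, so the is.na() version is asking for rows where id, x, and y are all missing. The base R sketch below mimics that logic without across(). The names na_flags, all_missing, and any_missing are just illustrative, and the claim about how filter() combines the columns is my assumption, not something I have confirmed in the source.

```r
library(dplyr)

df <- tribble(
  ~id, ~x, ~y,
  1, 1, 0,
  2, 1, 1,
  3, NA, 1,
  4, 0, 0,
  5, 1, NA
)

# One logical vector per column: TRUE where the value is missing
na_flags <- lapply(df, is.na)

# Combining with `&` asks for rows where EVERY column is missing
# (assumed to be what across() inside filter() effectively does)
all_missing <- Reduce(`&`, na_flags)

# Combining with `|` asks for rows where ANY column is missing
any_missing <- Reduce(`|`, na_flags)

df[all_missing, ] # no rows: id is never missing
df[any_missing, ] # the rows with id 3 and 5
```

If that reading is right, what I need is a way to make filter() combine the per-column results with "or" instead of "and".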
Of course, I can get the answer I want with this code if I know all variables that have a missing value ahead of time:
df %>%
  filter(is.na(x) | is.na(y))
But, I'm looking for a solution that doesn't require me to know which variables have a missing value ahead of time. Additionally, I'm aware of how to do this with the filter_all() function:
df %>%
  filter_all(any_vars(is.na(.)))
But, the filter_all() function has been superseded by the use of across() in an existing verb. See https://dplyr.tidyverse.org/articles/colwise.html
Other unsuccessful attempts I've made are:
df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ any_vars(is.na(.x))
    )
  )

df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ !!any_vars(is.na(.x))
    )
  )

df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ !!any_vars(is.na(.))
    )
  )

df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ any(is.na(.x))
    )
  )

df %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ any(is.na(.))
    )
  )
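One direction that may apply, assuming dplyr 1.0.4 or later is installed: that release added if_any() and if_all() as companions to across() for use inside filter(). The sketch below is not the across() form I originally asked about, and the name rows_with_na is only illustrative.

```r
library(dplyr)

df <- tribble(
  ~id, ~x, ~y,
  1, 1, 0,
  2, 1, 1,
  3, NA, 1,
  4, 0, 0,
  5, 1, NA
)

# if_any() is TRUE for a row when the condition holds in at least one
# of the selected columns (requires dplyr >= 1.0.4)
rows_with_na <- df %>%
  filter(
    if_any(
      .cols = everything(),
      .fns = ~ is.na(.x)
    )
  )

rows_with_na
```

On the example data this keeps the rows with id 3 and 5, without naming x or y ahead of time.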
This question is also posted on Stack Overflow.