Modern/updated dplyr way to remove columns with NA values?

I've seen many similar guides online, but they use functions I believe are deprecated, such as select_if() or where().

What is the current way to remove all columns that contain any NA values? I tried variations with select(across()) and select(if_any()), but I think I'm missing the nuance.

library(dplyr)

df = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))

# DOES NOT WORK: the error says if_any() must be used inside a dplyr verb
# (the `.` placeholder also isn't supported by the native |> pipe)
df |>
  select(if_any(colSums(is.na(.) > 0)))
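The nuance, as I understand it: if_any() and if_all() build a row-wise logical condition meant for filter() (or mutate()), while select() chooses columns through tidyselect helpers such as where(), which applies a predicate to each column. A minimal sketch contrasting the two, assuming a recent dplyr and R >= 4.1 (for |> and the \(x) lambda syntax):

```r
library(dplyr)

df <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Columns: where() runs the predicate on each column and select()
# keeps the columns for which it returns TRUE.
no_na_cols <- df |> select(where(\(x) !anyNA(x)))

# Rows: if_all() combines a per-column condition row by row,
# so it belongs inside filter(), not select().
complete_rows <- df |> filter(if_all(everything(), \(x) !is.na(x)))

no_na_cols     # only abc survives
complete_rows  # 0 rows, because every row has an NA in ghi
```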

Not very modern, but less syntax to deal with

DF = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))

na.omit(DF)
#> [1] abc def ghi
#> <0 rows> (or 0-length row.names)

(Each column contains at least one NA, so all are excluded.)


Thanks, but that removes rows, not columns.
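A base-R alternative that does work on columns is Filter(): a data frame is a list of columns, so Filter() applies the predicate to each column and subsets with [, which keeps the result a data frame. The colSums() mask from the question gives the same columns once the comparison is placed outside is.na(). A minimal sketch:

```r
DF <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Keep only columns with no NA at all
# (anyNA(x) is TRUE when x contains at least one NA)
DF_clean <- Filter(function(x) !anyNA(x), DF)

# Same columns via a logical mask: count NAs per column, keep the zeros
DF_clean2 <- DF[, colSums(is.na(DF)) == 0, drop = FALSE]

DF_clean   # only abc
```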

I use the select_if() approach myself:

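library(dplyr)
# Keep columns with at least one non-NA value (drops only all-NA columns)
not_all_na <- function(x) any(!is.na(x))  # avoids masking stats::dt()
data <- data %>% select_if(not_all_na)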
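Note that this helper tests a different condition from the one in the question: it keeps any column with at least one non-NA value, so it drops only columns that are entirely NA (ghi here) and keeps def. A sketch of both rules with where(), assuming dplyr is installed:

```r
library(dplyr)

df <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# Drop columns that are entirely NA (same rule as the helper above)
drop_all_na <- df %>% select(where(~ !all(is.na(.x))))

# Drop columns with any NA (what the question asks for)
drop_any_na <- df %>% select(where(~ !anyNA(.x)))

names(drop_all_na)  # "abc" "def"
names(drop_any_na)  # "abc"
```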

You're right. I fooled myself because of the empty result.


select_if() is superseded; the current idiom is select() with where(). For example, select_if() isn't in tidytable.


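library(dplyr)

df = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))

# Superseded scoped verb
df %>% select_if(~ !any(is.na(.)))
# Current equivalent: where() inside select(); both keep only abc
df %>% select(where(~ !any(is.na(.))))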
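If this is needed in several places, one option is to wrap the where() call in a small function. drop_na_cols() below is a hypothetical name, not a dplyr function; the sketch uses the native pipe and \(x) lambda to match the question's |> style (R >= 4.1):

```r
library(dplyr)

# Hypothetical helper, not part of dplyr: drop every column that has an NA.
# anyNA(x) is the base-R shortcut for any(is.na(x)).
drop_na_cols <- function(data) {
  data |> select(where(\(x) !anyNA(x)))
}

df <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

df |> drop_na_cols()  # keeps only abc
```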

I got lulled into complacency because it returned what I expected (which was wrong)

DF = data.frame(abc = c(1, 2, 3),
                def = c(4, 5, NA),
                ghi = c(NA, NA, NA))
DF[is.na(colMeans(DF))]
#>   def ghi
#> 1   4  NA
#> 2   5  NA
#> 3  NA  NA
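To keep the complete columns instead, the same mask only needs negating. colMeans() returns NA for any column that contains an NA (na.rm defaults to FALSE), which is why the unnegated version picks out def and ghi. A minimal sketch; note that colMeans() works only when every column is numeric or logical:

```r
DF <- data.frame(abc = c(1, 2, 3),
                 def = c(4, 5, NA),
                 ghi = c(NA, NA, NA))

# colMeans() yields NA for every column containing at least one NA
col_has_na <- is.na(colMeans(DF))

DF[col_has_na]   # columns WITH an NA: def, ghi
DF[!col_has_na]  # columns WITHOUT an NA: abc
```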
