How can I drop variables with more than 20% missing values?

Hi, I currently have a dataset 'data' that contains more than 100 variables. I am doing some data cleaning and would like to drop all variables with more than 20% missing values.

Right now I have the following code,

library(purrr)
data2 <- data[!map_lgl(data, (is.na(.)))]

But I am getting this error message: Error: Can't convert a logical vector to function
Call rlang::last_error() to see a backtrace

Is there a better/correct way to do this? Thanks!


Welcome to the community!

Does this work for you?

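# TRUE when at most 20% of the values in a column are missing
is_column_with_at_least_eighty_percent_non_missing <- function(column)
{
  mean(is.na(column)) <= 0.20
}

# Filter() keeps the columns of a data frame for which the predicate is TRUE
Filter(f = is_column_with_at_least_eighty_percent_non_missing,
       x = data)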
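
As for the original error: `map_lgl()` expects a function (or a `~` formula) as its second argument, whereas `(is.na(.))` is evaluated immediately and hands it a logical vector. A minimal sketch of a purrr version, assuming the same 20% threshold; the toy data frame below is made up purely for illustration:

```r
library(purrr)

# Hypothetical toy data: column b is 60% missing, a and c have no missing values
data <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c("v", "w", "x", "y", "z")
)

# The ~ formula is turned into a function of .x (one column at a time);
# keep is TRUE for columns with at most 20% missing values
keep <- map_lgl(data, ~ mean(is.na(.x)) <= 0.20)

# Single-bracket indexing with a logical vector selects columns
data2 <- data[keep]
names(data2)
```

Here `data2` should contain only the columns `a` and `c`.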

This works! I also solved it by doing this:

data2 <- data[, colMeans(is.na(data)) <= 0.2]
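
A quick way to sanity-check a `colMeans()` filter of this kind is to run it on a small data frame where the share of missing values in each column is known. A sketch with made-up data (the column names and the `na_share` helper variable are hypothetical), keeping columns with at most 20% missing:

```r
# Hypothetical toy data with known shares of missing values
data <- data.frame(
  id        = 1:10,
  mostly_na = c(1, rep(NA, 9)),       # 90% missing
  some_na   = c(NA, NA, NA, 4:10),    # 30% missing
  few_na    = c(NA, 2:10)             # 10% missing
)

# Proportion of NA values per column
na_share <- colMeans(is.na(data))

# Keep columns with at most 20% missing; drop = FALSE guarantees a data frame
# even if only one column survives
data2 <- data[, na_share <= 0.20, drop = FALSE]
names(data2)
```

Only `id` and `few_na` should remain. Note that a condition such as `> 0.5` would instead select the columns that are mostly missing, so the direction of the comparison is worth double-checking.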
