Hi,
I want to drop the columns of a data frame which are completely filled with NA, and keep all the others (including those which have a few NA, but are not all NA). My current solution:
library(purrr)
big_data <- replicate(10, data.frame(rep(NA, 1e6), sample(c(1:8, NA), 1e6, TRUE),
                                     sample(250, 1e6, TRUE)), simplify = FALSE)
bd <- do.call(data.frame, big_data)
names(bd) <- paste0('X', seq_len(30))
rm(big_data)
# current solution
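# TRUE for every column that is entirely NA
index <- map_lgl(bd, ~ all(is.na(.)))
# negate the index so that only the columns that are not entirely NA are kept
bd_sans_NA_cols <- bd[, !index]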
It works and it's blazing fast. Is this the way you would do it, or is there a more tidyverse-y way? Note that the solution must be fast because my real use case (which I can't share for IP reasons) is a data frame of over 10 GB in memory (weirdly, it's only about half that size on disk).
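For comparison, the one dplyr candidate I'm aware of is sketched below on a toy data frame. The names toy and toy_sans_NA_cols are made up for illustration, and I'm assuming dplyr >= 1.0.0 so that where() is available inside select(). I have not benchmarked it on the full-size data, so I can't say how it compares on speed.

```r
library(dplyr)

# Toy data frame (hypothetical, stands in for the real data)
toy <- data.frame(
  a = c(NA, NA, NA),     # entirely NA: should be dropped
  b = c(1, NA, 3),       # partly NA: should be kept
  c = c("x", "y", "z")   # no NA: should be kept
)

# Keep only the columns that are not entirely NA
toy_sans_NA_cols <- toy %>% select(where(~ !all(is.na(.x))))

names(toy_sans_NA_cols)
```

On the toy data this should leave only columns b and c.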