I want to drop the columns of a data frame which are completely filled with NA, and keep all the others (including those which have a few NA, but are not all NA). My current solution:
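    library(purrr)

    # build a 30-column test data frame; every third column is entirely NA
    big_data <- replicate(10, data.frame(rep(NA, 1e6), sample(c(1:8, NA), 1e6, TRUE), sample(250, 1e6, TRUE)), simplify = FALSE)
    bd <- do.call(data.frame, big_data)
    names(bd) <- paste0('X', seq_len(30))
    rm(big_data)

    # current solution
    # index is TRUE for columns that are entirely NA, so negate it to keep the rest
    index <- map_lgl(bd, ~ all(is.na(.)))
    bd_sans_NA_cols <- bd[, !index]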
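As a quick sanity check of the logic, here is a minimal sketch on a toy data frame. The name `toy` and its columns are made up for illustration, and it uses base R's `vapply` rather than purrr so it runs without extra packages. The index marks the all-NA columns, so it has to be negated before subsetting.

```r
# Toy data frame (hypothetical, for illustration only)
toy <- data.frame(
  a = c(NA, NA, NA),  # entirely NA: should be dropped
  b = c(1, NA, 3),    # partly NA: should be kept
  c = c(4, 5, 6)      # no NA: should be kept
)

# TRUE for columns that are entirely NA
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))

# Negate the index to keep everything else;
# drop = FALSE keeps the result a data frame even if one column remains
toy_sans_NA_cols <- toy[, !all_na, drop = FALSE]

names(toy_sans_NA_cols)
```

Only `b` and `c` should survive here, which is the behaviour I want on the real data.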
It works and it's blazing fast. Is this the way you would do it, or is there a more tidyverse-y way? Note that the solution must be fast because my real use case (which I can't share for IP reasons) is a data frame of over 10 GB in memory (oddly, it is only about half that size on disk).