I don't understand the behavior of map coupled with all(is.na)

purrr

#1

I want to learn about the map family of functions, so I've started using it for some simple coding exercises to see how it works. Now, according to the help,

the map functions transform their input by applying a function to each element and returning a vector the same length as the input.

I interpret this sentence in the following way: if I have a list (and a data frame is basically a list of column vectors), map applies the function to each element of the list, i.e., to each column. In other words, I interpret map as equivalent to apply(data, 2, function) (is this correct?). To test my understanding, I wrote some code to drop the columns of a large data frame that are completely filled with NA. To do that, I first try to identify the columns that are all NA by applying all(is.na) to every column of the data frame bd with map_lgl or apply. It doesn't work:

library(purrr)

# create a large dataframe
big_data <- replicate(10, data.frame(rep(NA, 1e6), sample(c(1:8, NA), 1e6, TRUE), 
                                     sample(250, 1e6, TRUE)), simplify = FALSE)
bd <- do.call(data.frame, big_data)
names(bd) <- paste0('X', seq_len(30))
rm(big_data)

# doesn't work
map_lgl(bd, all(is.na))
#> Warning in all(is.na): coercing argument of type 'builtin' to logical
#> Error in as_mapper(.f, ...): cannot coerce type 'builtin' to vector of type 'logical'

# doesn't work
apply(bd, 2, all(is.na))
#> Warning in all(is.na): coercing argument of type 'builtin' to logical
#> Error in match.fun(FUN): cannot coerce type 'builtin' to vector of type 'logical'

# works! Why?!
apply(bd, 2, sum)
#>        X1        X2        X3        X4        X5        X6        X7 
#>        NA        NA 125526934        NA        NA 125500043        NA 
#>        X8        X9       X10       X11       X12       X13       X14 
#>        NA 125407221        NA        NA 125497541        NA        NA 
#>       X15       X16       X17       X18       X19       X20       X21 
#> 125568472        NA        NA 125594203        NA        NA 125453994 
#>       X22       X23       X24       X25       X26       X27       X28 
#>        NA        NA 125542644        NA        NA 125673092        NA 
#>       X29       X30 
#>        NA 125462156

However, using map_dbl or apply to apply the sum function to all columns of bd (in other words, to reproduce colSums()) works. It looks like I’m misunderstanding something about the way all(is.na) works. Can you help me?
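On the side question of whether map is equivalent to apply(data, 2, function), a minimal sketch on a small made-up data frame (df is hypothetical, not the bd above) suggests the two are close but not identical: map visits each column as its own vector, while apply first converts the whole data frame to a matrix.

```r
library(purrr)

# df is a hypothetical two-column data frame with mixed column types
df <- data.frame(n = c(1, 2, NA), s = c("a", NA, "c"), stringsAsFactors = FALSE)

# map_chr() treats the data frame as a list of columns, so each column
# keeps its own type: n is "numeric", s is "character"
map_classes <- map_chr(df, class)
map_classes

# apply() calls as.matrix() on the data frame first; with a character
# column present the whole matrix is character, so both come back "character"
apply_classes <- apply(df, 2, class)
apply_classes
```

For a data frame whose columns all share one type, as in bd, the two approaches should give the same answer for a function like sum.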


#2

I'm not 100% sure I understand the question, but builtin is one of the types that typeof() reports for base functions. I believe you're passing the base function is.na itself to all(), without calling it on any argument, so all() is not being passed a logical vector.
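A short sketch of that point, using only base R (the vector x and the variable failed are made up for illustration): the failure can be reproduced without map_lgl or apply at all.

```r
# is.na without parentheses is the function object itself
typeof(is.na)
#> [1] "builtin"

# Calling is.na() on a vector hands all() the logical vector it expects
x <- c(NA, NA, NA)
all(is.na(x))
#> [1] TRUE

# Handing the function itself to all() fails on its own;
# the condition is caught here so the script keeps running
failed <- tryCatch({
  all(is.na)
  FALSE
}, warning = function(w) TRUE, error = function(e) TRUE)
failed
```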


#3

map_lgl expects a function as its second argument, but all(is.na) is a call: R tries to evaluate it first, and that evaluation fails. See the examples below.

suppressPackageStartupMessages(library(tidyverse))
big_data <- replicate(10, data.frame(rep(NA, 1e6), sample(c(1:8, NA), 1e6, TRUE), 
                                     sample(250, 1e6, TRUE)), simplify = FALSE)
bd <- do.call(data.frame, big_data)
names(bd) <- paste0('X', seq_len(30))
rm(big_data)

# this 
# map_lgl(bd, all(is.na))
# is, in effect, doing this with each column
all(is.na)(bd[[1]])
#> Warning in all(is.na): coercing argument of type 'builtin' to logical
#> Error in eval(expr, envir, enclos): cannot coerce type 'builtin' to vector of type 'logical'


# better to use formula form
map_lgl(bd, ~ all(is.na(.)))
#>    X1    X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE 
#>   X13   X14   X15   X16   X17   X18   X19   X20   X21   X22   X23   X24 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE 
#>   X25   X26   X27   X28   X29   X30 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE

# or function form
map_lgl(bd, function(col) all(is.na(col)))
#>    X1    X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE 
#>   X13   X14   X15   X16   X17   X18   X19   X20   X21   X22   X23   X24 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE 
#>   X25   X26   X27   X28   X29   X30 
#>  TRUE FALSE FALSE  TRUE FALSE FALSE

Created on 2018-03-02 by the reprex package (v0.2.0).
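To round this off, here is a hedged sketch of the original goal, dropping the all-NA columns, on a small stand-in for bd. The data frame small and the helper name all_na are made up for illustration. It also shows purrr's compose(), which builds the composed function that all(is.na) was presumably meant to express.

```r
library(purrr)

# small is a hypothetical stand-in for bd
small <- data.frame(X1 = c(NA, NA, NA), X2 = c(1L, NA, 3L), X3 = c(5L, 6L, 7L))

# all_na is a hypothetical helper: TRUE when every element of x is NA
all_na <- function(x) all(is.na(x))

# flag the columns that are entirely NA
is_empty <- map_lgl(small, all_na)

# keep only the columns that are not entirely NA
kept <- small[!is_empty]
names(kept)

# compose(all, is.na) returns a function that applies is.na first, then all
composed <- compose(all, is.na)
flags <- map_lgl(small, composed)
flags
```

Only X1 is entirely NA in this toy data, so kept holds X2 and X3.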


#4

hey @danr thanks a bunch! I understand my error now.