Unexpected behavior when using a Boolean operator with `where()`

Why can't I use a Boolean operator with where() when selecting columns in 'dplyr'?

For example, for:

library(tidyverse)

set.seed(666)

DF <- data.frame(V1 = as.factor(rep(c("A", "B", 1, 2))),
                 V2 = runif(8, 0, 1),
                 V3 = runif(8, 0, 2))

... this works:

DF %>% select(where(is.numeric)) %>% select(where(~max(., na.rm = TRUE) > 1))

but this:

DF %>% select(where(is.numeric) & where(~max(., na.rm = TRUE) > 1))

yields this error:

Error in `select()`:
! ‘max’ not meaningful for factors

and similarly, this:

DF %>% select(where(is.numeric & ~max(., na.rm = TRUE) > 1))

also gives an error:

Error in `select()`:
! operations are possible only for numeric, logical or complex types

making it seem like is.numeric is not being evaluated.

For selection, piping is okay, but ultimately I need to select only numeric variables with max > 1 inside of a mutate(across()), so I need to select for both conditions simultaneously.

If you are going to use the . placeholder with ~ for max(), then I think you have to use the same form for is.numeric() if you want both tests in the same where() clause. I also use &&, which short-circuits: any non-numeric column (like a factor) fails the first test and is never tested for its max.

DF %>% select(where(~{is.numeric(.) && 
    max(., na.rm = TRUE) > 1}))
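
Since the end goal was to use this inside mutate(across()), here is a minimal sketch of how the combined predicate might be used there. The error messages above are consistent with each where() in `where(is.numeric) & where(...)` being checked against every column on its own before the results are combined, which is why a single predicate with && avoids the problem. The data frame `DF2`, the helper name `numeric_max_gt_1`, and the rescaling step are all made up for illustration; fixed values are used instead of runif() so the result is predictable.

```r
library(dplyr)

# Hypothetical data with fixed values (not the DF from the question).
DF2 <- data.frame(V1 = factor(c("A", "B", "1", "2")),
                  V2 = c(0.1, 0.5, 0.9, 0.3),   # numeric, max <= 1
                  V3 = c(0.2, 1.8, 0.7, 1.1))   # numeric, max > 1

# Hypothetical helper: TRUE only for numeric columns whose maximum exceeds 1.
# && short-circuits, so max() is never called on a non-numeric column.
numeric_max_gt_1 <- function(x) is.numeric(x) && max(x, na.rm = TRUE) > 1

# Rescale only the matching columns; dividing by the maximum is an
# arbitrary example of a transformation.
out <- DF2 %>%
  mutate(across(where(numeric_max_gt_1), ~ .x / max(.x, na.rm = TRUE)))
```

With this data only V3 should match the predicate, so V1 and V2 are left as they were. A named function is used here instead of the ~ lambda purely for readability; the lambda form above should behave the same way.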