select_if with multiple conditions - redux

Hello there,

Here is a variant of a problem I thought we had solved previously.
Consider this data frame:


testdf <- data_frame(one = c(1,1,1),
                     two = c(1,2,3),
                     three = c(3,3,3))

How can I select the columns that either

  • have non-zero variance, OR
  • are named 'three'?

I tried this without success:

> testdf %>% select_if(funs(var(.) != 0 | grepl("three", names(.))))
> Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : 
> replacement has length zero

What is the issue here?
Thanks!

I haven't figured out how to make select_if work, but here's an option using select (adapted from this answer on StackOverflow):

testdf %>% 
  select(which(map_lgl(., ~ var(.x) != 0)), three)
#>     two three
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2     3
#> 3     3     3

Interesting, but why do we need which() here?

select() expects column positions or names rather than a vector of logical values. which() converts the logical vector into the positions of its TRUE elements.
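
To make that concrete, here is a small base R sketch. The keep vector is made up for illustration and stands in for the result of the map_lgl() call above:

```r
# Logical vector marking which columns to keep (illustrative values only)
keep <- c(one = FALSE, two = TRUE, three = FALSE)

# which() returns the integer positions of the TRUE elements,
# carrying the element names along
positions <- which(keep)
print(positions)
```

Here positions holds the single value 2 (named "two"), which is the kind of input select() can work with.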


Very nice, thanks! I wonder if we have actually hit a limitation of select_at or select_if here. These seem to be the natural candidates for this kind of task, yet, as you said, it is not easy to do.

Once the condition is wrapped inside funs(), select_if no longer seems to have access to the column names, yet the funs() wrapper is necessary for the variance condition to work. I'm not sure how to resolve this, but here is an illustration of the issue:

library(tidyverse)

testdf <- data_frame(one = c(1,1,1),
                     two = c(1,2,3),
                     three = c(3,3,3))

testdf %>% 
  select_if(names(.)=="three")
#> # A tibble: 3 x 1
#>   three
#>   <dbl>
#> 1     3
#> 2     3
#> 3     3

testdf %>% 
  select_if(funs(names(.)=="three"))
#> Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...): replacement has length zero

testdf %>% 
  select_if(var(.) != 0)
#> Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE): length(.p) == length(tibble_vars) is not TRUE

testdf %>% 
  select_if(funs(var(.) != 0))
#> # A tibble: 3 x 1
#>     two
#>   <dbl>
#> 1     1
#> 2     2
#> 3     3

Created on 2018-11-26 by the reprex package (v0.2.1)
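
One possible reading of the two errors, plus a workaround, sketched here rather than taken from dplyr's internals. Inside funs(), `.` appears to be a single bare column, which has no names, so the name test comes back with length zero. Without funs(), `.` is the whole data frame, so var() returns a covariance matrix instead of one value per column. The helper names has_variance and is_three are my own, and the sketch assumes a dplyr version in which select_if() accepts a logical vector with one entry per column, as it does in the first example above:

```r
library(dplyr)

testdf <- tibble(one = c(1, 1, 1),
                 two = c(1, 2, 3),
                 three = c(3, 3, 3))

# Inside funs(), `.` is one bare column, which carries no names
col <- testdf[["one"]]
name_test <- names(col) == "three"        # logical(0), since names(col) is NULL
combined <- (var(col) != 0) | name_test   # still logical(0)

# Outside funs(), `.` is the whole data frame, so var() gives a 3 x 3 matrix
matrix_test <- var(testdf) != 0           # 9 values, not one per column

# Possible workaround: compute one logical value per column up front,
# then pass the combined logical vector to select_if()
has_variance <- vapply(testdf, function(x) var(x) != 0, logical(1))
is_three <- names(testdf) == "three"
result <- testdf %>% select_if(has_variance | is_three)
names(result)
```

If that assumption holds, result should contain the columns two and three, the same as the select() plus which() answer earlier in the thread.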

