Scoped filtering over column range with & logic possible?

paul · February 1, 2019, 12:44pm

I'm trying to find a method of filtering over a specific range of columns in a data frame to filter to rows that contain all elements of a character vector somewhere within that range of columns.

For example, if my column range was the one below:

library(dplyr)

set.seed(10)

df <- tibble(
  a = sample(LETTERS[1:10], 10),
  b = sample(LETTERS[1:10], 10),
  c = sample(LETTERS[1:10], 10),
  d = sample(LETTERS[1:10], 10),
  e = sample(LETTERS[1:10], 10)
)

df
#> # A tibble: 10 x 5
#>    a     b     c     d     e    
#>    <chr> <chr> <chr> <chr> <chr>
#>  1 F     G     I     F     C    
#>  2 C     F     F     A     J    
#>  3 D     A     G     B     A    
#>  4 E     E     C     G     F    
#>  5 A     C     H     C     B    
#>  6 B     I     D     D     H    
#>  7 G     H     E     E     E    
#>  8 J     D     A     J     G    
#>  9 H     J     B     H     D    
#> 10 I     B     J     I     I

and I wanted to filter to rows that had at least one instance of "A" and at least one instance of "C" I could use filter_at twice to get the desired result:

df %>% 
  filter_at(vars(a:e), any_vars(. == "A")) %>% 
  filter_at(vars(a:e), any_vars(. == "C"))
#> # A tibble: 2 x 5
#>   a     b     c     d     e    
#>   <chr> <chr> <chr> <chr> <chr>
#> 1 C     F     F     A     J    
#> 2 A     C     H     C     B

However , I'm trying to implement this in shiny where filter criteria comes as a vector c("A", "C") from a select input so is there a way I can do this with a single filter function using the vector?

Using %in% won't work because it returns any rows with "A" or "C"

df %>% 
  filter_at(vars(a:e), any_vars(. %in% c("A", "C")))
#> # A tibble: 6 x 5
#>   a     b     c     d     e    
#>   <chr> <chr> <chr> <chr> <chr>
#> 1 F     G     I     F     C    
#> 2 C     F     F     A     J    
#> 3 D     A     G     B     A    
#> 4 E     E     C     G     F    
#> 5 A     C     H     C     B    
#> 6 J     D     A     J     G

Thanks!

jim89 · February 1, 2019, 2:09pm

I think this will work:

inputs <- c("A", "C")
df %>% 
    mutate(rn = row_number()) %>% 
    gather(key, values, -rn) %>% 
    group_by(rn) %>% 
    mutate(all = all(inputs %in% values)) %>% 
    spread(key, values)

Rather then checking for the columns being %in% the input, we can turn the data long (with gather()) so that the columns become rows, then check if all the inputs are %in% the values for that row. The row_number() adds a unique ID to each row so you can spread() the data back to your wide form afterwords. Example output below, you should be able to use the all field to filter to just the rows you want to keep.

# A tibble: 10 x 7
# Groups:   rn [10]
      rn all   a     b     c     d     e    
   <int> <lgl> <chr> <chr> <chr> <chr> <chr>
 1     1 FALSE F     G     I     F     C    
 2     2 TRUE  C     F     F     A     J    
 3     3 FALSE D     A     G     B     A    
 4     4 FALSE E     E     C     G     F    
 5     5 TRUE  A     C     H     C     B    
 6     6 FALSE B     I     D     D     H    
 7     7 FALSE G     H     E     E     E    
 8     8 FALSE J     D     A     J     G    
 9     9 FALSE H     J     B     H     D    
10    10 FALSE I     B     J     I     I

system · February 22, 2019, 2:09pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.