How to filter "programmatically"

Andrea · February 24, 2020, 1:09pm

I often have to handle complex filters in my code. Example:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

test <- structure(list(type = c("Clar", "Tech", "Clar", "Clar", "Clar", 
"Clar", "Tech", "Tech", "Tech", "Clar", "Clar", "Clar", "Tech", 
"Tech", "Tech", "Clar", "Clar", "Clar", "Tech", "Tech", "Clar"
), class = c("A", "B", "A", "A", "A", "A", "B", "B", "B", "A", 
"A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "A"), owner = c("team A1", 
"team A1 NY", "team A3", NA, "team A2", NA, "team A1", "team A1", 
"team A1 SF", "team A1", "team A1", "team A2", "team A1", "team A1", 
"team A1 NY", NA, "team A1 SF", "team A1", "team A1", "team A1", 
"team A3"), answer = structure(c(NA, 1L, NA, NA, NA, NA, 1L, 
1L, 1L, NA, NA, NA, 1L, 1L, 1L, NA, NA, NA, 1L, 1L, NA), .Label = c("Accept", 
"Change Solution"), class = "factor"), complexity = structure(c(2L, 
NA, 2L, 2L, 2L, 2L, NA, NA, NA, 2L, 2L, 2L, NA, NA, NA, 2L, 2L, 
2L, NA, NA, NA), .Label = c("High", "Low", "Medium"), class = "factor")), row.names = c(NA, 
21L), class = "data.frame")

test_filtered_1 <- test %>% 
  filter(type == "Clar" & 
            complexity == "Low")

test_filtered_1
#>    type class      owner answer complexity
#> 1  Clar     A    team A1   <NA>        Low
#> 2  Clar     A    team A3   <NA>        Low
#> 3  Clar     A       <NA>   <NA>        Low
#> 4  Clar     A    team A2   <NA>        Low
#> 5  Clar     A       <NA>   <NA>        Low
#> 6  Clar     A    team A1   <NA>        Low
#> 7  Clar     A    team A1   <NA>        Low
#> 8  Clar     A    team A2   <NA>        Low
#> 9  Clar     A       <NA>   <NA>        Low
#> 10 Clar     A team A1 SF   <NA>        Low
#> 11 Clar     A    team A1   <NA>        Low

test_filtered_2 <- test %>%
  filter(type == "Tech" &
           class == "B" &
           str_detect(owner, "A1") &
           answer == "Accept")

test_filtered_2
#>   type class      owner answer complexity
#> 1 Tech     B team A1 NY Accept       <NA>
#> 2 Tech     B    team A1 Accept       <NA>
#> 3 Tech     B    team A1 Accept       <NA>
#> 4 Tech     B team A1 SF Accept       <NA>
#> 5 Tech     B    team A1 Accept       <NA>
#> 6 Tech     B    team A1 Accept       <NA>
#> 7 Tech     B team A1 NY Accept       <NA>
#> 8 Tech     B    team A1 Accept       <NA>
#> 9 Tech     B    team A1 Accept       <NA>

test_filtered_3 <- test %>%
  filter((type == "Clar" & 
            complexity == "Low") |
           (class == "B" &
              str_detect(owner, "A1") &
              answer == "Accept"))

test_filtered_3
#>    type class      owner answer complexity
#> 1  Clar     A    team A1   <NA>        Low
#> 2  Tech     B team A1 NY Accept       <NA>
#> 3  Clar     A    team A3   <NA>        Low
#> 4  Clar     A       <NA>   <NA>        Low
#> 5  Clar     A    team A2   <NA>        Low
#> 6  Clar     A       <NA>   <NA>        Low
#> 7  Tech     B    team A1 Accept       <NA>
#> 8  Tech     B    team A1 Accept       <NA>
#> 9  Tech     B team A1 SF Accept       <NA>
#> 10 Clar     A    team A1   <NA>        Low
#> 11 Clar     A    team A1   <NA>        Low
#> 12 Clar     A    team A2   <NA>        Low
#> 13 Tech     B    team A1 Accept       <NA>
#> 14 Tech     B    team A1 Accept       <NA>
#> 15 Tech     B team A1 NY Accept       <NA>
#> 16 Clar     A       <NA>   <NA>        Low
#> 17 Clar     A team A1 SF   <NA>        Low
#> 18 Clar     A    team A1   <NA>        Low
#> 19 Tech     B    team A1 Accept       <NA>
#> 20 Tech     B    team A1 Accept       <NA>

^{Created on 2020-02-24 by the reprex package (v0.3.0)}

Rather than having to copy the filter code in two different parts of my script, is there a way to programmatically define filters, and then pass them to filter? I know that in this specific case, joining test_filtered_1 and test_filtered_2 would return the same dataframe as test_filtered_3, even if the filters are not the same. But what I'm looking for is something (not run, because it would give an error):

filter_1 <- type == "Clar" & 
            complexity == "Low"
test_filtered_1 <- test %>% 
  filter(filter_1)

filter_2 <- type == "Tech" &
           class == "B" &
           str_detect(owner, "A1") &
           answer == "Accept"

test_filtered_2 <- test %>%
  filter(filter_2)

filter_3 <- (type == "Clar" & 
            complexity == "Low") |
           (class == "B" &
              str_detect(owner, "A1") &
              answer == "Accept")

test_filtered_3 <- test %>%
  filter(filter_3)

I guess this should be doable with functions?

nirgrahamuk · February 24, 2020, 1:41pm

you can use expression() to capture your conditions, then eval()-uate them at point of use.
When combining a higher order expression, I found it works to nest eval()

filter_1 <- expression(type == "Clar" & 
  complexity == "Low")
test_filtered_1 <- test %>% 
  filter(eval(filter_1))

filter_2 <-expression( type == "Tech" &
  class == "B" &
  str_detect(owner, "A1") &
  answer == "Accept")

test_filtered_2 <- test %>%
  filter(eval(filter_2))

filter_3 <- expression(eval(filter_1) |  eval(filter_2))

test_filtered_3 <- test %>%
  filter(eval(filter_3))

joels · February 24, 2020, 4:47pm

You can use rlang::exprs to capture your filter expressions in a list and then map over them. In the example below, I've created a named list of filter expressions so that the output will be named as well, but it's not necessary to add names if you don't need them.

I don't really know if this is the "right" way to encode filters for tidy eval and would be interested in hearing from the experts about whether there are more appropriate ways to capture filters and whether this method should be used for writing functions that use dynamic filters in dplyr pipes.

library(rlang)
library(tidyverse)

filters = exprs(clar.low = type == "Clar" & 
                  complexity == "Low",
                tech.b.accept = type == "Tech" &
                  class == "B" &
                  str_detect(owner, "A1") &
                  answer == "Accept",
                clar.low.or.B.accept = (type == "Clar" & 
                   complexity == "Low") |
                  (class == "B" &
                     str_detect(owner, "A1") &
                     answer == "Accept"))

filters
# $clar.low
# type == "Clar" & complexity == "Low"
# 
# $tech.b.accept
# type == "Tech" & class == "B" & str_detect(owner, "A1") & answer == "Accept"
# 
# $clar.low.or.B.accept
# (type == "Clar" & complexity == "Low") | (class == "B" & str_detect(owner, "A1") & answer == "Accept")

map(filters, ~test %>% filter(!!.x))

$clar.low
   type class      owner answer complexity
1  Clar     A    team A1   <NA>        Low
2  Clar     A    team A3   <NA>        Low
3  Clar     A       <NA>   <NA>        Low
4  Clar     A    team A2   <NA>        Low
5  Clar     A       <NA>   <NA>        Low
6  Clar     A    team A1   <NA>        Low
7  Clar     A    team A1   <NA>        Low
8  Clar     A    team A2   <NA>        Low
9  Clar     A       <NA>   <NA>        Low
10 Clar     A team A1 SF   <NA>        Low
11 Clar     A    team A1   <NA>        Low

$tech.b.accept
  type class      owner answer complexity
1 Tech     B team A1 NY Accept       <NA>
2 Tech     B    team A1 Accept       <NA>
3 Tech     B    team A1 Accept       <NA>
4 Tech     B team A1 SF Accept       <NA>
5 Tech     B    team A1 Accept       <NA>
6 Tech     B    team A1 Accept       <NA>
7 Tech     B team A1 NY Accept       <NA>
8 Tech     B    team A1 Accept       <NA>
9 Tech     B    team A1 Accept       <NA>

$clar.low.or.B.accept
   type class      owner answer complexity
1  Clar     A    team A1   <NA>        Low
2  Tech     B team A1 NY Accept       <NA>
3  Clar     A    team A3   <NA>        Low
4  Clar     A       <NA>   <NA>        Low
5  Clar     A    team A2   <NA>        Low
6  Clar     A       <NA>   <NA>        Low
7  Tech     B    team A1 Accept       <NA>
8  Tech     B    team A1 Accept       <NA>
9  Tech     B team A1 SF Accept       <NA>
10 Clar     A    team A1   <NA>        Low
11 Clar     A    team A1   <NA>        Low
12 Clar     A    team A2   <NA>        Low
13 Tech     B    team A1 Accept       <NA>
14 Tech     B    team A1 Accept       <NA>
15 Tech     B team A1 NY Accept       <NA>
16 Clar     A       <NA>   <NA>        Low
17 Clar     A team A1 SF   <NA>        Low
18 Clar     A    team A1   <NA>        Low
19 Tech     B    team A1 Accept       <NA>
20 Tech     B    team A1 Accept       <NA>

woodward · February 24, 2020, 8:38pm

Why don't you just create a new column in your dataframe that contains the result of the logical equation. Then you can filter on that column. It would seem much clearer to me.

df %>%
  mutate(filter1 = ...complex logic...) %>%
  filter(filter1)

joels · February 24, 2020, 10:40pm

I'm not sure it makes sense to mutate, as you can just do:

df %>% filter(...complex logic...)

I think the extra overhead here is to be able to apply different filters dynamically and perhaps in different combinations.

woodward · February 25, 2020, 12:13am

Oh I see. I would probably create a function

my_filter1 <- function(df, other_arguments){
   df %>%
      filter(...complex filter...)
}

joels · February 25, 2020, 12:16am

I agree. To make the function flexible, you'd want to be able to pass the filter condition(s) as an argument, which (as far as I know) requires quoting and unquoting.

Andrea · February 25, 2020, 1:57pm

Hi Nir,

thanks a lot for the answer! This is what I'm using right now, but I'd like to investigate @joels a little bit before taking a final decision. Anyway, very clean solution - I completely forgot about base expression.

nirgrahamuk · February 25, 2020, 2:03pm

thanks, I like joels also because its a shorter syntax (im a lazy typer)
I actually use some dynamic filtering in one of my real life projects. I knew a lot less about R then than I do now, but I don't see a reason to go back and change my approach...
I basically captured sub filter expressions that the user could make, and handled them as simple strings, which I could programatticaly paste together (with or or and seperators as needed) and then I simply 'eval'ed the final string in a filter object.

Andrea · February 25, 2020, 2:11pm

Hi Joel,

thanks! I like solutions which use tidy eval, but I don't like the fact you have to "wrap" the whole pipe in map:

map(filters, ~test %>% filter(!!.x))

This seems to limit its "composability" with a longer pipelines. I cannot provide the exact details, but my actual use case is something like

data %<>% 
  janitor::clean_names() %>%
  na_if("") %>% 
  mutate_at(vars(contains(c(...))), lubridate::ymd_hms) %>%
  mutate_at(vars(contains(c(...))), as.factor) %>%
  filter(eval(combined_filter))

How would I apply your solution to my use case, or more in general how would I compose it with other pipes?

Aurele · February 25, 2020, 3:47pm

When it comes to eval, I think that a word of caution is in order.

There might be the temptation to let the user specify the filters, say in a Shiny app, like so:

library(shiny)

ui <- tagList(
  textInput("filter", "Filter expression"),
  tableOutput("res")
)

server <- function(input, output) {
  df <- head(mtcars)
  
  output$res <- renderTable({
    tryCatch(
      filter(df, eval(parse(text = input$filter))),
      error = function(e) df
    )
  })
}

shinyApp(ui, server)

Now a harmless user might type in cyl == 6 and it will be fine.

But a less well-intentioned user might type in

insertUI(selector = "#res", ui = tagList(textInput("console", "Console"), verbatimTextOutput("out"))) ; output$out <- renderPrint(eval(parse(text = input$console))) ; TRUE

and execute arbitrary code.

joels · February 25, 2020, 5:03pm

Andrea,

Just to make sure that I understand the exact use case, does combined_filter put together the three filters with AND or OR? OR seems to make more sense in the example, but I wanted to be sure.

Andrea · February 26, 2020, 7:49am

Good to know! Won't be an issue in my use case since the code is never going to run on a server, but it could have been an issue otherwise.

Andrea · February 26, 2020, 7:50am

OR. Thanks for the support!

Andrea · March 18, 2020, 8:00am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.