How to filter multiple variables by multiple conditions

I understand how I can filter my data by one or by many conditions. How can I apply the same conditional filter to multiple variables? If I have just one or two variables it is fine to copy and paste, but in my real data I have 25 variables I need to apply my filter to.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2

# target criteria
target <- c("b", "d") %>% paste(collapse = "|")

target
#> [1] "b|d"

# my data 
my_data <- tribble(
  ~col1, ~col2, ~col3, ~col4, ~col5,
  "aaa", "aaa", "aaa", "aaa", "eee",
  "aaa", "bbb", "bbb", "bbb", "eee",
  "bbb", "ccc", "ccc", "ccc", "eee",
  "aaa", "ddd", "ddd", "ddd", "eee",
  "aaa", "eee", "eee", "eee", "ddd",
  "aaa", "fff", "fff", "fff", "eee"
)

# filter becomes vulerable to copy and paste errors if I have many variables
my_filtered_data <- my_data %>%
  filter(str_detect(col1, target) |
           str_detect(col3, target) |
           str_detect(col5, target))

my_filtered_data
#> # A tibble: 4 x 5
#>   col1  col2  col3  col4  col5 
#>   <chr> <chr> <chr> <chr> <chr>
#> 1 aaa   bbb   bbb   bbb   eee  
#> 2 bbb   ccc   ccc   ccc   eee  
#> 3 aaa   ddd   ddd   ddd   eee  
#> 4 aaa   eee   eee   eee   ddd

Created on 2021-12-28 by the reprex package (v2.0.1)

I'd use this approach (note I added an extra line to your example to demo the AND example):

library(tidyverse)

# target criteria
target <- c("b", "d") %>% paste(collapse = "|")

target
#> [1] "b|d"

# my data 
my_data <- tribble(
  ~col1, ~col2, ~col3, ~col4, ~col5,
  "aaa", "aaa", "aaa", "aaa", "eee",
  "aaa", "bbb", "bbb", "bbb", "eee",
  "bbb", "ccc", "ccc", "ccc", "eee",
  "aaa", "ddd", "ddd", "ddd", "eee",
  "aaa", "eee", "eee", "eee", "ddd",
  "aaa", "fff", "fff", "fff", "eee",
  "bbb", "bbb", "bbb", "bbb", "bbb"
)

# OR
my_data |> 
  filter(if_any(contains("col"), ~str_detect(.x, target)))
#> # A tibble: 5 x 5
#>   col1  col2  col3  col4  col5 
#>   <chr> <chr> <chr> <chr> <chr>
#> 1 aaa   bbb   bbb   bbb   eee  
#> 2 bbb   ccc   ccc   ccc   eee  
#> 3 aaa   ddd   ddd   ddd   eee  
#> 4 aaa   eee   eee   eee   ddd  
#> 5 bbb   bbb   bbb   bbb   bbb

# AND
my_data |> 
  filter(across(contains("col"), ~str_detect(.x, target)))
#> # A tibble: 1 x 5
#>   col1  col2  col3  col4  col5 
#>   <chr> <chr> <chr> <chr> <chr>
#> 1 bbb   bbb   bbb   bbb   bbb

Created on 2021-12-28 by the reprex package (v2.0.1)

1 Like

Yes, that works. Thank you.

  • What is the difference between |> and %>%?
  • Within the filter function I see if_any is what selects the columns.
  • Is str_detect the purrr-style lambda; the function to apply to each selected columns? Why use ~?
  • What exactly is the .x? It seems like an indicator of where to apply the function.

They are almost identical; |> is the base R version, %>% is the magrittr version.

That's right; if_any() checks to see if any of the columns specified meet a condition. I use contains("col"), so in full it is saything "if any of the columns that have "col" in the name..."

str_detect() is from stringr, and checks if a string contains a substring. We do apply it to each of the selected columns. ~ is just one of those things that you need to know to do if you want to provide an argument to the function that you want to use, much like in purrr.

That is exactly what it is doing, much like in purrr.

In full, the statement is saying "Take my_data and return the rows in which any of the columns named with the substring "col" contain either the letter b or d".

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.