I'm trying to filter a dataset to keep only the rows where a specific regex matches in any of several columns (an address that could be in 6 to 10 columns). I'm using filter_at, which solves the problem, but the dplyr documentation (filter_all) says filter_at is superseded, and I don't understand how I'm supposed to use the combination of filter + across instead.
library(tidyverse)
df <- tribble(
~id, ~name, ~dir, ~something, ~dir_asd,
1, "a", "address", 15, "some other",
2, "b", "someplace", 10, "address",
3, "c", "localhost", 2, "::1"
)
filter_at(df, vars(starts_with("dir")), any_vars(str_detect(., "address") == TRUE))
#> # A tibble: 2 x 5
#> id name dir something dir_asd
#> <dbl> <chr> <chr> <dbl> <chr>
#> 1 1 a address 15 some other
#> 2 2 b someplace 10 address
filter(df, across(starts_with("dir"), ~ str_detect(.x, "address") == TRUE))
#> # A tibble: 0 x 5
#> # ... with 5 variables: id <dbl>, name <chr>, dir <chr>, something <dbl>,
#> # dir_asd <chr>
filter(df, across(dir, ~ str_detect(.x, "address") == TRUE))
#> # A tibble: 1 x 5
#> id name dir something dir_asd
#> <dbl> <chr> <chr> <dbl> <chr>
#> 1 1 a address 15 some other
Created on 2020-06-23 by the reprex package (v0.3.0)
The last case works for a single column, which leads me to think that the starts_with selection itself is fine, but that filter + across requires a match in every selected column instead of any of them. Adding any_vars to the mix doesn't work:
filter(df, across(starts_with("dir"), ~ any_vars(str_detect(.x, "address") == TRUE)))
#> Error: Input must be a vector, not a `any_vars/quosure/formula` object.
#> Run `rlang::last_error()` to see where the error occurred.
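To check the suspicion that the per-column results are being combined with AND, here is a minimal sketch that computes the logical columns outside of filter. The names hits, all_match and any_match are hypothetical helpers introduced only for this illustration, and the explicit & and | are written out by hand for the two dir columns of the example data.

```r
library(dplyr)
library(stringr)

df <- tribble(
  ~id, ~name, ~dir, ~something, ~dir_asd,
  1, "a", "address", 15, "some other",
  2, "b", "someplace", 10, "address",
  3, "c", "localhost", 2, "::1"
)

# One logical column per selected column: TRUE where the regex matches
hits <- transmute(df, across(starts_with("dir"), ~ str_detect(.x, "address")))
hits

# Rows where every dir column matches (what filter + across seems to keep)
all_match <- hits$dir & hits$dir_asd

# Rows where at least one dir column matches (what filter_at + any_vars keeps)
any_match <- hits$dir | hits$dir_asd
```

No row matches in both columns, which is consistent with the empty result above, while the first two rows match in at least one column.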
Looking into across + any_vars led me here: https://github.com/tidyverse/dplyr/issues/4770, which ended in implementing c_across(), but I don't understand how it could be helpful here.
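For context, a minimal sketch of how c_across() might apply to this case, assuming it is paired with rowwise() so that any() is evaluated once per row. The second variant assumes dplyr 1.0.4 or later, where if_any() is available; it will not run on the dplyr version used in the reprex above. The names res_rowwise and res_if_any are hypothetical.

```r
library(dplyr)
library(stringr)

df <- tribble(
  ~id, ~name, ~dir, ~something, ~dir_asd,
  1, "a", "address", 15, "some other",
  2, "b", "someplace", 10, "address",
  3, "c", "localhost", 2, "::1"
)

# Variant 1: rowwise() + c_across() collects the dir columns of each row
# into one character vector, and any() keeps the row if any value matches.
res_rowwise <- df %>%
  rowwise() %>%
  filter(any(str_detect(c_across(starts_with("dir")), "address"))) %>%
  ungroup()

# Variant 2 (assumes dplyr >= 1.0.4): if_any() combines the per-column
# results with OR, without switching to row-wise evaluation.
res_if_any <- df %>%
  filter(if_any(starts_with("dir"), ~ str_detect(.x, "address")))
```

Both variants should keep the rows with id 1 and 2, the same rows that filter_at + any_vars returns. The row-wise version evaluates the predicate once per row, so it may be noticeably slower on large data.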
If you could point me in the right direction I would be very grateful.