select_at with regex

I wanted to select variables that start with a specific character. From the select_at help:

select_at(mtcars, vars(-contains("ar"), starts_with("c")), toupper) %>% names()

[1] "MPG" "CYL" "DISP" "HP" "DRAT" "WT" "QSEC" "VS" "AM" "CARB"

I expected to see only CYL after dropping the variables that contain "ar". But the result includes both CARB (which contains "ar") and the other variables. Am I missing anything here?

Running version dplyr_0.8.3

Yes, the order in which you select the variables matters:

library(dplyr)

select_at(mtcars, vars(starts_with("c"), -contains("ar")), toupper) %>% names()
#> [1] "CYL"

Created on 2019-11-21 by the reprex package (v0.3.0.9000)


I think this needs to be done in two stages:

select_at(mtcars, vars(starts_with("c")), toupper) %>% 
  select_at(vars(-contains("ar"))) %>% 
  names()

Your original selection just drops names containing "ar" (but adds "carb" back because it starts with "c").
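To illustrate why the order matters, here is a rough base R sketch of the two selections using set operations. It is only an approximation of the behaviour described above, not tidyselect's actual implementation; the variable names `has_ar`, `starts_c`, `exclude_first` and `include_first` are made up for the example.

```r
# Approximate the selection semantics with base R set operations
vars <- names(mtcars)

has_ar   <- grep("ar", vars, value = TRUE, fixed = TRUE)  # "gear" "carb"
starts_c <- grep("^c", vars, value = TRUE)                # "cyl"  "carb"

# vars(-contains("ar"), starts_with("c")): an exclusion first means we start
# from all variables, drop the "ar" names, then add the "c" names back.
exclude_first <- union(setdiff(vars, has_ar), starts_c)

# vars(starts_with("c"), -contains("ar")): start from the "c" names,
# then drop the "ar" names.
include_first <- setdiff(starts_c, has_ar)

toupper(exclude_first)
#> [1] "MPG"  "CYL"  "DISP" "HP"   "DRAT" "WT"   "QSEC" "VS"   "AM"   "CARB"
toupper(include_first)
#> [1] "CYL"
```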

EDIT: andresrcs has the better solution.

I was hoping this could be done with a single regular expression like the following:

mtcars %>% 
  select_at(vars(matches("^c(?!.*ar)")), toupper) %>% 
  names()

^c means the first character has to be c. Then (?!.*ar) is a negative lookahead that matches only strings that don't contain ar anywhere after the initial c.

However, the above code results in an "invalid regular expression" error.

If this were one of the grep functions, you could avoid this with a perl=TRUE argument, but that doesn't appear to be available in the standard tidyselect helpers (but see update below; it's now available in the development version). Based on this SO answer, here's a select helper that uses perl regular expressions:

matches2 <- function (match, ignore.case = TRUE, vars = current_vars()) {
  tidyselect:::grep_vars(match, vars, ignore.case=ignore.case, perl=TRUE)
}

Then we can do:

mtcars %>% 
  select_at(vars(matches2("^c(?!.*ar)")), toupper) %>% 
  names()

[1] "CYL"
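As a cross-check that does not depend on the internal tidyselect:::grep_vars function, the same pattern can be tried with base R's grepl() and perl = TRUE. This is just a sketch; the names `keep` and `selected` are only for illustration.

```r
vars <- names(mtcars)

# perl = TRUE switches to PCRE, which supports the negative lookahead (?!...)
keep <- grepl("^c(?!.*ar)", vars, perl = TRUE)
vars[keep]
#> [1] "cyl"

# Subset and rename the matching columns without tidyselect
selected <- mtcars[, keep, drop = FALSE]
names(selected) <- toupper(names(selected))
names(selected)
#> [1] "CYL"
```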

Update: The development version of tidyselect has a new perl=TRUE argument for matches. With the development version, you can now do:

mtcars %>% 
  select_at(vars(matches("^c(?!.*ar)", perl=TRUE)), toupper) %>% 
  names()

Thanks @andresrcs - this works - order does matter in this case.

Thanks @martin.R. This answers the question, but I wanted to do it in one stage because my real selection criteria are more complex than the simple example I have here.

I will mark this as the solution, because I want to match a number of regex criteria to select the variables.

One thing I want to add is that I use peek_vars() instead of current_vars(), because current_vars() is deprecated in the latest tidyselect.

matches2 <- function (match, ignore.case = TRUE, vars = tidyselect::peek_vars()) {
  tidyselect:::grep_vars(match, vars, ignore.case = ignore.case, perl = TRUE)
}
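For matching a number of regex criteria at once, one possible approach is to test each pattern separately and combine the results. The sketch below uses base R only, and the three patterns are hypothetical criteria chosen for illustration, not the real ones from my data.

```r
vars <- names(mtcars)

# Hypothetical criteria, for illustration only
patterns <- c(
  "^[cd]",      # starts with "c" or "d"
  "^(?!.*ar)",  # does not contain "ar" (negative lookahead)
  "(?<!t)$"     # does not end in "t" (negative lookbehind)
)

# One logical vector per pattern; perl = TRUE is needed for the lookarounds
hits <- lapply(patterns, grepl, x = vars, perl = TRUE)

# Keep only the names that satisfy every criterion
keep <- Reduce(`&`, hits)
vars[keep]
#> [1] "cyl"  "disp"
```

Swapping `&` for `|` in the Reduce() call would keep names that satisfy any of the criteria instead of all of them.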
