Dropping with select using quosures

rlang

#1

I’m not sure if this is a bug or I’m doing something wrong, but when I try to drop variables contained in a quosures object via select, the behavior is different depending on whether there are one or more variables in the quosures object.

select_not <- function(d, ...) {
  to_drop <- rlang::quos(...)
  dplyr::select(d, -!!!to_drop)
}
dd <- data.frame(x = 1:10, y = 11:20, z = 21:30)
# Good: returns x and y
select_not(dd, z)
# Not good: Returns y and z
select_not(dd, y, z)
> devtools::session_info()
Session info ---------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Denver              
 date     2018-01-03                  

Packages -------------------------------------------------------------------------------------------------------------------------
 package    * version date       source         
 assertthat   0.2.0   2017-04-11 CRAN (R 3.4.0) 
 base       * 3.4.3   2017-12-07 local          
 bindr        0.1     2016-11-13 CRAN (R 3.4.0) 
 bindrcpp     0.2     2017-06-17 CRAN (R 3.4.0) 
 compiler     3.4.3   2017-12-07 local          
 datasets   * 3.4.3   2017-12-07 local          
 devtools     1.13.3  2017-08-02 CRAN (R 3.4.1) 
 digest       0.6.12  2017-01-27 CRAN (R 3.4.0) 
 dplyr        0.7.4   2017-09-28 CRAN (R 3.4.2) 
 glue         1.2.0   2017-10-29 cran (@1.2.0)  
 graphics   * 3.4.3   2017-12-07 local          
 grDevices  * 3.4.3   2017-12-07 local          
 magrittr     1.5     2014-11-22 CRAN (R 3.4.0) 
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.0) 
 methods    * 3.4.3   2017-12-07 local          
 pkgconfig    2.0.1   2017-03-21 CRAN (R 3.4.0) 
 R6           2.2.2   2017-06-17 CRAN (R 3.4.0) 
 Rcpp         0.12.14 2017-11-23 cran (@0.12.14)
 rlang        0.1.6   2017-12-21 CRAN (R 3.4.3) 
 stats      * 3.4.3   2017-12-07 local          
 tibble       1.3.4   2017-08-22 cran (@1.3.4)  
 tools        3.4.3   2017-12-07 local          
 utils      * 3.4.3   2017-12-07 local          
 withr        2.1.0   2017-11-01 cran (@2.1.0)  
 yaml         2.1.14  2016-11-12 CRAN (R 3.4.0) 
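
A possible explanation for the two-variable result, sketched in base R only. It assumes (not confirmed in this thread) that `!!!` splices both quosures in as arguments of `-`, so the call becomes `y - z` rather than "minus y, minus z". The `positions` list is a hypothetical stand-in for the column positions select() works with.

```r
dd <- data.frame(x = 1:10, y = 11:20, z = 21:30)

# Stand-in for what select() sees: each column name maps to its position
positions <- as.list(setNames(seq_along(dd), names(dd)))

# Assumed shape of the spliced call: `-`(y, z), which R reads as y - z
spliced <- as.call(list(as.name("-"), quote(y), quote(z)))

# Evaluated against the positions this is 2 - 3
idx <- eval(spliced, positions)

# A negative position drops that column, so only x disappears
remaining <- names(dd)[idx]
remaining
```

If that reading is right, `select_not(dd, y, z)` ends up asking for "everything except column 1", which matches the y and z that came back.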

#2

You were just missing a c()

select_not <- function(d, ...) {
  to_drop <- rlang::quos(...)
  dplyr::select(d, -!!!to_drop)
}
dd <- data.frame(x = 1:10, y = 11:20, z = 21:30)

select_not(dd, z)
#>     x  y
#> 1   1 11
#> 2   2 12
#> 3   3 13
#> 4   4 14
#> 5   5 15
#> 6   6 16
#> 7   7 17
#> 8   8 18
#> 9   9 19
#> 10 10 20

select_not(dd, c(y, z))
#>     x
#> 1   1
#> 2   2
#> 3   3
#> 4   4
#> 5   5
#> 6   6
#> 7   7
#> 8   8
#> 9   9
#> 10 10

Created on 2018-01-03 by the reprex package (v0.1.1.9000).

Same behaviour as select() in dplyr.
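
For comparison, here is a small sketch of the same thing with plain select() and no wrapper function, assuming dplyr is installed. Wrapping the columns in c() and negating the whole vector gives the same result as negating each column separately.

```r
library(dplyr)

dd <- data.frame(x = 1:10, y = 11:20, z = 21:30)

# Negate a c() of columns
only_x <- select(dd, -c(y, z))

# Equivalent: negate each column on its own
also_only_x <- select(dd, -y, -z)

names(only_x)
```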


#3

Thanks @mara! How can I “flatten” the quosures object inside the function, to get the same result but with the variables passed in separately through ...?


#4

select has a handy tool for these situations: you can negate the result of one_of. However, this requires getting the variable names into a character vector.

After to_drop <- rlang::quos(...), getting the character strings is easy enough with as.character, but each string has an extraneous leading ~ character. A quick sub drops it, which gives

select_not <- function(d, ...){
  to_drop <- rlang::quos(...)
  to_drop <- sub("^~", "", as.character(to_drop))
  dplyr::select(d, -dplyr::one_of(to_drop))
}

Personally, I find the following maintains the look and feel of a tidyverse function, but gets to the heart of the issue more efficiently.

select_not <- function(d, ...){
  # get expressions in ... as characters
  to_drop <- 
    vapply(substitute(list(...)),
           as.character,
           character(1))[-1] # the first one ends up being "list", and we don't need it
  d[!names(d) %in% to_drop]
}
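
A quick usage sketch of the base R version above, with one caveat worth knowing. The function is copied from this post; the `tryCatch` wrapper and the `result` name are only there for illustration. Bare names work, but anything that is not a bare name (such as `c(y, z)`) makes `as.character` return more than one string, so `vapply` stops with an error.

```r
select_not <- function(d, ...) {
  # get expressions in ... as characters; drop the leading "list"
  to_drop <-
    vapply(substitute(list(...)),
           as.character,
           character(1))[-1]
  d[!names(d) %in% to_drop]
}

dd <- data.frame(x = 1:10, y = 11:20, z = 21:30)

# Bare names are fine
kept <- names(select_not(dd, y, z))

# c(y, z) is a call, not a name, so vapply() fails its length-1 check
result <- tryCatch(
  select_not(dd, c(y, z)),
  error = function(e) "error"
)
```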

#5

Hi @nutterb, there’s also rlang::quo_name for that. It just seems like you ought to be able to do this directly with the quosures.


#6

Based on @lionel’s answer to a related question on SO:

You can use rlang::lang, though I admit that the quosures created by that are a little weird (of the form -~z). It may end up a bit brittle because of that, I suspect.

suppressPackageStartupMessages(library(tidyverse))

select_not <- function(d, ...) {
  to_drop <- rlang::quos(...) %>%
    map(~ rlang::lang("-", .))

  dplyr::select(d, !!!to_drop)
}
dd <- data.frame(x = 1:4, y = 11:14, z = 21:24)

select_not(dd, z)
#>   x  y
#> 1 1 11
#> 2 2 12
#> 3 3 13
#> 4 4 14

select_not(dd, x, z)
#>    y
#> 1 11
#> 2 12
#> 3 13
#> 4 14

#7

That’s awesome – thanks @nick!

The lang helpfile is a lesson. I hope the dives into rlang and NSE make us collectively better R users and programmers and we’re not just learning a series of one-off tricks. I’ll admit it hasn’t come together for me yet, but I haven’t tried too hard and am still optimistic.


#8

Alternatively, @mara was a half-step away from a much better answer: use c() around the unspliced argument!

select_not <- function(d, ...) {
  to_drop <- rlang::quos(...)
  
  dplyr::select(d, -c(!!!to_drop))
}

I expect there may be some good resources available for learning rlang in the next year or so; the underlying code seems to be in enough flux at the moment that creating definitive documentation probably isn’t a hugely worthwhile endeavor (beyond using it at a surface level).
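
For anyone who wants to check it, here is a short usage sketch of the `-c(!!!to_drop)` version, assuming reasonably recent dplyr and rlang. The function is the one from this post; only the calls at the bottom are new.

```r
select_not <- function(d, ...) {
  to_drop <- rlang::quos(...)

  # Splice the quosures into c(), then negate the whole vector
  dplyr::select(d, -c(!!!to_drop))
}

dd <- data.frame(x = 1:4, y = 11:14, z = 21:24)

one_dropped <- select_not(dd, z)
two_dropped <- select_not(dd, y, z)

names(one_dropped)
names(two_dropped)
```

The point of the c() is that `!!!` now splices into c() rather than into `-`, so the minus only ever sees a single argument.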


#9

Nice. That’s clean and intuitive.


#10

I thought so too, but I haven’t figured out how to get quo_name to work here.


#11

Can you post a reprex of what you’re trying with quo_name()?


#12

I’ll give you “better R users.” However, I think “better R programmers” is up for debate. I’m not opposed to people programming with tidy evaluation, but it isn’t all roses and rainbows. It has a performance cost. How much that performance cost affects your decisions depends on how you envision your work being used. If it gets used once, probably not a big deal. If it gets used in any kind of resampling, it can be a very big deal.

For instance, using the examples here, avoiding quosures altogether results in an execution time of about 200 microseconds. Using the quosures takes about 20,000 microseconds. If I needed to run this in a routine 10,000 times (say for a bootstrap procedure), that translates into 2 seconds with standard evaluation and over 3 minutes with quosures. There’s a pretty good thread on the subject here
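
The arithmetic behind those totals, written out as a small sketch. The per-call timings are the approximate figures quoted above, not new measurements, and the variable names are just for illustration.

```r
# Approximate per-call timings from the post, in microseconds
per_call_us <- c(standard = 200, quosures = 20000)

# Number of repetitions, e.g. bootstrap resamples
n_calls <- 10000

# Total run time in seconds and in minutes
total_seconds <- per_call_us * n_calls / 1e6
total_minutes <- total_seconds / 60

total_seconds
total_minutes
```

That works out to 2 seconds versus 200 seconds, which is where "over 3 minutes" comes from.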


#13

Apparently I just didn’t try long enough. :slight_smile:

This works

select_not <- function(d, ...){
  to_drop <- purrr::map_chr(rlang::quos(...), 
                            rlang::quo_name)
  
  dplyr::select(d, -dplyr::one_of(to_drop))
}

#14

If your question’s been answered, would you mind selecting the solution? (I believe it’s Nick’s.) That way we know that your problem’s solved, and someone in the future knows what the solution was (I like how Discourse puts the solution right at the bottom of the question).

If you’re the OP, there will be a little check box at the footer of replies to the thread. To select a solution, you just click on it.


#15

Hi all, this is how I would go about that function. I start with exprs to get the list of fields back, and then add the minus sign to each expression using expr and map to iterate through all of the dots arguments (fields):

library(rlang)
library(dplyr)
library(purrr)

select_not <- function(df, ...){
  fields <- exprs(...)
  fields <- map(fields, ~expr(- !!.x))

  df %>%
    select(!!!fields)
}

select_not(mtcars, wt, mpg, carb)
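
To see what the mapping step produces before it reaches select(), here is a small sketch using only rlang. It uses `lapply` in place of `purrr::map` to keep the dependencies down; `negated` is a name made up for this example.

```r
library(rlang)

# Capture the bare column names as a list of symbols
fields <- exprs(wt, mpg, carb)

# Wrap each symbol in a unary minus call; !! inlines the symbol
negated <- lapply(fields, function(f) expr(-!!f))

# Each element is now a call such as -wt
negated[[1]]
```

Splicing that list with `!!!` inside select() is then the same as typing each negated column by hand.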


#16

@edgararuiz, that’s awesome. Any recommended reading besides the programming with dplyr vignette and the rlang helpfiles for getting one’s head around this stuff?


#17

Thanks. I know this chapter is still in flight, but Hadley’s updated version of Advanced R may be the best place to go. Specifically, the idea behind expr(- !!.x), which is to combine variables and operators to build a new expression, is covered in this section: https://github.com/hadley/adv-r/blob/master/Quotation.Rmd#generating-code


#18

Cheers! This is the same strategy I used when I wrote an answer for this question on Stack Overflow. If anyone wants an explanation of each step here, see my answer.