Eval vs ! ! (bang bang) in functions using dplyr

rlang

#1

I am trying to wrap my head around why, within a single function, one dplyr call works with !! while another dplyr call on a different line needs eval().

I have a dataset with one column of UTC time and one of PT time and I want users to be able to pick which timezone their visualization is in.

Setup toy dataset and helper function:

library(tibble)
library(dplyr)

repro_data <- tribble(
    ~created_at, ~created_at_pt,
    "2017-12-31 00:33:11", "2017-12-31 00:33:11",
    "2018-01-01 00:03:57", "2017-12-31 16:03:57",
    "2018-01-01 14:40:18", "2018-01-01 14:40:18") %>%
    mutate(created_at    = as.POSIXct(created_at),
           created_at_pt = as.POSIXct(created_at_pt))

set_timezone <- function(timezone_str){
    if(timezone_str == 'pt'){
        tz <- quo(created_at_pt)
    }else{
        tz <- quo(created_at)
    }
    tz
}

I have been going over the Advanced R section on this but something isn’t clicking for me in understanding why !! works in the mutate but not the filter:

broken_fun <- function(timezone) {
    report_tz <- set_timezone(timezone)
    
    repro_data %>%
        filter(!!report_tz >= lubridate::ymd('2018-01-01')) %>%
        mutate(bought_day = lubridate::floor_date(!!report_tz, 'day'))
}

broken_fun('pt')
broken_fun('utc')
# Both return empty dataframes
# I think because it's comparing a date object to a... something else 

However, by using eval in the filter it works as expected:

working_fun <- function(timezone) {
    report_tz <- set_timezone(timezone)
    
    repro_data %>%
        filter(eval(report_tz) >= lubridate::ymd('2018-01-01')) %>%
        mutate(bought_day = lubridate::floor_date(!!report_tz, 'day'))
}

working_fun('pt')
working_fun('utc')
# Both return expected results filtered with new column

Using eval for both also seems to work… although when I look at other people’s code it appears that !! is preferred by most people (at least in open source)?

working_fun2 <- function(timezone) {
    report_tz <- set_timezone(timezone)
    
    repro_data %>%
        filter(eval(report_tz) >= lubridate::ymd('2018-01-01')) %>%
        mutate(bought_day = lubridate::floor_date(eval(report_tz), 'day'))
}

working_fun2('pt')
working_fun2('utc')
# Both return expected results filtered with new column

So I now have it working, which is nice, but I would love to understand why the initial way did not work, since I am hoping to do a lot more work with user inputs. Thanks in advance!

Edit: Apparently multiple exclamation points are replaced with a single one in the title section of a new post.


#2

Try filter((!!report_tz) >= lubridate::ymd('2018-01-01')) with the unquoting inside of parentheses so that report_tz gets unquoted immediately. Right now, I think the !! are being treated as negations.

The rlang docs warn about this very issue:

# The !! operator is a handy syntactic shortcut for unquoting with
# UQ().  However you need to be a bit careful with operator
# precedence. All arithmetic and comparison operators bind more
# tightly than `!`:
quo(1 +  !! (1 + 2 + 3) + 10)
#> <quosure: local>
#> ~1 + 16

# For this reason you should always wrap the unquoted expression
# with parentheses when operators are involved:
quo(1 + (!! 1 + 2 + 3) + 10)
#> <quosure: local>
#> ~1 + (6) + 10

# Or you can use the explicit unquote function:
quo(1 + UQ(1 + 2 + 3) + 10)
#> <quosure: local>
#> ~1 + 6 + 10
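
The precedence point in the docs above can be checked without rlang or dplyr. This is a minimal sketch in base R; `x` and `y` are placeholder symbols (they stand in for `report_tz` and the date, and are never evaluated). It only inspects how R parses each spelling. Whatever `!!` ends up meaning in context, it is applied to the whole comparison in the first form and to `x` alone in the second.

```r
# quote() captures an expression without evaluating it, so we can look
# at the parse tree directly.
unparenthesised <- quote(!!x >= y)
parenthesised   <- quote((!!x) >= y)

# First form: the outermost call is `!`, and the innermost argument of
# the two `!` calls is the whole comparison.
top_call_bare <- as.character(unparenthesised[[1]])
inner_bare    <- unparenthesised[[2]][[2]]

# Second form: the outermost call is `>=`, with `(!!x)` as its left side.
top_call_paren <- as.character(parenthesised[[1]])

print(top_call_bare)   # "!"
print(inner_bare)      # x >= y
print(top_call_paren)  # ">="
```

So `!!x >= y` is read as `!!(x >= y)`, which is why the parentheses in `(!!report_tz) >= ...` matter.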

#3

BRILLIANT!! With your suggested change, it works perfectly.

Thank you for the link to the documentation! It’s so hard to search for punctuation and I didn’t realize !! worked in place of UQ(). The examples in this documentation are exactly what I needed to understand these better :smile:


#4

It’s because the first !!report_tz is being interpreted as negation rather than as the unquoting operator !!. You can turn it into function calls with

`!`(`!`(report_tz))

Here is the reprex. Are the results as expected?

library(tibble)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
repro_data <- tribble(
    ~created_at, ~created_at_pt,
    "2017-12-31 00:33:11", "2017-12-31 00:33:11",
    "2018-01-01 00:03:57", "2017-12-31 16:03:57",
    "2018-01-01 14:40:18", "2018-01-01 14:40:18") %>%
    mutate(created_at    = as.POSIXct(created_at),
           created_at_pt = as.POSIXct(created_at_pt))

set_timezone <- function(timezone_str){
    if(timezone_str == 'pt'){
        tz <- quo(created_at_pt)
    }else{
        tz <- quo(created_at)
    }
    tz
}

f1 <- function(timezone) {
    set_timezone(timezone)  
}

broken_fun <- function(timezone) {
    report_tz <- set_timezone(timezone)

    repro_data %>%
        filter(`!`(`!`(report_tz)) >= lubridate::ymd('2018-01-01')) %>%
        mutate(bought_day = lubridate::floor_date(!!report_tz, 'day'))
}

broken_fun('pt')
#> # A tibble: 1 x 3
#>            created_at       created_at_pt bought_day
#>                <dttm>              <dttm>     <dttm>
#> 1 2018-01-01 14:40:18 2018-01-01 14:40:18 2018-01-01
broken_fun('utc')
#> # A tibble: 2 x 3
#>            created_at       created_at_pt bought_day
#>                <dttm>              <dttm>     <dttm>
#> 1 2018-01-01 00:03:57 2017-12-31 16:03:57 2018-01-01
#> 2 2018-01-01 14:40:18 2018-01-01 14:40:18 2018-01-01
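
One way to see why the function-call spelling in the reprex above sidesteps the precedence problem is to look at the parse tree again. This is a small base R sketch with placeholder symbols `x` and `y` (assumptions for illustration, never evaluated): function-call syntax binds more tightly than any operator, so the two `!` calls wrap only `x`, and `>=` ends up as the outermost call.

```r
# Capture the backtick/function-call spelling without evaluating it.
call_form <- quote(`!`(`!`(x)) >= y)

# The outermost call is the comparison...
top_call <- as.character(call_form[[1]])

# ...and its left-hand side is the same call structure that a bare `!!x`
# produces: `!` applied to `!` applied to `x`.
lhs <- call_form[[2]]

print(top_call)                    # ">="
print(identical(lhs, quote(!!x)))  # TRUE
```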

#5

… It does produce the expected results… as well as completely blow my mind!

I’m going to have to study up on the order in which things are interpreted and on the effects of backticks and parentheses. I can’t stop staring at this and trying different permutations of it.

So this also works:

`!!`(report_tz)

And you can print it this way:

explore_fun <- function(timezone) {
    report_tz <- set_timezone(timezone)
    
    print(`!!`(report_tz))
    
    repro_data %>%
        filter(`!!`(report_tz) >= lubridate::ymd('2018-01-01')) %>%
        mutate(bought_day = lubridate::floor_date(!!report_tz, 'day'))
}

explore_fun('utc')

But the print part doesn’t work with the way you pointed out and throws an error implying that it’s trying to negate the quo again?!

explore_fun2 <- function(timezone) {
    report_tz <- set_timezone(timezone)
    
    print(`!`(`!`(report_tz)))
    
    repro_data %>%
        filter(`!!`(report_tz) >= lubridate::ymd('2018-01-01')) %>%
        mutate(bought_day = lubridate::floor_date(!!report_tz, 'day'))
}

explore_fun2('utc')

This is so fascinating!!! Do you have any resources you recommend for reading up on why these work the way they do?

Thank you for this answer, this is super motivating for kickstarting my ‘understand R better’ resolution.
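
A rough illustration of the print() behaviour described above, hedged accordingly: print() is an ordinary function, not a quoting function, so it evaluates its argument and `!` runs as plain logical negation. The sketch below uses base R only and a quoted symbol as a stand-in for the quosure (an assumption; a real quosure is a different object, but it is not a logical value either).

```r
# Stand-in for the captured column reference (hypothetical; the thread
# uses a quosure created with quo()).
captured <- quote(created_at)

# Outside a quoting function the `!` calls are actually evaluated, and
# negating a language object is an error.
negate_outside <- tryCatch(
  {
    !(!captured)
    "no error"
  },
  error = function(e) "error"
)
print(negate_outside)  # "error"

# A function that captures its argument never runs the `!` calls; it
# just receives them as an unevaluated call to interpret as it sees fit.
kept <- quote(!!captured)
print(is.call(kept))   # TRUE
```

This is only meant to show the difference between evaluating `!` and capturing it. How filter() then interprets the captured `!!` is up to rlang.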


#6

I don’t think there is anything really complete or fully definitive on quosures (that’s what you are dealing with here) because it is all just too new and in a bit of flux, but:

The first source to go to is the part of @hadley's new version of Advanced R that discusses quosures.

I’ve also written a tutorial.

It covers some of how they work but is by no means complete.


#7

We’ve fixed this problem in the dev version of rlang (which powers tidy evaluation). In the future this will work as expected:

repro_data %>%
  filter(!!var >= lubridate::ymd("2018-01-01")) %>%
  mutate(bought_day = lubridate::floor_date(!!var, "day"))