Developing a package, confused on how to use a new variable created by an embraced {{}} variable from function

I am new to package development and using Tidyverse within functions where I need to pass variable names using embrace into the functions. Part of my problem is that I don't fully understand why some of the things I am doing work. I have just been pragmatic and got something working and just moved on.

I understand some of the basics of how to use embraces as part of dplyr. I understand how to create new variables using quotes and embraced variables. For example, if I wanted to add one to all the foo values in a new variable I could do mutate("{{foo}}_new" := {{foo}} + 1). I find myself in a function where I need to create a new variable and then, depending on the data/conditions within each group, use different functions based on my newly created variable.
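To make the pattern described above concrete, here is a minimal sketch of the name-interpolation idiom inside a function. The helper name `add_one` and the toy data are made up for illustration; it assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical helper: add one to a column and store the result
# under "<column>_new". The string on the left of `:=` is where
# "{{foo}}" gets turned into the name of the supplied column.
add_one <- function(dat, foo) {
  dat %>%
    mutate("{{foo}}_new" := {{ foo }} + 1)
}

toy <- tibble(a = c(1, 2, 3))
res_add_one <- add_one(toy, a)

names(res_add_one)
#> [1] "a"     "a_new"
```

Note that the `"{{foo}}_new"` string is only interpolated on the left-hand side of `:=`. Anywhere else it is an ordinary character string.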

In other words, if I create a new column of {{foo}}_new with a mutate function, how do you use that new variable on the RHS of a mutate function? I have tried several combinations of !! and enquo(). But honestly, I don't really understand what they do.

In my reprex below, I have some basic x-y data and I am trying to create ordinal x-values by group. I have two functions. The first function, ord_by_group, shows the successful calculation of a new variable. The second function, manipulate_new_variable, duplicates the first parts of the previous function but includes my attempt at working with the new variable on the RHS. I fully understand that the function fails because it is trying to subtract 1 from the string "{{x_val}}_ord".

Any help or advice on working with new variables with names based on embraced variables would be appreciated.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

func_dat  <- tibble(
  x_col = rep(c(1 / 7, 1, 2, 4, 26, 52),2),
  y_col = runif(12,0,100),
  grp = c(rep(1, 6), rep(2, 6)))

ord_by_group <- function(dat,x_val,...){

  #Can get function to grp based on ...
  dat = dat %>%
    group_by(...) %>%
    arrange(...)

  #Can create ordinals by group
  dat = dat %>%
    mutate("{{x_val}}_ord" := row_number())

  return(dat)

}

ord_by_group(func_dat,x_col,grp)
#> # A tibble: 12 x 4
#> # Groups:   grp [2]
#>     x_col y_col   grp x_col_ord
#>     <dbl> <dbl> <dbl>     <int>
#>  1  0.143  71.7     1         1
#>  2  1      79.2     1         2
#>  3  2      36.8     1         3
#>  4  4      26.8     1         4
#>  5 26      45.0     1         5
#>  6 52      16.8     1         6
#>  7  0.143  13.7     2         1
#>  8  1      74.7     2         2
#>  9  2      55.2     2         3
#> 10  4      66.9     2         4
#> 11 26      78.4     2         5
#> 12 52      99.8     2         6

manipulate_new_variable <- function(dat,x_val,...){
  
  #Can get function to grp based on ...
  dat = dat %>%
    group_by(...) %>%
    arrange(...)
  
  #Can create ordinals by group
  dat = dat %>%
    mutate("{{x_val}}_ord" := row_number())
  
  #How do you use the new variable for future calculations?
  dat = dat %>%
    mutate(new_var = "{{x_val}}_ord"-1)
  
  return(dat)
  
}

manipulate_new_variable(func_dat,x_col,grp)
#> Error: Problem with `mutate()` input `new_var`.
#> x non-numeric argument to binary operator
#> i Input `new_var` is `"{{x_val}}_ord" - 1`.
#> i The error occurred in group 1: grp = 1.

Created on 2020-10-20 by the reprex package (v0.3.0)

EDIT: I previously wrote LHS a bunch of times but I meant RHS of a mutate.

So I did actually figure out how to make it work by diving more into rlang but I am still not sure I understand the reasons why it works. I would still appreciate if someone could direct me to a resource that explains all this because having no formal training in programming I have no idea how I got this to work.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rlang)

func_dat  <- tibble(
  x_col = rep(c(1 / 7, 1, 2, 4, 26, 52),2),
  y_col = runif(12,0,100),
  grp = c(rep(1, 6), rep(2, 6)))


manipulate_new_variable <- function(dat,x_val,...){

  #Can get function to grp based on ...
  dat = dat %>%
    group_by(...) %>%
    arrange(...)

  #Can create ordinals by group
  dat = dat %>%
    mutate("{{x_val}}_ord" := row_number())

  #Subtract one from ordinal, just so it does something.
  #Enquote, then as name, then concatenate the string, then convert back to a symbol,
  #then !!
  dat = dat %>%
    mutate(new_var = !!sym(paste0(as_name(quo({{x_val}})),"_ord"))-1)

  return(dat)
}

manipulate_new_variable(func_dat,x_col,grp)
#> # A tibble: 12 x 5
#> # Groups:   grp [2]
#>     x_col y_col   grp x_col_ord new_var
#>     <dbl> <dbl> <dbl>     <int>   <dbl>
#>  1  0.143  27.8     1         1       0
#>  2  1      87.2     1         2       1
#>  3  2      81.3     1         3       2
#>  4  4      67.3     1         4       3
#>  5 26      60.5     1         5       4
#>  6 52      15.2     1         6       5
#>  7  0.143  67.6     2         1       0
#>  8  1      32.8     2         2       1
#>  9  2      45.9     2         3       2
#> 10  4      93.1     2         4       3
#> 11 26      89.4     2         5       4
#> 12 52      52.2     2         6       5

Created on 2020-10-20 by the reprex package (v0.3.0)
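As a rough illustration of why the one-liner above works, the steps can be pulled apart and inspected one at a time. This is only a sketch: `ord_name_steps` is a hypothetical helper, and it uses `enquo(x_val)` in place of `quo({{x_val}})`, which should capture the same thing.

```r
library(dplyr)
library(rlang)

# Hypothetical helper that returns each intermediate step.
ord_name_steps <- function(x_val) {
  quo_x   <- enquo(x_val)            # capture the argument without evaluating it
  nm      <- as_name(quo_x)          # turn the captured name into a string
  ord_nm  <- paste0(nm, "_ord")      # build the new column name as a string
  ord_sym <- sym(ord_nm)             # turn the string back into a symbol
  list(name = nm, string = ord_nm, symbol = ord_sym)
}

steps <- ord_name_steps(x_col)
steps$name
#> [1] "x_col"
steps$string
#> [1] "x_col_ord"

# `!!` inserts the symbol into the expression before mutate() evaluates
# it, so mutate() sees `x_col_ord - 1` rather than a string minus 1.
toy <- tibble(x_col_ord = 1:3)
res_steps <- mutate(toy, new_var = (!!steps$symbol) - 1)
```

The string alone fails because `"x_col_ord" - 1` is arithmetic on text. Converting it to a symbol and unquoting it makes it a reference to the column.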

You can use the .data pronoun to refer to a column whose name you have as a string:

library(rlang)  # as_string(), ensym()
library(glue)   # glue()

manipulate_new_variable <- function(dat, x_val, ...) {
  dat <- dat %>%
    group_by(...) %>%
    arrange(...)

  # Capture the user argument with `ensym()`, then transform it to a
  # string. `ensym()` forces the argument to be a simple name, not a
  # complex expression.
  new_var <- as_string(ensym(x_val))
  new_var <- glue("{new_var}_ord")

  # Use one `{` instead of two `{{` because we're using simple glue
  # interpolation of a string. The `{{` is for interpolating function
  # arguments.
  dat <- dat %>%
    mutate("{new_var}" := row_number())

  # Use the `.data` pronoun to refer to the new column by its string name
  dat <- dat %>%
    mutate(new_var = .data[[new_var]] - 1)

  dat
}

manipulate_new_variable(func_dat, x_col, grp)
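A possible variation on the function above, assuming a newer rlang than was current when this thread was written (`englue()` was added in rlang 1.0.0, as far as I know): `englue()` builds the name string directly from the embraced argument, so the `as_string(ensym())` and `glue()` steps collapse into one call. The function name and toy data below are made up for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical variant of the function above using englue().
manipulate_with_englue <- function(dat, x_val, ...) {
  # "{{ x_val }}" is replaced by the name of the supplied column
  ord_name <- englue("{{ x_val }}_ord")

  dat %>%
    group_by(...) %>%
    arrange(...) %>%
    mutate("{ord_name}" := row_number()) %>%
    mutate(new_var = .data[[ord_name]] - 1)
}

toy <- tibble(x_col = c(1, 2, 1, 2), grp = c(1, 1, 2, 2))
res_englue <- manipulate_with_englue(toy, x_col, grp)
```

The call is the same as before: the column is passed unquoted and the grouping columns go in `...`.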

Note that creating this kind of user interface is not recommended. In this interface x_val is used to supply a new name for a column that does not exist yet. In data-masking interfaces, we normally supply unquoted arguments for columns that already exist. It would be more conventional if your interface worked like this:

manipulate_new_variable(func_dat, "x_col", grp)

I.e. take the new variable name as a string. To do this, just skip the as_string(ensym(arg)) part.
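A minimal sketch of what that string-based version might look like. The function name `manipulate_new_variable_chr`, the argument name `x_name`, and the toy data are assumptions for illustration.

```r
library(dplyr)

# Hypothetical string-based variant: `x_name` is already a string,
# so no ensym()/as_string() step is needed.
manipulate_new_variable_chr <- function(dat, x_name, ...) {
  ord_name <- paste0(x_name, "_ord")

  dat %>%
    group_by(...) %>%
    arrange(...) %>%
    # single braces: plain glue interpolation of a string variable
    mutate("{ord_name}" := row_number()) %>%
    # .data[[ ]] looks the column up by its string name
    mutate(new_var = .data[[ord_name]] - 1)
}

toy <- tibble(x_col = c(1, 2, 1, 2), grp = c(1, 1, 2, 2))
res_chr <- manipulate_new_variable_chr(toy, "x_col", grp)
```

The grouping columns still go through `...` unquoted, since they refer to columns that already exist.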

These resources are great, I very much appreciate them. I feel like the Advanced R book is way beyond my skill level but I will keep it in mind. I am not familiar with Lionel Henry's presentations, but they seem like exactly what I need in terms of teaching. Thanks again!

When it comes to the basics of tidy evaluation, I haven't come across a better resource than Lionel Henry's 2018 webinar introducing tidy evaluation.

I would recommend watching it multiple times (as I have) to really make sure you grasp the fundamentals. After that you can watch his RStudio::Conf 2020 lecture to learn about the updated toolkit such as {{ }} and .data.

@lionel Wanted to say a personal thank you for these lectures. They were invaluable in helping me wrap my head around tidy evaluation.

Hi @jefriedel,

The rlang topics of evaluation and quasiquotation are covered, for example, in the book Advanced R:
https://adv-r.hadley.nz/quasiquotation.html

A good start (less abstract) on "programming with dplyr", "tidy evaluation" and where the challenges are could be e.g. these two great talks by Lionel Henry (@lionel)
https://speakerdeck.com/lionelhenry/programming-in-the-tidyverse
https://speakerdeck.com/lionelhenry/reusing-tidyverse-code

Thanks! It seems like your suggestion will be easier to implement in the long term for my code.

"Advanced R" indeed goes a little beyond what you actually need for tidy evaluation. There are other presentations by Jenny Bryan (@jennybryan), which I remember enjoying very much.

The main use cases involve enquo(), !!, and :=. Unless you are on an "ancient" version of rlang, you can also use the "double embrace" operator {{ x }}, which is equivalent to !!enquo(x).
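To illustrate that equivalence, here is a small sketch with two hypothetical helpers that should behave the same way. The function names and toy data are made up.

```r
library(dplyr)

# Embrace version: {{ col }} captures and injects the argument in one step.
mean_embrace <- function(dat, col) {
  summarise(dat, m = mean({{ col }}))
}

# Older two-step version: capture with enquo(), inject with !!.
mean_enquo <- function(dat, col) {
  col <- enquo(col)
  summarise(dat, m = mean(!!col))
}

toy <- tibble(v = c(1, 2, 3))
res_embrace <- mean_embrace(toy, v)
res_enquo   <- mean_enquo(toy, v)

identical(res_embrace, res_enquo)
#> [1] TRUE
```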

And there are other tricks depending on what you would like to do (where you would not even need tidy eval). Lionel has shown the trick of "subsetting .data"

Have a look here, e.g.:
