Using `rlang` to specify joining variables in `dplyr` join functions

Here is a minimal reprex for a custom function I am writing to perform two operations on a dataframe and then join the resulting dataframes.

set.seed(123)
library(tidyverse)

# function
foo <- function(data, x) {
  dplyr::inner_join(
  group_by_at(data, rlang::enquos(x)) %>% summarise(wt = mean(wt)),
  group_by_at(data, rlang::enquos(x)) %>% summarise(n = dplyr::n())
  )
}

# works
foo(mtcars, c(am, cyl))
#> Joining, by = c("am", "cyl")
#> # A tibble: 6 x 4
#> # Groups:   am [2]
#>      am   cyl    wt     n
#>   <dbl> <dbl> <dbl> <int>
#> 1     0     4  2.94     3
#> 2     0     6  3.39     4
#> 3     0     8  4.10    12
#> 4     1     4  2.04     8
#> 5     1     6  2.76     3
#> 6     1     8  3.37     2

Although this works, it prints the `Joining, by = c("am", "cyl")` message, which I don't want.

So I tried to explicitly specify the joining variables and tried using rlang::as_string, rlang::quo_name, etc., but couldn't get it to work.

How can I make this work?

# to avoid the joining message, be explicit
foo <- function(data, x) {
  dplyr::inner_join(
    group_by_at(data, rlang::enquos(x)) %>% summarise(wt = mean(wt)),
    group_by_at(data, rlang::enquos(x)) %>% summarise(n = dplyr::n()),
    by = rlang::as_string(x)
  )
}

# doesn't work
foo(mtcars, c(am, cyl))
#> Error in is_string(x): object 'am' not found

Created on 2020-01-05 by the reprex package (v0.3.0.9001)


I would change your function a bit and do it in one of these two ways:

set.seed(123)
library(dplyr, warn.conflicts = FALSE)

# Using a character vector as input
foo <- function(data, x) {
  inner_join(
    group_by_at(data, x) %>% summarise(wt = mean(wt)),
    group_by_at(data, x) %>% summarise(n = n()),
    by = x
  )
}

foo(mtcars, c("am", "cyl"))
#> # A tibble: 6 x 4
#> # Groups:   am [2]
#>      am   cyl    wt     n
#>   <dbl> <dbl> <dbl> <int>
#> 1     0     4  2.94     3
#> 2     0     6  3.39     4
#> 3     0     8  4.10    12
#> 4     1     4  2.04     8
#> 5     1     6  2.76     3
#> 6     1     8  3.37     2

# Using an NSE specification as input
foo2 <- function(data, ...) {
  dots <- enquos(...)
  inner_join(
    group_by(data, !!!dots) %>% summarise(wt = mean(wt)),
    group_by(data, !!!dots) %>% summarise(n = n()),
    by = purrr::map_chr(dots, rlang::as_label)
  )
}

# works
foo2(mtcars, am, cyl)
#> # A tibble: 6 x 4
#> # Groups:   am [2]
#>      am   cyl    wt     n
#>   <dbl> <dbl> <dbl> <int>
#> 1     0     4  2.94     3
#> 2     0     6  3.39     4
#> 3     0     8  4.10    12
#> 4     1     4  2.04     8
#> 5     1     6  2.76     3
#> 6     1     8  3.37     2

Created on 2020-01-05 by the reprex package (v0.3.0.9001)

Hope it helps.
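
If you would rather keep the original `foo(mtcars, c(am, cyl))` interface, here is one possible sketch. It is not from the code above: `foo3` and `by_vars` are names I made up for illustration, and it assumes a version of rlang that supports `{{ }}` (0.4.0 or later). The idea is to let `select()` resolve the unquoted selection to column names, then reuse that character vector for both the grouping and `by`.

```r
library(dplyr, warn.conflicts = FALSE)

# foo3 is a hypothetical name, used here for illustration only
foo3 <- function(data, x) {
  # Resolve the unquoted selection (e.g. c(am, cyl)) to a character
  # vector of column names by letting select() do the tidy selection
  by_vars <- names(select(data, {{ x }}))

  inner_join(
    group_by_at(data, by_vars) %>% summarise(wt = mean(wt)),
    group_by_at(data, by_vars) %>% summarise(n = n()),
    by = by_vars  # explicit join columns, so no "Joining, by" message
  )
}

foo3(mtcars, c(am, cyl))
```

Because `by` is given explicitly, the join itself should no longer print the joining message. Newer dplyr versions may still print a separate note about grouped output from `summarise()`.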


Hi! What does the "!!!" do in the code?
Thank you
Regards

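Oh, sorry. That is splicing: it is used when you have several quosures from enquos or several symbols from ensyms.

When you capture a single expression with quo or sym, you unquote it with !! (bang bang). !!! does the same for several at once, such as those in quos: it unquotes each element and splices them into the call as separate arguments.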

See
https://dplyr.tidyverse.org/articles/programming.html#capturing-multiple-variables

And https://rlang.r-lib.org/reference/dyn-dots.html
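
To make the difference concrete, here is a small sketch. The helper names `count_by_one` and `count_by_many` are made up for this example and do not come from dplyr or rlang.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: one captured expression, unquoted with !!
count_by_one <- function(data, x) {
  x <- rlang::enquo(x)          # a single quosure
  data %>% group_by(!!x) %>% summarise(n = n())
}

# Hypothetical helper: several captured expressions, spliced with !!!
count_by_many <- function(data, ...) {
  dots <- rlang::enquos(...)    # a list of quosures, one per argument
  # !!!dots unpacks the list, so count_by_many(mtcars, am, cyl)
  # behaves like group_by(mtcars, am, cyl)
  data %>% group_by(!!!dots) %>% summarise(n = n())
}

count_by_one(mtcars, am)
count_by_many(mtcars, am, cyl)
```

With `!!` you insert one expression into the call; with `!!!` each element of the list becomes its own argument.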
