tidy eval with optional argument and multiple arguments and

Hi

I am looking for a way to write a function that groups by multiple variables and additionally takes one variable that acts like wt in count().

So far my solution is:

library(tidyverse)
set.seed(1234)

# base data set
df <- tibble(
  g1 = c("A", "B")[round(runif(100, 1, 2))],
  g2 = c("X", "Y")[round(runif(100, 1, 2))],
  v = runif(100, 0, 100)
)

# my function
grpsum <- function(data, sumVar, ...) {
  data %>% 
    group_by(...) %>% 
    summarise(total = sum({{sumVar}}))
}

df %>% grpsum(v, g1, g2)
#> # A tibble: 4 x 3
#> # Groups:   g1 [2]
#>   g1    g2    total
#>   <chr> <chr> <dbl>
#> 1 A     X     1176.
#> 2 A     Y     1422.
#> 3 B     X     1254.
#> 4 B     Y     1159.
df %>% grpsum(v, g1)
#> # A tibble: 2 x 2
#>   g1    total
#>   <chr> <dbl>
#> 1 A     2598.
#> 2 B     2413.
df %>% grpsum(v)
#> # A tibble: 1 x 1
#>   total
#>   <dbl>
#> 1 5012.

To improve this function I would like to make sumVar optional. If sumVar is not supplied, the function should count rows instead; here total would be 100.

I would also prefer to have sumVar as an explicit parameter and not somewhere hidden in the ....

Alternatively, is there another way to pass multiple grouping variables without using ...?

Thank you for any ideas!
(I hope I haven't missed a topic where this is already answered - if so: sorry for that!)

Hi,

I'm very new myself to the TidyEval stuff, but after blundering about for a while I found this:

library(tidyverse)
library(rlang)
set.seed(1234)

# base data set
df <- tibble(
  g1 = c("A", "B")[round(runif(100, 1, 2))],
  g2 = c("X", "Y")[round(runif(100, 1, 2))],
  v = runif(100, 0, 100)
)

# my function
grpsum <- function(data, sumVar, ...) {
  
  sumVar = enquo(sumVar)
  
  if(quo_is_missing(sumVar)){
    data %>% 
      group_by(...) %>% 
      summarise(total = n())
  } else {
    data %>% 
      group_by(...) %>% 
      summarise(total = sum(!!sumVar))
  }
  
}

df %>% grpsum(v)
# A tibble: 1 x 1
  total
  <dbl>
1 4994.

df %>% grpsum()
# A tibble: 1 x 1
  total
  <int>
1   100

I'm sure you could shorten the code by using TidyEval on the total = ... argument, but I haven't figured that out yet
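One possible way to shorten it, offered as a minimal sketch rather than a definitive answer: build the summary expression once as a quosure and splice it into a single summarise() call. The name total_expr and the small toy data set are made up for illustration; the function signature is kept as in the code above.

```r
library(dplyr)
library(rlang)

grpsum <- function(data, sumVar, ...) {
  sumVar <- enquo(sumVar)

  # Choose the summary expression once instead of duplicating the pipeline.
  # total_expr is a hypothetical helper name, not part of the original code.
  total_expr <- if (quo_is_missing(sumVar)) {
    quo(dplyr::n())        # no sumVar supplied: count rows
  } else {
    quo(sum(!!sumVar))     # sumVar supplied: sum that column
  }

  data %>%
    group_by(...) %>%
    summarise(total = !!total_expr)
}

# Toy data, assumed for illustration only
toy <- tibble(g = c("A", "A", "B"), v = c(1, 2, 3))

counts <- grpsum(toy)        # one row holding the row count
sums   <- grpsum(toy, v, g)  # sum of v within each level of g
```

Both branches now share the same group_by() and summarise() steps, so only the expression for total differs.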

Hope this helps,
PJ

Thank you for your idea!

But as soon as you supply only a grouping variable, the solution fails.

> df %>% grpsum(g1)
# Error in sum(~g1) : invalid 'type' (character) of argument

When debugging the function I realized that it assumed sumVar = g1.
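The behaviour comes from R's argument matching: an unnamed argument is matched by position to the next free parameter that appears before ..., while a parameter declared after ... can only be reached by its full name. A small sketch to illustrate this; sumvar_first and dots_first are hypothetical helpers that only report whether sumVar was supplied.

```r
# Hypothetical helpers: each only reports whether sumVar is missing.
# Arguments are never evaluated, so g1 and v do not need to exist.
sumvar_first <- function(data, sumVar, ...) missing(sumVar)
dots_first   <- function(data, ..., sumVar) missing(sumVar)

a <- sumvar_first(NULL, g1)            # FALSE: g1 is matched by position to sumVar
b <- dots_first(NULL, g1)              # TRUE: g1 is absorbed by ...
c <- dots_first(NULL, g1, sumVar = v)  # FALSE: sumVar is reached by name only
```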

Hi,

I see what you were getting at. I just switched the position of the ... so that it comes before sumVar, and it should work now. Remember, as you suggested, sumVar has to be passed explicitly by name if you want to use it; if it is not, we just count.

library(tidyverse)
library(rlang)
set.seed(1234)

# base data set
df <- tibble(
  g1 = c("A", "B")[round(runif(100, 1, 2))],
  g2 = c("X", "Y")[round(runif(100, 1, 2))],
  v = runif(100, 0, 100)
)

# my function
grpsum <- function(data, ..., sumVar) {
  
  sumVar = enquo(sumVar)

  if(quo_is_missing(sumVar)){
    data %>% 
      group_by(...) %>% 
      summarise(total = n())
  } else {
    data %>% 
      group_by(...) %>% 
      summarise(total = sum(!!sumVar))
  }
  
}
#Empty means just count
> df %>% grpsum()
# A tibble: 1 x 1
  total
  <int>
1   100

#One variable will be used as grouping
> df %>% grpsum(g1)
# A tibble: 2 x 2
  g1    total
  <chr> <int>
1 A        55
2 B        45

#sumVar explicitly stated, with one grouping variable
> df %>% grpsum(sumVar = v, g1)
# A tibble: 2 x 2
  g1    total
  <chr> <dbl>
1 A     2598.
2 B     2413.

#Multiple grouping variables (sumVar not explicitly stated and thus we count)
> df %>% grpsum(v, g1)
# A tibble: 100 x 3
# Groups:   v [100]
        v g1    total
    <dbl> <chr> <int>
 1  0.328 B         1
 2  1.60  B         1
 3  2.57  A         1
 4  2.96  A         1
 5  3.14  A         1
 6  4.95  A         1
 7  6.51  A         1
 8 10.1   B         1
 9 10.7   A         1
10 11.5   A         1
# ... with 90 more rows

How does this look?

PJ

Ah, that works! Thank you....

I did not know quo_is_missing() and wasn't aware that you can have a named parameter after ... (I thought it was bad programming style, but if it works... and str_c is defined like this as well).

Hi,

Glad I could help.

I think there might be a more elegant solution, but I find TidyEval so complex that its logic is very hard to grasp. I only discovered quo_is_missing() by searching for similar problems online, and then had to play with it until I found the correct implementation.
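On the original question of passing several grouping variables without ..., one option is to take them as a single argument and select them with across() inside group_by(). This is a sketch under assumptions, not a tested replacement: it needs dplyr 1.0.0 or later, the names grpsum2 and groups are made up for illustration, and in dplyr 1.1.0 and later pick() is, as far as I know, the documented successor for this use of across().

```r
library(dplyr)
library(rlang)

# grpsum2 and its `groups` parameter are hypothetical names.
# `groups` takes a tidyselect expression such as g1 or c(g1, g2).
grpsum2 <- function(data, groups, sumVar) {
  sumVar <- enquo(sumVar)

  grouped <- data %>%
    group_by(across({{ groups }}))

  if (quo_is_missing(sumVar)) {
    # No sumVar supplied: count rows per group
    grouped %>% summarise(total = n(), .groups = "drop")
  } else {
    # sumVar supplied: sum it per group
    grouped %>% summarise(total = sum(!!sumVar), .groups = "drop")
  }
}

# Toy data, assumed for illustration only
toy <- tibble(
  g1 = c("A", "A", "B"),
  g2 = c("X", "Y", "Y"),
  v  = c(1, 2, 3)
)

by_both <- grpsum2(toy, c(g1, g2), sumVar = v)  # sum v by g1 and g2
by_one  <- grpsum2(toy, g1)                     # count rows by g1
```

With this shape sumVar is an ordinary named parameter, and the grouping variables are visibly bundled in one place instead of being spread across ....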

Don't forget to mark the post as the solution if that's OK.

PJ
