I am looking for a way to write a function that groups by multiple variables and additionally takes one variable that works like wt in count().
So far my solution is:
library(tidyverse)
set.seed(1234)
# base data set
df <- tibble(
  g1 = c("A", "B")[round(runif(100, 1, 2))],
  g2 = c("X", "Y")[round(runif(100, 1, 2))],
  v = runif(100, 0, 100)
)
# my function
grpsum <- function(data, sumVar, ...) {
  data %>%
    group_by(...) %>%
    summarise(total = sum({{sumVar}}))
}
df %>% grpsum(v, g1, g2)
#> # A tibble: 4 x 3
#> # Groups: g1 [2]
#> g1 g2 total
#> <chr> <chr> <dbl>
#> 1 A X 1176.
#> 2 A Y 1422.
#> 3 B X 1254.
#> 4 B Y 1159.
df %>% grpsum(v, g1)
#> # A tibble: 2 x 2
#> g1 total
#> <chr> <dbl>
#> 1 A 2598.
#> 2 B 2413.
df %>% grpsum(v)
#> # A tibble: 1 x 1
#> total
#> <dbl>
#> 1 5012.
To improve this function, I would like to make sumVar optional. If sumVar is not present, the function should count rows instead; here total would be 100.
I would also prefer to have sumVar as an explicit parameter and not somewhere hidden in the ....
Alternatively, is there another way to pass multiple grouping variables without using ...?
Thank you for any ideas!
(I hope I haven't missed a topic where this is already answered - if so: sorry for that!)
I see what you were getting at. I just switched the position of the ... and it should work now. Remember, as you suggested, sumVar has to be passed by name if you want to use it; if it isn't, we just count.
library(tidyverse)
library(rlang)
set.seed(1234)
# base data set
df <- tibble(
  g1 = c("A", "B")[round(runif(100, 1, 2))],
  g2 = c("X", "Y")[round(runif(100, 1, 2))],
  v = runif(100, 0, 100)
)
# my function
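grpsum <- function(data, ..., sumVar) {
  # capture sumVar as a quosure so we can test whether it was supplied
  sumVar <- enquo(sumVar)
  if (quo_is_missing(sumVar)) {
    # no sumVar given: count rows per group
    data %>%
      group_by(...) %>%
      summarise(total = n())
  } else {
    # sumVar given: sum it per group
    data %>%
      group_by(...) %>%
      summarise(total = sum(!!sumVar))
  }
}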
#Empty means just count
> df %>% grpsum()
# A tibble: 1 x 1
total
<int>
1 100
#One variable will be used as grouping
> df %>% grpsum(g1)
# A tibble: 2 x 2
g1 total
<chr> <int>
1 A 55
2 B 45
#sumVar explicitly stated, with one grouping variable
> df %>% grpsum(sumVar = v, g1)
# A tibble: 2 x 2
g1 total
<chr> <dbl>
1 A 2598.
2 B 2413.
#Multiple grouping variables (sumVar not explicitly stated and thus we count)
> df %>% grpsum(v, g1)
# A tibble: 100 x 3
# Groups: v [100]
v g1 total
<dbl> <chr> <int>
1 0.328 B 1
2 1.60 B 1
3 2.57 A 1
4 2.96 A 1
5 3.14 A 1
6 4.95 A 1
7 6.51 A 1
8 10.1 B 1
9 10.7 A 1
10 11.5 A 1
# ... with 90 more rows
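On the side question of passing several grouping variables without ...: one possible sketch, assuming dplyr 1.0.0 or later, is to take a single tidyselect expression and hand it to across() inside group_by(). The name grpsum2 and the toy data are made up for illustration, and base missing() stands in here for the quo_is_missing() check.

```r
library(dplyr)

# Small hand-made data set so the result is easy to check by eye
toy <- tibble(
  g1 = c("A", "A", "B", "B"),
  g2 = c("X", "Y", "X", "X"),
  v  = c(1, 2, 3, 4)
)

# grpsum2 is a hypothetical name; groupVars takes a tidyselect
# expression such as g1 or c(g1, g2) instead of using ...
grpsum2 <- function(data, groupVars, sumVar) {
  grouped <- data %>% group_by(across({{ groupVars }}))
  if (missing(sumVar)) {
    # no sumVar supplied: count rows per group
    grouped %>% summarise(total = n(), .groups = "drop")
  } else {
    # sumVar supplied: sum it per group
    grouped %>% summarise(total = sum({{ sumVar }}), .groups = "drop")
  }
}

res_sum   <- toy %>% grpsum2(c(g1, g2), sumVar = v)
res_count <- toy %>% grpsum2(g1)
```

With this layout the grouping columns sit in one argument, so sumVar no longer has to come after ... and an accidental grpsum2(v, g1) fails with an error instead of silently grouping by v.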
I did not know quo_is_missing() and wasn't aware that you can have a named parameter after ... (I thought it was bad programming style, but if it works... and str_c is defined like this as well).
I think there might be a more elegant solution, but I find tidy eval so complex that its logic is very hard to grasp. I only discovered quo_is_missing() by searching for similar problems online, and then had to play with it until I found the correct implementation.
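One candidate for a shorter version, offered as a sketch rather than a tested replacement: dplyr::count() already has a wt argument, so the wrapper can forward sumVar to it and let count() decide between counting and summing. The name grpsum3 and the toy data are made up for illustration. As far as I know, count() sums the weights with na.rm = TRUE, so missing values would be handled differently from a plain sum().

```r
library(dplyr)

# Small hand-made data set so the result is easy to check by eye
toy <- tibble(
  g1 = c("A", "A", "B", "B"),
  g2 = c("X", "Y", "X", "X"),
  v  = c(1, 2, 3, 4)
)

# grpsum3 is a hypothetical name; sumVar defaults to NULL,
# which count() treats as "no weights, just count rows"
grpsum3 <- function(data, ..., sumVar = NULL) {
  count(data, ..., wt = {{ sumVar }}, name = "total")
}

res_n   <- toy %>% grpsum3()                # total number of rows
res_g1  <- toy %>% grpsum3(g1)              # rows per g1
res_wt  <- toy %>% grpsum3(g1, sumVar = v)  # sum of v per g1
```

As in the version above, sumVar still has to be named, because anything passed positionally lands in ... and is treated as a grouping variable.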
Don't forget to mark the post as the solution if that's OK.