Customize dplyr summarise function

I would like to write a custom summary statistics function around summarise(). With my attempt below, every result is NA and I get warnings.

my_summarise <- function(df, a) {
  df %>%
    summarise(Mean = mean("a"), 
              SD = sd("a"), 
              `CV%` = (sd("a") / mean("a")) * 100)
}

iris %>% 
  group_by(Species) %>% 
  my_summarise(a = Sepal.Length)

# A tibble: 3 x 4
  Species     Mean    SD `CV%`
  <fct>      <dbl> <dbl> <dbl>
1 setosa        NA    NA    NA
2 versicolor    NA    NA    NA
3 virginica     NA    NA    NA
There were 12 warnings (use warnings() to see them)

Welcome to the community!

Try something like this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

custom_summarise <- function(df, a)
{
    df %>%
        summarise(Mean = mean(x = {{a}}), 
                  SD = sd(x = {{a}}), 
                  `CV%` = (SD / Mean * 100))
}

iris %>% 
    group_by(Species) %>% 
    custom_summarise(a = Sepal.Length)
#> # A tibble: 3 x 4
#>   Species     Mean    SD `CV%`
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 setosa      5.01 0.352  7.04
#> 2 versicolor  5.94 0.516  8.70
#> 3 virginica   6.59 0.636  9.65

Created on 2020-04-19 by the reprex package (v0.3.0)

You can take a look at this.

Hope this helps.
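As for why the original version returned NA: as far as I can tell, writing "a" in quotes inside the function hands the literal string "a" to mean() and sd(), not the column you passed in. A minimal sketch in base R (the names x, m and s are only for illustration):

```r
# Quoting the argument inside the function passes the literal string "a",
# not the column the caller supplied.
x <- "a"

m <- suppressWarnings(mean(x))  # mean() of a character vector gives NA
s <- suppressWarnings(sd(x))    # "a" cannot be coerced to a number, so NA

is.na(m)
is.na(s)
```

Each of those calls warns, which would presumably account for the 12 warnings in your output (4 calls for each of the 3 groups).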


PS: To be frank, I always get confused (a lot) by {{, !!, !!! etc. Maybe someone more familiar with these can take a look at this thread and provide a link to detailed documentation for both of us.


I tried to understand the page you linked to, but I am still confused.
For example:

var <- sym("height")

starwars %>%
  summarise(avg = mean(!!var, na.rm = TRUE))

gives:
174.

But when I do this:

starwars %>%
  summarise(avg = mean(height, na.rm = TRUE))

It works as well, giving 174 as a result.
So what's the point of doing:

var <- sym("height")

I am looking for some examples that show, in a simple way, when to use !!, !!!, {{ }}, {{{ }}}, (...), var, vars, etc.

The first few paragraphs on this page explain the differences between {{, !! and !!!. Hope it helps.
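To make the sym() question concrete: it starts to matter when the column name is only available as a string, for example because it comes from user input or a loop. A rough sketch, assuming dplyr and rlang are installed (avg_by_name and count_by are hypothetical helper names, not part of either package):

```r
library(dplyr, warn.conflicts = FALSE)

# The column name arrives as a string, so it cannot be typed bare.
avg_by_name <- function(col) {
  var <- rlang::sym(col)                            # "height" -> symbol height
  starwars %>%
    summarise(avg = mean(!!var, na.rm = TRUE))      # !! unquotes one symbol
}

# Several column names at once: !!! splices a list of symbols as arguments.
count_by <- function(cols) {
  starwars %>%
    count(!!!rlang::syms(cols))
}

avg_by_name("height")
count_by(c("species", "gender"))
```

When you type the column yourself, as in mean(height, na.rm = TRUE), none of this is needed, which is why both of your versions give the same answer.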

Thank you very much.
There are a lot of definitions to grasp:
UQ and UQS, sym and ensym, quo and enquo, expr and enexpr, etc.

And all of it serves one purpose: to write column names without quotes. Am I right?

@Andrzej The point is that you need these operators when you want to use dplyr verbs inside functions.

library(dplyr, warn.conflicts = FALSE)

# This won't work.
foo <- function(col_name) {
  starwars %>% 
    summarize(avg = mean(col_name, na.rm = TRUE))
}

foo(height)
#> Error in mean(col_name, na.rm = TRUE): object 'height' not found


# But this will.
foo <- function(col_name) {
  
  col_name <- ensym(col_name)
  
  starwars %>% 
    summarize(avg = mean(!!col_name, na.rm = TRUE))
}

foo(height)
#> # A tibble: 1 x 1
#>     avg
#>   <dbl>
#> 1  174.

Created on 2020-04-19 by the reprex package (v0.3.0)
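For what it's worth, the same function can probably be written more briefly with the {{ }} operator used in the custom_summarise answer earlier in this thread. A sketch, assuming rlang 0.4.0 or later (foo_embrace is just an illustrative name):

```r
library(dplyr, warn.conflicts = FALSE)

# {{ col_name }} captures the caller's argument and unquotes it in one step,
# which stands in for the ensym() + !! pair used above.
foo_embrace <- function(col_name) {
  starwars %>%
    summarize(avg = mean({{ col_name }}, na.rm = TRUE))
}

foo_embrace(height)
```

One difference I am aware of is that ensym() only accepts a bare column name (or a string), while {{ }} also lets the caller pass an expression such as height / 100.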


No. I suggest you read the book on tidy evaluation to gain a better understanding of this subject.