How to pass a numerical variable to summarize in a function?

Hey guys,

I'm new to R. Recently I came across a problem that I can't figure out: how can I pass a numeric variable to summarize inside a function? More specifically, I want to group by "Provider" and "Network" and summarize "AWP" and "Claim".

I wrote a simple function to illustrate my problem. I get an error when I run
cal_var("AWP") or cal_var("Claim").

The error is: "Error in sum(var) : invalid 'type' (character) of argument". Thank you so much for your help!

example<-data.frame("Provider" = c("a", "b", "c", "c", "b", "a"), "Network" = c("50k", "45k", "40k", "40k", "45k", "50k"),
"AWP" = c(500, 1000, 1500, 2000, 2500, 3000), "Claim" = c(100, 150, 200, 250, 300, 350), stringsAsFactors = FALSE)

cal_var<-function(var){
  example %>%
    group_by(Provider, Network) %>% summarize(total_var= sum(var))
}

cal_var("AWP")
cal_var("Claim")

This requires handling non-standard evaluation (NSE), which, honestly, makes my head hurt. Here is one solution; with it, you call the function without quotes around the column name.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rlang)

example<-data.frame("Provider" = c("a", "b", "c", "c", "b", "a"), "Network" = c("50k", "45k", "40k", "40k", "45k", "50k"),
                    "AWP" = c(500, 1000, 1500, 2000, 2500, 3000), "Claim" = c(100, 150, 200, 250, 300, 350), stringsAsFactors = FALSE)

cal_var<-function(var){
  EnqVar <- enquo(var)
  example %>%
    group_by(Provider, Network) %>% summarize(total_var= sum(!!EnqVar))
}

cal_var(AWP)
#> # A tibble: 3 x 3
#> # Groups:   Provider [3]
#>   Provider Network total_var
#>   <chr>    <chr>       <dbl>
#> 1 a        50k          3500
#> 2 b        45k          3500
#> 3 c        40k          3500
cal_var(Claim)
#> # A tibble: 3 x 3
#> # Groups:   Provider [3]
#>   Provider Network total_var
#>   <chr>    <chr>       <dbl>
#> 1 a        50k           450
#> 2 b        45k           450
#> 3 c        40k           450

Created on 2020-03-16 by the reprex package (v0.3.0)

I found the answer on Stack Overflow. Using the rlang package solved the problem perfectly: summarize(total_var = sum(!!rlang::sym(var)))

https://stackoverflow.com/questions/51859470/dplyr-summarise-wont-work-when-i-get-column-name-from-list
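For anyone who wants the quoted-string version spelled out, here is a minimal sketch of how that one-liner might sit inside the function from the question. The name cal_var_chr is hypothetical (chosen only to keep it apart from the other cal_var versions in this thread), and it assumes the rlang package is installed.

```r
library(dplyr, warn.conflicts = FALSE)

example <- data.frame("Provider" = c("a", "b", "c", "c", "b", "a"),
                      "Network" = c("50k", "45k", "40k", "40k", "45k", "50k"),
                      "AWP" = c(500, 1000, 1500, 2000, 2500, 3000),
                      "Claim" = c(100, 150, 200, 250, 300, 350),
                      stringsAsFactors = FALSE)

# cal_var_chr is a hypothetical name used for illustration only.
# `var` is a character string such as "AWP"; rlang::sym() turns it into
# a symbol and !! unquotes it, so sum() sees the column, not the string.
cal_var_chr <- function(var) {
  example %>%
    group_by(Provider, Network) %>%
    summarize(total_var = sum(!!rlang::sym(var)))
}

res_awp <- cal_var_chr("AWP")
res_claim <- cal_var_chr("Claim")
```

With this version the calls keep the quotes, as in the original question: cal_var_chr("AWP") and cal_var_chr("Claim").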

Thank you so much! Really appreciate it! I found the solution a minute ago by using rlang as well!

Just to add on, the modern way of doing this is to use the curly-curly (embrace) operator, {{ }}.

library(dplyr, warn.conflicts = FALSE)

example <- data.frame("Provider" = c("a", "b", "c", "c", "b", "a"),
                      "Network" = c("50k", "45k", "40k", "40k", "45k", "50k"),
                      "AWP" = c(500, 1000, 1500, 2000, 2500, 3000),
                      "Claim" = c(100, 150, 200, 250, 300, 350),
                      stringsAsFactors = FALSE)

cal_var <- function(var) {
  example %>%
    group_by(Provider, Network) %>% 
    summarize(total_var = sum({{ var }}))
}

cal_var(AWP)
#> # A tibble: 3 x 3
#> # Groups:   Provider [3]
#>   Provider Network total_var
#>   <chr>    <chr>       <dbl>
#> 1 a        50k          3500
#> 2 b        45k          3500
#> 3 c        40k          3500
cal_var(Claim)
#> # A tibble: 3 x 3
#> # Groups:   Provider [3]
#>   Provider Network total_var
#>   <chr>    <chr>       <dbl>
#> 1 a        50k           450
#> 2 b        45k           450
#> 3 c        40k           450

Created on 2020-03-17 by the reprex package (v0.3.0)
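One caveat: the embrace operator expects an unquoted column name. If the name arrives as a string, as in the original question, another option is the .data pronoun. The following is only a sketch under that assumption; cal_var_str is a hypothetical name, not something from the posts above.

```r
library(dplyr, warn.conflicts = FALSE)

example <- data.frame("Provider" = c("a", "b", "c", "c", "b", "a"),
                      "Network" = c("50k", "45k", "40k", "40k", "45k", "50k"),
                      "AWP" = c(500, 1000, 1500, 2000, 2500, 3000),
                      "Claim" = c(100, 150, 200, 250, 300, 350),
                      stringsAsFactors = FALSE)

# cal_var_str is a hypothetical name used for illustration only.
# `var` is a character string; .data[[var]] looks up the column with
# that name in the current group of the data frame.
cal_var_str <- function(var) {
  example %>%
    group_by(Provider, Network) %>%
    summarize(total_var = sum(.data[[var]]))
}

str_awp <- cal_var_str("AWP")
str_claim <- cal_var_str("Claim")
```

So the rough rule of thumb would be: {{ var }} when callers write cal_var(AWP), and .data[[var]] when they write cal_var_str("AWP").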

Thank you so much! Really appreciate it!
