How to use arguments as character in a function?

jinglin0318 · March 30, 2020, 4:31pm

Here is a minimal example.

df <- data.frame("Treatment" = c(rep("A", 2), rep("B", 2)), "Price" = 1:4, "Cost" = 2:5)

I want to summarize every variable I have by treatment and then put the results together. My plan is to define a function that does this for one variable, and then rbind the results afterwards.

SummarizeFn <- function(x,y) {
                       x %>% group_by(Treatment) %>% 
                       summarize(
                            n = n(),
                            Mean = mean(y), 
                            SD = sd(y)
                       ) %>% cbind ("Var" = rep(y, 3)) # add a column to show which variable those statistics belong to. 
                   }
SumPrice <- SummarizeFn(df, Price)

However, I have two problems here.

First, R gives me the error: Error in mean(y): can't find object 'Price'.

Second, in the last part, cbind(), how can I pass the name of the variable, Price for example, as a character string in that column? Double quotes will not work, and I also tried the paste function, which does not work either. How can I do this?

technocrat · March 30, 2020, 8:10pm

Hi, and welcome!

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? Using a reprex, complete with representative data, will attract more and quicker answers. You've done a good job here, just missing a bit.

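The most pressing issue is the y argument to SummarizeFn, Price. R looks for an object named Price in the calling environment and finds none, because Price exists only as a column of df.

The simplest way to get rid of that error is to pass df$Price as the y argument (R is case-sensitive, so it must be Price, not price).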

suppressPackageStartupMessages(library(dplyr))
df <- data.frame("Treatment" = c(rep("A", 2), rep("B", 2)), "Price" = 1:4, "Cost" = 2:5)
SummarizeFn <- function(x,y) {
  x %>% group_by(Treatment) %>% 
    summarize(
      n = n(),
      Mean = mean(y), 
      SD = sd(y)
    ) %>% cbind ("Var" = rep(y, 3)) # add a column to show which variable those statistics belong to. 
}
SumPrice <- SummarizeFn(df, df$Price)
SumPrice
#>    Treatment n Mean       SD Var
#> 1          A 2  2.5 1.290994   1
#> 2          B 2  2.5 1.290994   2
#> 3          A 2  2.5 1.290994   3
#> 4          B 2  2.5 1.290994   4
#> 5          A 2  2.5 1.290994   1
#> 6          B 2  2.5 1.290994   2
#> 7          A 2  2.5 1.290994   3
#> 8          B 2  2.5 1.290994   4
#> 9          A 2  2.5 1.290994   1
#> 10         B 2  2.5 1.290994   2
#> 11         A 2  2.5 1.290994   3
#> 12         B 2  2.5 1.290994   4

Created on 2020-03-30 by the reprex package (v0.3.0)
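One thing worth noticing in the output above: every row shows Mean 2.5 and SD 1.290994, which are the statistics of the whole Price column rather than of each treatment. A likely explanation is that df$Price always refers to the full column, so group_by() has no effect on it. The short sketch below, using the same df, contrasts the two forms; the names whole_column and per_group are only illustrative.

```r
library(dplyr)

df <- data.frame("Treatment" = c(rep("A", 2), rep("B", 2)), "Price" = 1:4, "Cost" = 2:5)

# df$Price is the complete column, so each group receives the overall mean
whole_column <- df %>%
  group_by(Treatment) %>%
  summarize(Mean = mean(df$Price))

# A bare column name is looked up inside the current group
per_group <- df %>%
  group_by(Treatment) %>%
  summarize(Mean = mean(Price))

whole_column
per_group
```

So passing df$Price removes the error message, but it does not yet give per-treatment statistics.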

robbiebatley · March 31, 2020, 8:53am

It may be easier to reshape your data to long format first and then summarise?

library(tidyr)
library(dplyr)

df <- data.frame("Treatment" = c(rep("A", 2), rep("B", 2)), 
                 "Price" = 1:4, 
                 "Cost" = 2:5)

df %>% 
    # Could use tidyr::gather instead
    pivot_longer(-Treatment, names_to = "Var", values_to = "value") %>%
    group_by(Treatment, Var) %>% 
    summarize(n = n(), Mean = mean(value), SD = sd(value))
#> # A tibble: 4 x 5
#> # Groups:   Treatment [2]
#>   Treatment Var       n  Mean    SD
#>   <fct>     <chr> <int> <dbl> <dbl>
#> 1 A         Cost      2   2.5 0.707
#> 2 A         Price     2   1.5 0.707
#> 3 B         Cost      2   4.5 0.707
#> 4 B         Price     2   3.5 0.707

To do it the way you were intending, you could use enquo() and the !! operator, so that the column name is evaluated inside the data frame rather than in the calling environment. Have a look at:

vignette("programming", package = "dplyr")

Created on 2020-03-31 by the reprex package (v0.3.0)
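For anyone landing here later, this is a minimal sketch of what the enquo / !! approach could look like for the original SummarizeFn. It is one possible version, not necessarily what the posters had in mind. It assumes rlang is installed (it comes with dplyr) and uses rlang::as_label() to turn the captured column name into a character string for the Var column. bind_rows() stands in for the rbind step.

```r
library(dplyr)

df <- data.frame("Treatment" = c(rep("A", 2), rep("B", 2)), "Price" = 1:4, "Cost" = 2:5)

SummarizeFn <- function(x, y) {
  y <- enquo(y)                         # capture the column name without evaluating it
  x %>%
    group_by(Treatment) %>%
    summarize(
      n    = n(),
      Mean = mean(!!y),                 # !! unquotes, so mean() sees the column per group
      SD   = sd(!!y)
    ) %>%
    mutate(Var = rlang::as_label(y))    # the variable name as a character string
}

# Summarize each variable, then stack the results
res <- bind_rows(SummarizeFn(df, Price), SummarizeFn(df, Cost))
res
```

This should give the same per-treatment statistics as the long-format approach, just ordered by variable. With rlang 0.4.0 or later, writing {{ y }} inside summarize() is a shorter alternative to the enquo() and !! pair.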
