How to turn strings from function arguments into column names with dplyr

dplyr

#1

Suppose I have a dataframe my_df and want to pass a column called column_name1 into a function and use that column inside the function with mutate().

My goal is to compare the value in "passed_column" to the value of threshold (sample code below). My code creates the new column but doesn't evaluate the comparison in the ifelse() correctly. I'd like to use the lazyeval package but am open to any solution.

calculate_foo <- function(df, passed_column, threshold){
    new_df <- df %>% mutate(new_column = ifelse(!!quo_name(passed_column) >= threshold, 1, 0)) 
    return(new_df)
}

set.seed(2018)
my_df <- data.frame(column_name1 = sample(0:50, 50, replace = T), column_name2 = sample(0:25, 50, replace = T))
threshold_number <- 21

calculate_foo(my_df, "column_name1", threshold_number)

#2

With a few small tweaks, your example works just fine:

suppressPackageStartupMessages(library(tidyverse))
calculate_foo <- function(df, passed_column, threshold){
  column_sym <- rlang::sym(passed_column)
  df %>% 
    mutate(new_column = dplyr::if_else(!!column_sym >= threshold, 1, 0)) 
}

set.seed(2018)
my_df <- tibble::tibble(column_name1 = sample(0:50, 50, replace = TRUE), column_name2 = sample(0:25, 50, replace = TRUE))
threshold_number <- 21

calculate_foo(my_df, "column_name1", threshold_number)
#> # A tibble: 50 x 3
#>    column_name1 column_name2 new_column
#>           <int>        <int>      <dbl>
#>  1           17           10          0
#>  2           23            5          1
#>  3            3            3          0
#>  4           10           19          0
#>  5           24           17          1
#>  6           15           25          0
#>  7           30           10          1
#>  8            6           15          0
#>  9           48            6          1
#> 10           27           12          1
#> # ... with 40 more rows

Created on 2018-12-07 by the reprex package (v0.2.1)
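
For reference, here is a minimal sketch of two ways to go from a string argument to a column inside mutate(). The helper names calculate_foo_data and calculate_foo_sym and the small toy_df are made up for illustration and are not part of the original example. The first variant indexes dplyr's .data pronoun with the string; the second converts the string to a symbol with rlang::sym() and unquotes it, as in the code above. The last line hints at why unquoting the result of quo_name() does not work: it is still just a string, so the comparison never looks at the column values.

```r
library(dplyr)

# Hypothetical helper names, for illustration only.

# Variant 1: index the .data pronoun with the column name held in a string.
calculate_foo_data <- function(df, passed_column, threshold) {
  df %>%
    mutate(new_column = if_else(.data[[passed_column]] >= threshold, 1, 0))
}

# Variant 2: turn the string into a symbol, then unquote it with !!.
calculate_foo_sym <- function(df, passed_column, threshold) {
  column_sym <- rlang::sym(passed_column)
  df %>%
    mutate(new_column = if_else(!!column_sym >= threshold, 1, 0))
}

# Small made-up data frame so the result is easy to check by eye.
toy_df <- data.frame(column_name1 = c(5, 21, 40), column_name2 = c(1, 2, 3))

res_data <- calculate_foo_data(toy_df, "column_name1", 21)
res_sym  <- calculate_foo_sym(toy_df, "column_name1", 21)

res_data$new_column            # 0 1 1
identical(res_data, res_sym)   # TRUE

# What the original attempt effectively evaluates: a text-vs-text comparison
# of the column *name* with the threshold, not of the column values.
"column_name1" >= 21
```

Both variants should give the same result here; which one to use is mostly a matter of taste.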


#3

That works great. Thank you!

A couple of questions:

  1. Is there an advantage to using a tibble in this context? I notice that a data frame gives the same output.
  2. On a conceptual level, why is if_else() needed within the mutate()?

Thanks again.


#4
  1. A tibble is essentially a data.frame with nicer printing. As you can see, it prints only the first 10 rows so it doesn't overwhelm your console; it also shows the column types, respects the width of your console, and so on. For your function it doesn't change the data itself, it just prints it nicely.
  2. At a very high level there is no difference, but there are situations where the output of ifelse() can be surprising: https://vctrs.r-lib.org/articles/stability.html#ifelse. That is the reason dplyr::if_else() exists in the first place.
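
To make both points a bit more concrete, here is a small sketch. The variable names are made up for illustration, and the Date case is just one example of the kind of surprise meant above; there are others.

```r
library(dplyr)

# Point 1: a tibble is still a data frame underneath.
tbl <- tibble::tibble(x = 1:3)
inherits(tbl, "data.frame")    # TRUE

# Point 2: base ifelse() drops attributes, so a Date comes back as a number ...
d <- as.Date("2018-12-07")
base_result <- ifelse(TRUE, d, as.Date(NA))
class(base_result)             # "numeric"

# ... while dplyr::if_else() keeps the type of its inputs.
dplyr_result <- if_else(TRUE, d, as.Date(NA))
class(dplyr_result)            # "Date"

# if_else() also refuses to mix incompatible types instead of coercing silently.
mixed <- tryCatch(if_else(TRUE, 1, "a"), error = function(e) "error")
mixed                          # "error"
```

In the threshold example both functions return the same 0/1 column, so the choice only starts to matter once dates, factors, or mixed types are involved.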