Referring to tidy data frame columns in a function

Hi!

How can I refer to a specific variable within a function? The goal is to write a function that will easily allow me to summarize a large dataset by various variables for different analyses.

The problem is figuring out how to translate the variable defined for the function to a reference to an actual dataframe in my environment.

Here is an example of the syntax I've been using:

#ex data
place  <- c("house", "car", "park", "house", "car")
year <- c("2010", "2010", "2010", "2011", "2011")
var1 <- c(20, 10, 100, 50, 200)

df <- data.frame(place, year, var1)


#what I want to do 
function(myvar){
df %>% split(place) %>% 
map(mutate, "change"= myvar - lag(myvar)) %>% return()
}

But how do I get myvar to refer to df$var1 as opposed to just "var1"?

Thanks!

You can use the "curly-curly" ({{ }}) operator to pass column names into a function. For example:

library(tidyverse)

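fnc = function(data, splitvar, myvar) {
  # pull() looks splitvar up inside data; a bare split({{splitvar}}) only
  # works if a vector of that name also exists in the calling environment
  split(data, pull(data, {{splitvar}})) %>% 
    map(mutate, change = {{myvar}} - lag({{myvar}}))
}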

df %>% fnc(place, var1)
$car
  place year var1 change
2   car 2010   10     NA
5   car 2011  200    190

$house
  place year var1 change
1 house 2010   20     NA
4 house 2011   50     30

$park
  place year var1 change
3  park 2010  100     NA

In your example, it's not necessary to split the data frame. You can use group_by instead:

fnc2 = function(data, groupvar, myvar) {
  data %>% 
    group_by({{groupvar}}) %>% 
    mutate(change = {{myvar}} - lag({{myvar}})) %>% 
    arrange({{groupvar}}) %>% 
    return()
}

df %>% fnc2(place, var1)
  place year   var1 change
  <chr> <chr> <dbl>  <dbl>
1 car   2010     10     NA
2 car   2011    200    190
3 house 2010     20     NA
4 house 2011     50     30
5 park  2010    100     NA
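
If the column names arrive as character strings (the "var1" case mentioned in the question), the curly-curly operator does not apply. A minimal sketch of one alternative, assuming dplyr's .data pronoun and across(all_of()) are available; the name fnc_chr is hypothetical and not part of the original answer:

```r
library(dplyr)

# Hypothetical variant of fnc2: groupvar and myvar are character strings
fnc_chr <- function(data, groupvar, myvar) {
  data %>%
    group_by(across(all_of(groupvar))) %>%                    # group by the named column
    mutate(change = .data[[myvar]] - lag(.data[[myvar]])) %>% # look the column up by string
    ungroup()
}

# Same example data as in the question
df <- data.frame(
  place = c("house", "car", "park", "house", "car"),
  year  = c("2010", "2010", "2010", "2011", "2011"),
  var1  = c(20, 10, 100, 50, 200)
)

res <- fnc_chr(df, "place", "var1")
```

Unlike fnc2, this sketch does not reorder the rows, so the change column lines up with the original row order of df.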

This is great! But I'm still having trouble getting it to recognize my variable.

#here is a subset of my actual data 
 
GEOID <-  c(1001, 1001, 1001, 1001, 1001)
year <- c(2010, 2011, 2012, 2013, 2014)
race <- c("White", "non-white", "White", "non-white", "White")
employment <- c(19812, 4529, 19853, 4286, 19689)
`emp rate` <- c(92.83, 85.58, 92.10, 81.79, 91.18)

emp_dt <- data.frame(GEOID, year, race, employment, `emp rate`)

func <- function(data=emp_dt, groupvar1=GEOID, groupvar2=race, myvar){

tmp_dt_wt <- data %>% group_by({{groupvar1}}, {{groupvar2}}) %>%
  arrange(year) %>% 
  mutate("change" = {{myvar}} - lag({{myvar}}))

}
func(`emp rate`)

# Error in group_by(., { : object 'emp rate' not found 

#I tried with a variable without space in the name too

func(employment)
#Error in group_by(., { : object 'employment' not found 

any ideas?

Check the errors in the code creating each vector of values, then make sure that emp_dt has all the variables you expect it to have. Does the code run once those errors are fixed?

As per Joel's suggestion, check the names of your columns:

names(emp_dt)

Consider that functions need to be told what to return.
So either don't assign your function internals to the tmp_dt_wt name and let them be returned directly, or do assign them to that name, but then place that name on the final line of the function body so that it is what gets returned.
Finally, when you call your function, also consider the order of the function parameters. Do you intend the employment rate variable to be used as myvar? That won't happen unless you specify myvar= or pass it as the 4th argument.



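f <- function(a=1,b=2,c=3,d){
  return(list(a,b,c,d))
}

# 9 is matched to a (the first parameter), so d is missing and this errors
f(9)

# with a, b and c named, the unnamed 9 is matched to d
f(a=1,
  b=2,
  c=3,
  9)

# putting the parameter without a default first lets you pass it positionally
better_f <- function(d,a=1,b=2,c=3){
  return(list(a,b,c,d))}

better_f(9)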
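
Applied to the function from the question, both points might look like the sketch below. It is only an illustration under some assumptions: the name func2 is hypothetical, the data frame is rebuilt with named columns, and check.names = FALSE is added so that the `emp rate` column keeps its space.

```r
library(dplyr)

# Example data from the question; check.names = FALSE keeps "emp rate" as written
emp_dt <- data.frame(
  GEOID = c(1001, 1001, 1001, 1001, 1001),
  year = c(2010, 2011, 2012, 2013, 2014),
  race = c("White", "non-white", "White", "non-white", "White"),
  employment = c(19812, 4529, 19853, 4286, 19689),
  `emp rate` = c(92.83, 85.58, 92.10, 81.79, 91.18),
  check.names = FALSE
)

# myvar comes first because it has no default; the pipeline is the last
# expression in the body, so its result is what the function returns
func2 <- function(myvar, data = emp_dt, groupvar1 = GEOID, groupvar2 = race) {
  data %>%
    group_by({{ groupvar1 }}, {{ groupvar2 }}) %>%
    arrange(year) %>%
    mutate(change = {{ myvar }} - lag({{ myvar }})) %>%
    ungroup()
}

res_emp  <- func2(employment)
res_rate <- func2(`emp rate`)
```

Keeping myvar in fourth position would also work, as long as the call names it explicitly, for example func(myvar = employment).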

Got it, part of the problem was that the variable name emp rate was getting changed to emp.rate within the function even though the spaces remained in the dataframe outside of the function.

Thanks!!
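
One likely explanation for the renamed column, assuming the data frame was built with base data.frame(): by default it passes column names through make.names(), which replaces the space with a dot at creation time rather than inside the function. A small sketch of the difference (d1 and d2 are illustrative names):

```r
# Default: data.frame() makes column names syntactically valid
d1 <- data.frame(`emp rate` = c(92.83, 85.58))
names(d1)
# [1] "emp.rate"

# check.names = FALSE keeps the name exactly as written
d2 <- data.frame(`emp rate` = c(92.83, 85.58), check.names = FALSE)
names(d2)
# [1] "emp rate"
```

With the second form the column has to be referred to with backticks, as in `emp rate`; with the first, emp.rate works without them.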

Thank you! Putting the variable to be defined at the end of the args instead of the front of the function was part of the problem!