How do I add checks to error proof my functions in R

I'm trying to learn how to write functions that check certain conditions while executing. I've created a function that takes some data and computes a stat (mean, median, etc.) for a grouped variable. See below:

library(tidyverse)

my_function <- function(data, var, stat){
    
    output <- data %>% 
        as_tibble() %>% 
        group_by({{var}}) %>% 
        summarise(stat = stat(mpg)) %>% 
        ungroup()
    
    return(output)
        
}

my_function(
    data = mtcars,
    var = cyl,
    stat = median
)

Can someone please show me how to perform the following checks -

  1. Check if data is supplied in the function, if not, stop execution and show a message saying "data required"

  2. Check if var is supplied in the function, if not, use cyl and show a message saying something like "var not supplied, cyl used"

  3. Check if stat is supplied in the function, if not, use mean and show a message saying something like "stat not supplied, mean used"

For your questions 2 and 3, there is a slightly easier solution: in R, you can give arguments default values, which are used whenever the user doesn't supply those arguments themselves:

my_function1 <- function(data, var=cyl, stat=mean){
  data %>% 
    as_tibble() %>% 
    group_by({{var}}) %>% 
    summarise(stat = stat(mpg)) %>% 
    ungroup()
}
my_function1(
  data = mtcars,
  var = cyl,
  stat = median
)

You can find explanations of default values here. Since they are a very common feature of R functions, I don't think you need to warn the user (but see below if you want).

I also made another change: in R, the value of the last expression is automatically returned, so it is more common not to use a return() statement. There is nothing wrong with keeping it if you prefer to always use output as an intermediary variable; it's just not necessary.
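
As a quick check of both points, here is a minimal sketch. The name with_defaults is hypothetical (it has the same body as my_function1), and I load only dplyr here, assuming it re-exports as_tibble() and %>% as it does in current versions:

```r
library(dplyr)

# Hypothetical name; same body as my_function1, no return() needed
with_defaults <- function(data, var = cyl, stat = mean) {
  data %>%
    as_tibble() %>%
    group_by({{ var }}) %>%
    summarise(stat = stat(mpg)) %>%
    ungroup()
}

# var and stat omitted: the defaults cyl and mean are used
res_default <- with_defaults(mtcars)

# Same call with the defaults spelled out
res_explicit <- with_defaults(mtcars, var = cyl, stat = mean)

# Compare the two results
all.equal(res_default, res_explicit)
```

Both calls go through the same code path, so the comparison should report no differences.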

Stop for empty data

As explained here, you can use if to check a condition, and, if that condition is fulfilled, give an error message etc.

Now, the condition you want to check is whether the user supplied the function arguments. The function missing() is meant to check exactly that: missing(data) returns TRUE if data has not been provided. So this function should do what you want:

my_function2 <- function(data, var=cyl, stat=mean){
  if(missing(data)){
    stop("data required")
  }
  
  data %>% 
    as_tibble() %>% 
    group_by({{var}}) %>% 
    summarise(stat = stat(mpg)) %>% 
    ungroup()
}
my_function2(
  data = mtcars,
  var = cyl,
  stat = median
)

Note that the shorthand stopifnot() can also be useful for this kind of check.
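
For instance, a minimal sketch of the same check with stopifnot(). The function name check_data is hypothetical, and the named-condition form (where the name becomes the error message) assumes R 4.0.0 or later:

```r
# Hypothetical helper: validates the data argument only
check_data <- function(data) {
  stopifnot(
    # The name of each condition is used as the error message (R >= 4.0.0)
    "data required" = !missing(data),
    "data must be a data frame" = is.data.frame(data)
  )
  invisible(TRUE)
}

# Passes silently when a data frame is supplied
check_data(mtcars)

# Capture the error message produced when data is omitted
msg <- tryCatch(check_data(), error = function(e) conditionMessage(e))
msg
```

The conditions are checked in order, so is.data.frame(data) is never evaluated when data is missing.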

Warn when var and stat missing

Now we are getting into something harder. For your questions 2 and 3, how should you proceed if you really don't want to use default values?

Rather than stop(), you can use other conditions, pretty well explained here. Briefly, you can use message("var not supplied") or warning("var not supplied") to inform the user of something, whereas stop() says there is an error and the function can't continue. So your function might look like this:

my_function3 <- function(data, var, stat){
  if(missing(data)){
    stop("data required")
  }
  
  if(missing(var)){
    message("var missing, cyl used instead")
  }
  
  if(missing(stat)){
    message("stat missing, mean used instead")
  }
  
  data %>% 
    as_tibble() %>% 
    group_by({{var}}) %>% 
    summarise(stat = stat(mpg)) %>% 
    ungroup()
}
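
To see how the three kinds of condition differ in practice, here is a small sketch. The function notify and its level argument are made up for illustration:

```r
# Hypothetical function that signals one of the three conditions
notify <- function(level) {
  switch(level,
    message = message("var missing, cyl used instead"),
    warning = warning("var missing, cyl used instead"),
    error   = stop("data required")
  )
  # Only reached if execution was not aborted
  "finished"
}

# message() and warning() inform the user, and the function keeps running
after_message <- suppressMessages(notify("message"))
after_warning <- suppressWarnings(notify("warning"))

# stop() aborts: "finished" is never returned, we only get the error
after_error <- tryCatch(notify("error"), error = function(e) conditionMessage(e))
```

Here after_message and after_warning both hold "finished", while after_error holds the error message "data required".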

Now, the difficulty is to actually perform the replacement. It would be easy if you were only using classic R and providing the arguments as strings; here it is harder because you are using tidy evaluation. For the function stat, a plain assignment works as expected:

my_function4 <- function(data, var, stat){
  if(missing(data)){
    stop("data required")
  }
  
  if(missing(var)){
    message("var missing, cyl used instead")
  }
  
  if(missing(stat)){
    message("stat missing, mean used instead")
    stat <- mean
  }
  
  
  output <- data %>% 
    as_tibble() %>% 
    group_by({{var}}) %>% 
    summarise(stat = stat(mpg)) %>% 
    ungroup()
  
  return(output)
}

my_function4(
  data = mtcars,
  var = cyl
)

(Make sure you give the name of the function mean without parentheses: if you write mean(), you are actually calling that function with no argument.)
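
A short sketch of that difference, using only base R:

```r
# Without parentheses, the function itself is stored in stat
stat <- mean
is.function(stat)   # TRUE
stat(c(1, 2, 3))    # 2

# With parentheses, mean is called with no argument, which is an error
res <- tryCatch(mean(), error = function(e) "error")
res                 # "error"
```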

But for var it is harder, since you are using quasiquotation. The theory is quite painful to understand; the short version is that this should work (I think):

my_function5 <- function(data, var, stat){
  if(missing(data)){
    stop("data required")
  }
  
  if(missing(var)){
    message("var missing, cyl used instead")
    var <- expr(cyl)
  } else{
    var <- enquo(var)
  }
  
  
  if(missing(stat)){
    message("stat missing, mean used instead")
    stat <- mean
  }
  
  
  data %>% 
    as_tibble() %>% 
    group_by(!!var) %>% 
    summarise(stat = stat(mpg)) %>% 
    ungroup()
}
my_function5(
  data = mtcars,
  stat = mean
)
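
If you want to convince yourself that the two branches of my_function5 behave like the original {{var}}, here is a sketch comparing the three forms on their own. The function names are hypothetical, and I assume enquo() and expr() are available through dplyr (they are re-exported from rlang in current versions):

```r
library(dplyr)

# Embrace operator: capture and inject in one step
group_embrace <- function(data, var) {
  data %>% group_by({{ var }}) %>% summarise(n = n())
}

# Explicit version: capture with enquo(), inject with !!
group_enquo <- function(data, var) {
  var <- enquo(var)
  data %>% group_by(!!var) %>% summarise(n = n())
}

# Hard-coded fallback: build the expression cyl ourselves, inject with !!
group_expr <- function(data) {
  var <- expr(cyl)
  data %>% group_by(!!var) %>% summarise(n = n())
}

# All three should group mtcars by cyl in the same way
all.equal(group_embrace(mtcars, cyl), group_enquo(mtcars, cyl))
all.equal(group_enquo(mtcars, cyl), group_expr(mtcars))
```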

In short, {{var}} is equivalent to !!enquo(var): enquo() captures the expression supplied as var in a quosure, and !! injects it into the call, where it is evaluated against the data. So to change what var refers to, we need to provide an expression rather than a value, which is what expr(cyl) does. This is fairly advanced R/tidyverse; if you don't have a lot of experience with R, it's probably better not to try too hard. Note that if you are used to other programming languages such as C, you might find the standard R approach (no tidy evaluation) more intuitive: just provide variable names as strings.

my_function6 <- function(data, var, stat){
  if(missing(data)){
    stop("data required")
  }
  
  if(missing(var)){
    message("var missing, cyl used instead")
    var <- "cyl"
  }
  
  if(missing(stat)){
    message("stat missing, mean used instead")
    stat <- "mean"
  }
  
  stat <- match.fun(stat)
  
  data %>% 
    as_tibble() %>% 
    group_by(.data[[var]]) %>% 
    summarise(stat = stat(mpg)) %>% 
    ungroup()
}
my_function6(data = mtcars,
             var = "cyl",
             stat = "median")

where match.fun() is used to find a function when given its name as a string, and the .data pronoun lets you refer to a column by a name stored in a string, which is a way to mix tidyverse functions with a more classic base R style.
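
Both pieces can be tried on their own. This is only a sketch, with the variable names by_string and by_symbol made up for the comparison:

```r
library(dplyr)

# match.fun() turns the string "median" into the function median
stat <- match.fun("median")
is.function(stat)

# .data[[var]] looks up the column whose name is stored in var
var <- "cyl"
by_string <- mtcars %>%
  group_by(.data[[var]]) %>%
  summarise(stat = stat(mpg))

# Same computation written directly with the column name
by_symbol <- mtcars %>%
  group_by(cyl) %>%
  summarise(stat = median(mpg))

# The computed medians should match
identical(by_string$stat, by_symbol$stat)
```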
