Implementation best practices for good error messages

I'm trying to program defensively in my package development, using a lot input validation.
In particular, I'm relying on a lot of the ready-made assertions in checkmate, testthat, assertr, assertthat and the like, which makes life a lot easier (and code shorter).

Hadley Wickhams's tidyverse style guide for error messages suggests that error messages should point users to the exact source of the problem, like so:

#> Error: Can't find column 'b' in '.data'

(Columns are just an example, sometimes it might be rows, or some other index).

I'm now wondering how this can be implemented elegantly and consistently in a package, given that a lot of the existing assertions (from above packages, but also base r) don't give you any indices back in their errors.

Here's an example:

m <- matrix(data = c(0, 1, 5, -2), nrow = 2)

# arbitrary assertion
assert_positive <- function(x) {
  if (any(x < 0)) {
    stop(call. = FALSE,
            "All numbers must be non-negative")
  } else {
    return(invisible(x))
  }
}
# (there are *lots* of these in packages such as checkmate, testthat or assertr that should be reused)

assert_positive(m)

gives:

## Error: All numbers must be non-negative

So far so good, but this does not give the desired indices of the errors.

Yes, I know that I could just change the above assert_positive() function to do that, but I would like to reuse a lot of the functions in checkmate and friends, so I can't touch them, and there's too many of them anyway.

So I should probably wrap something around these existing tests, such as a simple for loop:

# via for-loops
assert_positive2 <- function(x) {
  for (r in 1:nrow(x)) {
    res <- try(expr = assert_positive(x[r, ]), silent = TRUE)
    if (inherits(x = res, what = "try-error")) {
      stop(
        call. = FALSE,
        paste0(
          "in row ",
           r,
           ": ",
           attr(x = res, which = "condition")$message,
           "."
        )
      )
    }
  }
}
    
assert_positive2(m)

gives:

## Error: in row 2: All numbers must be non-negative.

That gets the job done, but it's a lot of clutter and the code is not very expressive.
I've also thought about Reduce() with try(), but that won't give indices, and neither would any apply() action.
I guess, finally, a closure or function factory would be helpful to generalise this to many assertions.

This just feels like a problem that many other people (crafting better error messages) must have already run into, so:

What's an elegant/canonical way to do this?

I'm interested in a broader discussion of best practices in pkg development behind the scenes to get error messages of the quality demanded by the tidyverse style guide.

Ps.: This is crossposted from StackOverflow, where this obviously wasn't a good fit because it's too broad a question – hope it'll fit better here.

1 Like

I don't know that this is as elegant as you would like, and it certainly feels like a bit of duplication, but I have found it to be the most useful when dealing with this situation.

I use checkmate's assert collections (coll), which allow me to report the findings from multiple assertions together. I'll often do the initial assertion, then if I want to indicate anything about specific elements, I do an additional manual check, adding my own message with coll$push.

column_name_example <- function(data, cols){
  coll <- checkmate::makeAssertCollection()
  
  checkmate::assert_data_frame(data,
                               add = coll)
  
  checkmate::assert_subset(x = cols,
                           choices = names(data),
                           add = coll)
  
  if (any(!cols %in% names(data))){
    coll$push(paste0("The following columns are not found in `data`: ",
                     paste0(cols[!cols %in% names(data)], collapse = ", ")))
  }
  
  checkmate::reportAssertions(coll)
  
  data[cols]
}

column_name_example(mtcars, c("am", "not_here"))

# Error in column_name_example(mtcars, c("am", "not_here")) : 
#  2 assertions failed:
# * Variable 'cols': Must be a subset of
# * {'mpg','cyl','disp','hp','drat','wt','qsec','vs','am','gear','carb'}.
# * The following columns are not found in `data`: not_here 

If you aren't a fan of the code duplication, you can use test_* in an if statement, and then add your own message. Such as below:

column_name_example <- function(data, cols){
  coll <- checkmate::makeAssertCollection()
  
  checkmate::assert_data_frame(data,
                               add = coll)
  
  if (!checkmate::test_subset(x = cols,
                              choices = names(data))){
    coll$push(paste0("The following columns are not found in `data`: ",
                     paste0(cols[!cols %in% names(data)], collapse = ", ")))
  }
  
  checkmate::reportAssertions(coll)
  
  data[cols]
}

column_name_example(mtcars, c("am", "not_here"))

# Error in column_name_example(mtcars, c("am", "not_here")) : 
#  1 assertions failed:
# * The following columns are not found in `data`: not_here 
1 Like