Resolving duplicated tests

Hello there!

I started this topic to gather opinions about the following issue.

It is considered good practice (for the readability and maintainability of a package) to decompose big functions into applications of smaller ones. For example:

do_action <- function(x, y) {
  z <- do_first(x)

  do_second(y, z)
}

It is common sense to test the functions do_first and do_second separately from do_action. Those tests define the desired behaviour of the corresponding functions. However, the initial goal is to define and test the behaviour of do_action, which leads to duplicated tests. As any sort of duplication is to be avoided in programming, my question is: what is a good practice for resolving this test duplication issue?

In testthat I found the function describe, which is used for "documenting ... intended behaviour". It can partially answer the question: use describe for big functions "to verify that you implement the right things" and test_that for smaller ones "to ensure you do the things right". However, I don't see this approach used much.
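
For illustration, here is roughly how that split could look. The definitions of do_first, do_second and do_action below are toy stand-ins I made up so the example is self-contained:

library(testthat)

# Toy stand-ins so the example runs on its own; not real package code
do_first <- function(x) x + 1
do_second <- function(y, z) y * z
do_action <- function(x, y) {
  z <- do_first(x)
  do_second(y, z)
}

# describe() documents the intended behaviour of the big function...
describe("do_action", {
  it("applies do_second to y and the prepared x", {
    expect_equal(do_action(1, 2), 4)
  })
})

# ...while test_that() checks that the small pieces do their job right
test_that("do_first prepares its input", {
  expect_equal(do_first(1), 2)
})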

Personally, I feel that defining a new expectation, expect_calls(expr, fun_names), might be a good solution. It would check that the functions named in fun_names are called during evaluation of the expression expr. Testing do_action would then look something like this:

expect_calls(
  quote(do_action(1, 2)),
  c("do_first", "do_second")
)
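
To make the idea concrete, here is a rough sketch of how such an expectation could be built on top of base::trace() and testthat::expect(). The helper is hypothetical (not part of testthat) and ignores complications such as tracing functions that live inside package namespaces:

expect_calls <- function(expr, fun_names, env = parent.frame()) {
  called <- new.env()

  # Temporarily trace each function so it records its own name when entered
  for (fn in fun_names) {
    trace(
      fn,
      tracer = bquote(assign(.(fn), TRUE, envir = .(called))),
      print = FALSE,
      where = env
    )
  }
  on.exit(for (fn in fun_names) untrace(fn, where = env), add = TRUE)

  eval(expr, envir = env)

  not_called <- setdiff(fun_names, ls(called))
  testthat::expect(
    length(not_called) == 0,
    sprintf("Functions not called: %s", paste(not_called, collapse = ", "))
  )
}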

Any thoughts about this issue are welcome.

The problem with that approach is that it tests the implementation, not what the code is meant to do. The point of testing is to make sure things still do what they should even if you change something. Testing the implementation discourages changes, since you'd then have to rewrite all those tests (for no good reason). Professors don't grade papers based on how you used a word processor.

It's hard to suggest alternatives without examples, but duplicative tests aren't always bad. If do_action and do_first happen to have similar results, that shouldn't be because they're intrinsically linked, so their tests should stay separate.

Again, it's hard to discuss this without concrete examples.

Sure, I understand that testing the implementation is bad. I just think that overly duplicated code (even in tests) is the greater evil :slight_smile:

OK, let's talk through an example:

# Preparation functions. Might be many classes
prepare <- function(x) {UseMethod("prepare")}
prepare.default <- function(x) {x}
prepare.class1 <- function(x) {sqrt(-x)}
prepare.class2 <- function(x) {log(x + 1)}

# Actual check function
perform_check <- function(x, y) {x > y}

# Exported wrappers actually to be tested
check_raw <- function(x, y) {perform_check(x, y)}
check_first <- function(x, y) {perform_check(prepare(x), y)}
check_second <- function(x, y) {perform_check(x, prepare(y))}
check_both <- function(x, y) {perform_check(prepare(x), prepare(y))}

Ideas that this code is meant to express:

  • There are many similar exported functions. All of them do some preparation (specific to the wrapper) and then apply a common function (perform_check in this case).
  • The functions prepare (with its methods) and perform_check are fairly long and complicated, which justifies the decomposition. The methods of prepare are also quite different from one another.
  • The common function perform_check has many edge cases of interest which should be tested. In this example these might be tests for x and y equal to NA, NaN, Inf or -Inf. These edge cases transfer to the exported wrappers, as the sketch below shows.
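
To show the duplication concretely, the edge-case tests end up repeating the same expectations for perform_check and again for each wrapper, something like this (assuming the intended behaviour is that NA and NaN propagate):

library(testthat)

test_that("perform_check handles edge cases", {
  expect_true(is.na(perform_check(NA, 1)))
  expect_true(is.na(perform_check(NaN, 1)))
  expect_true(perform_check(Inf, 1))
  expect_false(perform_check(-Inf, 1))
})

test_that("check_raw handles edge cases", {
  expect_true(is.na(check_raw(NA, 1)))
  expect_true(is.na(check_raw(NaN, 1)))
  expect_true(check_raw(Inf, 1))
  expect_false(check_raw(-Inf, 1))
})

# ...and near-identical blocks again for check_first, check_second, check_both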

So the issue is: how should the testing of the exported wrappers be handled? I can agree that the suggested expect_calls approach doesn't solve the situation entirely. However, it might be better than testing every exported wrapper against every edge case for every class of argument.

Code duplication is only a problem when it can cause a problem. For example, each of your check_* functions has steps that should always do the same thing with the same inputs. You factored those steps out as the perform_check function. That avoids dangerous code duplication, where one function could've ended up following different logic than the others.

But "duplicated" tests aren't often problems. Tests say "foo should do X". If bar does X also, then it will have the same test. But it shouldn't be tied to foo, because they're independent parts.

Testing the implementation can defeat the purpose of tests, which is to reassure you that things still work as intended when something changes. If you have to rewrite the tests every time something changes, they're a lot less reassuring.

If it really bothers you, build your own expectation helpers with testthat::expect(...). Then the duplicated code is only a single line per test.
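
For example, a homemade helper along these lines would keep each wrapper's edge-case test to a single call. The name expect_handles_edge_cases and the exact checks are made up for illustration; they would depend on what behaviour you actually want in the edge cases:

library(testthat)

# Hypothetical helper: run one check function over the shared edge cases
expect_handles_edge_cases <- function(check_fun, y = 1) {
  for (x in list(NA, NaN, Inf, -Inf)) {
    result <- check_fun(x, y)
    testthat::expect(
      is.logical(result) && length(result) == 1,
      sprintf("check function did not return a single logical for x = %s", x)
    )
  }
}

test_that("exported wrappers handle edge cases", {
  expect_handles_edge_cases(check_raw)
  expect_handles_edge_cases(check_first)
  expect_handles_edge_cases(check_second)
  expect_handles_edge_cases(check_both)
})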

The thing is, I really don't see an efficient way of writing one expectation for all the check_ functions. At their level they have different edge cases, i.e. different conditions under which the arguments passed to perform_check become NA, NaN, etc. So my current understanding is that I need to write a separate set of expectations for each of them.

With that in mind, I think my concern about duplicated tests comes down to the following situation. What if I want to handle NA in perform_check differently? As a consequence, I also need to update the tests for this function, which is fair. However, I also need to update the tests for every check_ function. As you mentioned, that undermines the purpose of tests. Instead, if there were a way to "say" in the tests something like "the function perform_check is responsible for the behaviour in edge cases", no updating of the check_ tests would be needed.

If it matters what the check_ functions do when NA is in the input, I think the tests should reflect that. If the same thing matters for perform_check, then it should also have a test reflecting that.

But those should be different decisions. The purpose of perform_check should not be "bundle up the 100 lines of code shared by each check_ function." It should do one task: implement a standardized check. The check_ functions should have their own purpose. And, especially since they're exported, the check_ tests shouldn't often change (the price of maintaining an interface).

This next point is going beyond the idea of duplicated tests, but here's a suggestion for your example situation. If the check_ functions act as logical branches of perform_check, this would be better represented by moving everything into perform_check, giving it a toggle parameter, and exporting it instead.

#' @param x       Something
#' @param y       Something
#' @param prepare Character vector naming which arguments ("x" and/or "y") to
#'   prepare. If NULL (default), neither argument is prepared.
#' @export
perform_check <- function(x, y, prepare = NULL) {
  if ("x" %in% prepare) {
    x <- prepare(x)
  }
  if ("y" %in% prepare) {
    y <- prepare(y)
  }
  
  # [code from the original perform_check...]
}
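
A quick usage sketch of how the old wrappers map onto the single exported function:

perform_check(2, 3)                          # what check_raw(2, 3) did
perform_check(2, 3, prepare = "x")           # what check_first(2, 3) did
perform_check(2, 3, prepare = "y")           # what check_second(2, 3) did
perform_check(2, 3, prepare = c("x", "y"))   # what check_both(2, 3) did

One small readability note on the sketch: inside the function the argument prepare shadows the name of the prepare() generic, but R still resolves prepare(x) to the function, because only function objects are considered when a name is used in call position. Renaming the argument (say, prepare_args) would avoid the double meaning.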

If two things have extremely similar purposes, you can often combine them. That would truly de-duplicate the code.

Well, for me your most convincing argument is:

And, especially since they’re exported, the check_ tests shouldn’t often change (the price of maintaining an interface)

If the check_ tests were organized in the way I previously described, then the outside interface could break unnoticed. And that is really bad. Thank you for your ideas.

Maybe there are other opinions and ideas out there?