Approach to testing (independent validation of algorithm implementation)

Hi,

We are using R to support both APIs and web apps (via Shiny) as commercial applications. The primary components of these are a few functions that run and analyse Monte Carlo simulations (we're projecting the uncertainty around pension pots).

A question has arisen over the approach we should take for testing the implementation of the algorithms in these functions (mainly because R is a new language for us to use in production, which gives us an opportunity to challenge existing conventions).

Specifically, the question relates to how we give comfort that there has been an appropriate independent validation of the implementation of the algorithm for use in production.

Broadly, the two competing views are:

  1. The developer implements the algorithm and adds unit tests (using testthat) for edge cases (because the algorithm is typically complex, it's not possible to come up with non-edge-case outputs without re-implementing the algorithm). An independent developer then reviews the implementation and test coverage by reading through the code. The main implementation is used to create a 'regression pack' of outputs which would be run on a continuous-integration basis (again using testthat), ensuring that future developments only result in differences where expected. (A rough sketch of what this could look like follows this list.)

  2. As 1, but an independent developer also produces a separate implementation of the algorithm, and the outputs from that implementation are used to test the main code.
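To make 1 concrete, here's a minimal sketch of the testthat side. The function name, its arguments and the regression-pack file name are all made-up placeholders, not our actual code:

```r
library(testthat)

# Placeholder for the real Monte Carlo projection function; the name,
# arguments and internals are illustrative only.
project_pot <- function(contribution, n_sims, seed) {
  set.seed(seed)
  contribution * exp(rnorm(n_sims, mean = 0.03, sd = 0.1))
}

test_that("edge cases behave as expected", {
  # A zero contribution should produce a zero pot in every simulation.
  expect_true(all(project_pot(contribution = 0, n_sims = 100, seed = 1) == 0))
})

test_that("outputs match the approved regression pack", {
  # regression_pack.rds would be generated once from the reviewed
  # implementation and committed; CI then flags any unintended drift.
  reference <- readRDS(test_path("regression_pack.rds"))
  expect_equal(project_pot(contribution = 100, n_sims = 1000, seed = 42),
               reference, tolerance = 1e-8)
})
```

(testthat's snapshot tests, e.g. expect_snapshot_value(), are another way of storing and comparing the regression pack.)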

The concern with 1 is that having a third party read and review the code (whether the original algorithm, the tests, or both) doesn't give sufficient comfort in the validation of the algorithm.

The concern with 2 is that a separate implementation of the algorithm (potentially the third implementation, depending on how detailed the unit tests are) is wasteful.

I was wondering whether others using R in production have an approach they use to ensure confidence in the implementation of their algorithms?

Thanks,
Nick

I think that, typically, the "proof is in the pudding." If your algorithm performs appropriately on a brand new validation sample, then potential bugs can be essentially treated the same as modeling failures -- somewhat inevitable, so you must be prepared to deal with them when your model produces nonsense. In combination with #1, that should be reasonable in many cases.

Depending on the complexity of your inputs and what you are modeling, what about applying your algorithm to fake data generated by a known model? This wouldn't really work if generating sufficiently complex fake data is more difficult than modeling it in the first place (such as in image analysis), but it could work in some situations.
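As a rough sketch of that idea (parameter values arbitrary): if the simulation engine is configured with geometric Brownian motion dynamics, the terminal pot distribution is lognormal with known moments, so you get an independent yardstick without writing the algorithm a second time. Here the directly simulated vector simply stands in for whatever your engine would produce:

```r
library(testthat)

# Known data-generating model (GBM) with illustrative parameters.
mu <- 0.04; sigma <- 0.15; horizon <- 10; start <- 100

test_that("simulated terminal pots match the known lognormal distribution", {
  set.seed(123)
  n_sims <- 1e5
  # 'terminal' stands in for the output of the engine under test when it is
  # fed the known GBM model.
  terminal <- start * exp((mu - sigma^2 / 2) * horizon +
                            sigma * sqrt(horizon) * rnorm(n_sims))

  # Under GBM the terminal value is lognormal, so its median and mean are
  # available analytically.
  expect_equal(median(terminal), start * exp((mu - sigma^2 / 2) * horizon),
               tolerance = 0.02)
  expect_equal(mean(terminal), start * exp(mu * horizon), tolerance = 0.02)
})
```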

On the other hand, if you are talking about something like software used to control an airplane (high risk of harm in case of failure), then at least one independent implementation seems advisable.

It's maybe easier to think about this in terms of implementing the kinds of models that underlie QuantLib (R interface here: https://cran.r-project.org/web/packages/RQuantLib/RQuantLib.pdf).

So, to give a simplified example: under the Vasicek model for interest rates (https://en.wikipedia.org/wiki/Vasicek_model) we can derive a closed-form formula for pricing zero-coupon bonds (see Theorem 4.4 here: https://web.mst.edu/~bohner/fim-10/fim-chap4.pdf for the derivation).
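For reference, with short-rate dynamics $dr_t = a(b - r_t)\,dt + \sigma\,dW_t$, the standard textbook statement of the price at time $t$ of a zero-coupon bond maturing at $T$ is (subject to notational differences with the linked notes):

$$
P(t,T) = A(t,T)\,e^{-B(t,T)\,r_t}, \qquad
B(t,T) = \frac{1 - e^{-a(T-t)}}{a},
$$

$$
A(t,T) = \exp\!\left[\left(b - \frac{\sigma^2}{2a^2}\right)\bigl(B(t,T) - (T-t)\bigr) - \frac{\sigma^2}{4a}\,B(t,T)^2\right].
$$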

If we were to implement this bond pricing formula in a function, the question is then how we go about verifying that the implementation of that formula in code is correct.
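A sketch of such a function, using the parameterisation above (the name and signature are mine for illustration, not taken from any package):

```r
# Zero-coupon bond price under the Vasicek model (illustrative sketch).
vasicek_zcb_price <- function(r0, a, b, sigma, maturity) {
  B <- (1 - exp(-a * maturity)) / a
  A <- exp((b - sigma^2 / (2 * a^2)) * (B - maturity) -
             sigma^2 * B^2 / (4 * a))
  A * exp(-B * r0)
}

vasicek_zcb_price(r0 = 0.03, a = 0.1, b = 0.05, sigma = 0.01, maturity = 10)
```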

Whilst some edge-case unit tests could be added relatively easily (in this example we might check that as the short rate rises bond prices tend to zero, that a bond at zero maturity is priced at one, and that as maturity increases bond prices tend to zero), it's not possible to just write down the correct output for a typical set of input parameters without re-implementing the formula.
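For instance, edge-case tests against the hypothetical vasicek_zcb_price() above might look like this:

```r
library(testthat)

test_that("zero maturity prices at par", {
  expect_equal(vasicek_zcb_price(0.03, a = 0.1, b = 0.05,
                                 sigma = 0.01, maturity = 0), 1)
})

test_that("price vanishes as the short rate grows without bound", {
  expect_lt(vasicek_zcb_price(10, a = 0.1, b = 0.05,
                              sigma = 0.01, maturity = 10), 1e-6)
})

test_that("sigma = 0 reduces to deterministic discounting", {
  # With sigma = 0 and r0 = b the short rate never moves, so the price
  # must equal exp(-b * maturity): a semi-independent check that doesn't
  # require re-deriving the full formula.
  expect_equal(vasicek_zcb_price(0.05, a = 0.1, b = 0.05,
                                 sigma = 0, maturity = 10),
               exp(-0.05 * 10))
})
```

None of these pin down the 'typical' output, which is exactly the gap that approaches 1 and 2 try to fill in different ways.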

Under 1 in the original question, the implementation would be verified by having someone independently review that the code is correctly implementing the mathematical formula.

Under 2 in the original question, the implementation would be verified by having someone independently implement the mathematical formula and then compare outputs.

As the formula becomes more complex, both 1 and 2 become harder and more costly, but each has its pros and cons.
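To illustrate what 2 could look like for this example: the independent implementation need not be a second transcription of the same formula; it could be something structurally different, such as a brute-force Monte Carlo estimate of the bond price under the same Vasicek dynamics, compared with the closed form over a grid of inputs. A rough sketch (names, parameters and discretisation choices are all illustrative, and the tolerance has to allow for Monte Carlo and discretisation error):

```r
library(testthat)

# Brute-force Monte Carlo estimate of E[exp(-integral of r)] under Vasicek
# dynamics, as a structurally different check on the closed form.
mc_zcb_price <- function(r0, a, b, sigma, maturity,
                         n_paths = 2e4, n_steps = 500) {
  dt <- maturity / n_steps
  r_t <- rep(r0, n_paths)
  integral <- numeric(n_paths)
  for (i in seq_len(n_steps)) {
    integral <- integral + r_t * dt                       # left-point rule
    r_t <- r_t + a * (b - r_t) * dt + sigma * sqrt(dt) * rnorm(n_paths)
  }
  mean(exp(-integral))
}

test_that("closed form agrees with the Monte Carlo estimate", {
  set.seed(1)
  expect_equal(vasicek_zcb_price(0.03, a = 0.1, b = 0.05,
                                 sigma = 0.01, maturity = 10),
               mc_zcb_price(0.03, a = 0.1, b = 0.05,
                            sigma = 0.01, maturity = 10),
               tolerance = 1e-2)
})
```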

I'd be particularly interested in the experience of anyone from the insurance or banking industries, as this must be a common challenge when giving comfort to both internal and external auditors.