Data Analysis Code Review


#1

Yes, this is going to end with me asking if anyone wants to review the code I am planning to publish with my next manuscript.

But first,

  1. What is the best way to go about finding someone to review your code? This seems like a huge favor to ask! I'm curious how other people find reviewers, or what motivates people to step up and review.

  2. Most of what I have read about code review is written for developers. Does anyone know of any good guides for reviewing code for data analysis?

My code pulls soil, weather, and nitrous oxide data from a few different sources, then builds a model to predict nitrous oxide emissions from soil and weather variables. I am planning to submit it as a Reply in Nature, in response to an analysis/model that needed to be improved. I think all the code will be ready for review by next week (9/12). I would be eternally grateful if anyone wants to take a look at what I've got.


#2

A lot of analysts have colleagues who can review code. Even if they have no R skills, you could walk them through the program and explain the purpose of each code chunk (btw, this is a great way to organize scripts and write documentation).

But an online community of data analysts who review one another's projects would be useful. You might want to pitch the idea to rOpenSci or the R Consortium.

There's not much difference between "general" and "analysis" software, so the review process is similar. And it's possible to bundle a report as a package. Sometimes efficiency isn't a concern, but the other points apply:

  • Does it work?
  • Does the code follow a consistent style, making it easy to read?
  • Do the naming and organization help the reader understand what's being done?
  • Are there directions for running/using it?
  • Do you bundle repeated actions inside functions? Use similar objects (ideally with well-defined structure) to hold similar data?
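To make the last point (and "Does it work?") concrete, here is a minimal sketch. It is written in Python only to keep it short; the same idea carries over to R. The function name `mean_by_group`, the column names, and the toy data are all made up for illustration and are not taken from anyone's actual analysis.

```python
from statistics import mean


def mean_by_group(records, group_key, value_key):
    """Return {group: mean of value_key}, skipping rows where the value is missing (None)."""
    grouped = {}
    for row in records:
        value = row.get(value_key)
        if value is None:
            continue  # ignore missing measurements instead of failing
        grouped.setdefault(row[group_key], []).append(value)
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical toy data: every record has the same structure,
# so one function can summarise any of the measured variables.
observations = [
    {"site": "A", "n2o_flux": 1.0, "soil_temp": 10.0},
    {"site": "A", "n2o_flux": 3.0, "soil_temp": 14.0},
    {"site": "B", "n2o_flux": 2.0, "soil_temp": None},
    {"site": "B", "n2o_flux": None, "soil_temp": 12.0},
]

# The repeated action (a grouped mean) is written once and reused.
flux_by_site = mean_by_group(observations, "site", "n2o_flux")
temp_by_site = mean_by_group(observations, "site", "soil_temp")

# Small checks a reviewer can run to see that the function behaves as described.
assert flux_by_site == {"A": 2.0, "B": 2.0}
assert temp_by_site == {"A": 12.0, "B": 12.0}
```

A reviewer who doesn't know the language can still read the docstring, run the checks at the bottom, and see whether the function does what it claims.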

I would recommend Workflow of Statistical Data Analysis by Oliver Kirchkamp. It's a lengthy guide to writing analysis reports using R. It doesn't explicitly address peer review, but does give opinions on how to code.


#3

This is a fantastic resource that I haven't seen before -- thank you for sharing!


#4

Thanks for the resources and the nice reply! Kirchkamp looks like a very good reference.

I'd like to keep the question open. I've read quite a bit on "how to code", but just because the code looks right and works doesn't mean it wouldn't benefit from a second set of eyes. I can't imagine ever submitting text without having someone look over it.

Still curious if there are more opinions or experiences out there related to code review.


#5

I don't have great answers (sorry!); this is a hard topic for data analysts that so far seems to be solved (if "solved" at all) within individual lab groups. But I wanted to point out that there's a closely related discussion going on at the rOpenSci discussion board, with the goal of planning a community call on the topic relatively soon.