How to handle data dependencies?

In short, my question is: when running an existing RMarkdown file that was created on another system, I am notified of R package dependencies and provided with an option to install missing packages. Is there a similar process for detecting and resolving data dependencies?

Some background: I am new to R and RMarkdown and attracted to the latter's potential to increase the reproducibility of published research. After exploring some tutorial materials, I wanted to see how RMarkdown is used in the wild. I thus went to medRxiv and searched it for RMarkdown, which yielded 11 results, and the one I tried gave me this issue. It was easily resolved manually by re-running the RMarkdown file in a folder containing the relevant data files, but I am wondering whether such data dependencies could be handled in a more (re)user-friendly fashion, especially given that it already works smoothly for code dependencies (at least as long as they are in R).

There's an R-centric organization working in this area. My own take is that it depends on the publication forum and domain expectations.

In general, there's a spectrum ranging from on-request to full-service. In olden days, authors would sometimes respond to unsolicited requests from peers or even provide a mailing address for the purpose. Today the data may be posted online, but link rot makes it ephemeral unless it is hosted by a persistent administrator under a scheme such as DOI.

At the other end of the spectrum, one approach is to create the data objects in a bespoke package (which can do as little as provide a container for the object) on CRAN or GitHub (still ephemeral). A package with all the scripts and data used is a similar approach. More helpful yet is a Docker image with the complete R version and the base and contributed packages used, as well as the data. Or you could put that, along with the version of the open-source operating system used, in an ISO, or provide a bootable SSD with everything.

From a practical standpoint, however, I'd put the data in a git repository in CSV form so it can be conveniently brought in with readr::read_csv as a tibble.
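
As a rough sketch of how that could look inside the RMarkdown file itself: the helper below checks whether a data file is already present, tries to fetch it from the repository if not, and then reads it with readr::read_csv. The function name load_data, the file path, and the URL in the usage comment are all made up for illustration; only file.exists, download.file, and readr::read_csv are real functions.

```r
# Hypothetical helper: resolve one data dependency, then read it as a tibble.
# `path` is where the Rmd expects the file; `url` is where a copy is hosted.
load_data <- function(path, url) {
  if (!file.exists(path)) {
    message("Data file not found locally, downloading: ", url)
    # Create the target folder if it does not exist yet
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" avoids line-ending changes on Windows
    download.file(url, destfile = path, mode = "wb")
  }
  readr::read_csv(path)
}

# Usage (placeholder repository URL, not a real data set):
# trial <- load_data(
#   "data/trial.csv",
#   "https://raw.githubusercontent.com/some-user/some-repo/main/data/trial.csv"
# )
```

This is not automatic detection in the way missing packages are flagged, but it does mean a reader who knits the file in an empty folder gets the data fetched (or a clear download error) rather than a "file does not exist" failure partway through.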
