Storing example datasets for package vignettes

I would like to use an example dataset in my package vignette. I don't want to include it in the package, since it's not essential and CRAN does not like packages that are over 10MB. The dataset would be less than 100MB, so downloading it would not be terribly slow. One simple solution is to create another GitHub repo and put it there. That's not how GitHub is meant to be used, but it would technically work. Is there a more appropriate/acceptable solution?


I'll briefly mention two strategies.


The first is a data package hosted on drat: see this R Journal article about posting the data in a package in a drat repository, which you can mention in your DESCRIPTION file. "This paper describes how R users can create a suite of coordinated packages, in which larger data packages are hosted in an alternative repository created with drat, while a smaller code package that interacts with this data is created that can be submitted to CRAN."
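As a rough sketch of what "mention in your DESCRIPTION file" could look like for the code package: the data package name `mydatapkg` and the repository URL below are placeholders I made up, not values from the article.

```
Suggests:
    mydatapkg
Additional_repositories: https://yourname.github.io/drat
```

Listing the data package under Suggests rather than Imports means the code package can still be installed when the data package is unavailable, so any code that uses it should check for it first.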

The second is a package hosted somewhere else, e.g. GitHub.

If you choose a solution other than the one from the paper (there is some inspiration in an R-hub blog post), CRAN will not install the data package, and your vignette code should not download data without asking the user, so you also need to make sure R CMD check does not fail. I wrote a summary in an R-hub blog post; the options are:

  • you could pre-compute the vignette;

  • you could use the purl and eval chunk options;

  • you could make the vignette an article instead (present in the pkgdown website, not present for R CMD check).
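For the second option, a minimal sketch of a vignette setup chunk, assuming the data lives in a suggested package (the name `mydatapkg` is hypothetical). It turns off evaluation and purling for all chunks when the data package is not installed, which is one way to keep R CMD check from failing:

```r
# Setup chunk for the vignette. "mydatapkg" is a hypothetical data package
# listed under Suggests; replace it with the real name.
has_data <- requireNamespace("mydatapkg", quietly = TRUE)

# eval = FALSE skips running the chunks when the data is missing;
# purl = FALSE keeps them out of the R script extracted from the vignette.
knitr::opts_chunk$set(eval = has_data, purl = has_data)
```

The vignette text then still renders, just without output, on machines where the data package is absent.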


Thank you for such a detailed response. I did not realize the answer would be so nuanced. Having thought about it for a while, do you have a preference for which path to take? They all seem to have tradeoffs.


Ah! No, my answer would definitely be "it depends" :grin: :see_no_evil:
