The standard way to distribute data with a package is to store it in R/sysdata.rda. If the dataset is large enough, however, that runs into CRAN's package-size limits. Ideally, the package could download the data silently at install time, but I don't think that is possible. Is there a good way to distribute package data while still complying with CRAN guidelines?
For example, rnaturalearth stores some of its data in a separate package, rnaturalearthhires, and asks users to install that package with devtools::install_github(). Technically that works, but it's somewhat cumbersome. The two packages can also drift out of sync if the data changes over time, in which case you might as well distribute only the GitHub package.
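In practice, the companion-package pattern comes down to a small availability check inside the main package. The following is a minimal sketch of such a check, not rnaturalearth's actual code; the function name check_data_pkg() and the names mydatapkg and user/mydatapkg are hypothetical placeholders.

```r
# Hypothetical helper (not from rnaturalearth): returns TRUE invisibly if the
# companion data package is installed; otherwise prints install instructions
# and returns FALSE invisibly.
check_data_pkg <- function(pkg = "mydatapkg", repo = "user/mydatapkg") {
  # requireNamespace() checks that the package can be loaded without
  # attaching it; quietly = TRUE suppresses its own messages.
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  message(
    "The ", pkg, " package is needed for this function. ",
    "Install it with:\n  devtools::install_github(\"", repo, "\")"
  )
  invisible(FALSE)
}
```

A function that needs the data could call `if (!check_data_pkg()) return(invisible(NULL))` before touching the data package, so CRAN checks still pass when it is absent.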
Is there a good solution to this problem?