Caching of downloaded datasets

I have a function in a package that needs to download a dataset over the internet (it is not practical to store it internally due to package size considerations / CRAN requirements).

The dataset (a shapefile, read as an sf data frame) stays the same on every call. It is not huge (megabytes, not gigabytes), but downloading it each time seems wasteful of bandwidth.

Is there a practical way to cache the result, i.e. download once and then store it locally? Ideally an approach that would be OS independent...

Thanks for any advice!


Just to be clear, are you asking whether a function in the package can download the dataset one time only? I think the simplest way would be to add an "if the file doesn't exist" check inside the function, before the download.

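download_shapefile <- function(path, url = "<your hard coded URL>") {
  # Download only when the file is not already present at `path`
  if (!file.exists(path)) {
    # mode = "wb" keeps binary files intact on Windows
    download.file(url = url, destfile = path, mode = "wb")
  } else {
    message("Your file is already downloaded at ", path)
  }
  # Return the path invisibly so the caller can read the file from it
  invisible(path)
}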

You would want to add an explanation of that process in the function help, so the user can decide where they want to download the shapefile.

If you are just asking how to download a file locally, the answer is easier :slight_smile:

Yes, that is what I am asking - I am looking for local persistent storage on a user / machine (i.e. higher than project) level.

I have a function that returns a shapefile - class sf object - and has to download it from a remote repository.

The remote download itself is not an issue, but to speed things up and save bandwidth I am looking for persistent local storage. Including the dataset in the package is not feasible (it would be possible for a GitHub-only package, but CRAN has a 5 MB policy).

I would suggest using the rappdirs package for this.
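
To make the suggestion concrete, here is a minimal sketch of a user-level cache built on `rappdirs::user_cache_dir()`. The function name `cached_download`, the app name "mypkg", and the `download` argument are hypothetical and chosen only for illustration; treat this as one possible approach rather than a definitive implementation.

```r
# Hypothetical helper: returns the path to a cached copy of a remote file,
# downloading it only on first use. "mypkg" is a placeholder app name.
cached_download <- function(url, filename,
                            cache_dir = rappdirs::user_cache_dir("mypkg"),
                            download = utils::download.file) {
  # Create the per-user cache directory the first time it is needed
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  dest <- file.path(cache_dir, filename)
  # Only hit the network when no cached copy exists yet
  if (!file.exists(dest)) {
    download(url, dest, mode = "wb")
  }
  dest
}
```

The returned path could then be passed to whatever reader fits the file, for example `sf::st_read()` for spatial data. Exposing `cache_dir` as an argument lets the user choose (or at least see) where the file ends up, which may also help when documenting the cache.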

Yes, this is what I was looking for. Thanks for the link!

The CRAN policy of discouraging writing to the local filesystem gave me pause, as I was not aware of it. My bad.

Hopefully it will not be a showstopper with proper explanation / communication / documentation of the local cache.