What checks are practical when loading data over the internet?

I am preparing to release a package that uses readRDS() to read a largish data set over the internet (due to CRAN package size limitations it is not feasible to include it with the package).

I am thinking about what checks I should build in to make loading the data bulletproof.

I already have a warning triggered by http_error() from the httr package if the URL does not work (meaning no internet connection, or an error on the remote server), but I am wondering whether I can rely on readRDS() itself to handle the consistency of the downloaded file (things like SHA checks etc.).
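Roughly, what I have in mind is something like this (a minimal sketch; the URL is only a placeholder, not the real location of the data):

```
# Minimal sketch of the check described above; the URL is a placeholder.
data_url <- "https://example.com/bigdata.rds"

load_remote_data <- function(path = data_url) {
  # http_error() sends a HEAD request and returns TRUE for non-success
  # status codes; the tryCatch() also covers the "no connection" case.
  reachable <- tryCatch(!httr::http_error(path), error = function(e) FALSE)
  if (!reachable) {
    warning("Remote data set is not available; returning NULL.")
    return(invisible(NULL))
  }
  # readRDS() accepts a url() connection directly
  readRDS(url(path))
}
```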

I'm not sure exactly how I'd glue the two together, but if you can get the download to fetch both the file and its md5, a function could use the digest package to compare the reference hash to a freshly calculated hash and have reasonable confidence that she's fetched an intact data set.
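Something along these lines, assuming the server also publishes a small .md5 file next to the data (both URLs and file names here are made up):

```
# Hypothetical sketch: assumes a reference md5 is hosted alongside the
# .rds file, and that the .md5 file contains just the hash on line one.
data_url <- "https://example.com/bigdata.rds"
md5_url  <- "https://example.com/bigdata.rds.md5"

fetch_verified <- function(data_url, md5_url) {
  destfile <- tempfile(fileext = ".rds")
  utils::download.file(data_url, destfile, mode = "wb", quiet = TRUE)

  reference_md5  <- trimws(readLines(md5_url, warn = FALSE)[1])
  calculated_md5 <- digest::digest(destfile, algo = "md5", file = TRUE)

  if (!identical(reference_md5, calculated_md5)) {
    stop("md5 mismatch -- the downloaded file looks corrupted or incomplete.")
  }
  readRDS(destfile)
}
```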

Or, there's blockchain. (Urk!)

Thanks for the ideas @technocrat!

In the end I solved my problem via the curl package (which is used by httr anyway, so it adds no additional dependency).

curl::curl_download() has proven more reliable than utils::download.file(), especially for binary files on the Windows platform, which had been devilishly difficult to debug in the past.
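For reference, the relevant part now looks roughly like this (a simplified sketch; the URL is again a placeholder):

```
# Simplified sketch of the final approach; the URL is a placeholder.
data_url <- "https://example.com/bigdata.rds"

load_data <- function(path = data_url) {
  destfile <- tempfile(fileext = ".rds")
  tryCatch({
    # curl_download() writes in binary mode by default ("wb") and
    # signals an error when the download fails
    curl::curl_download(path, destfile, quiet = TRUE)
    readRDS(destfile)
  },
  error = function(e) {
    # fail gracefully with a warning rather than an error
    warning("Could not load the remote data set: ", conditionMessage(e))
    invisible(NULL)
  })
}
```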

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.