Packages which use Internet resources should fail gracefully with an informative message if the resource is not available (and not give a check warning nor error).
What does this mean for a package?
Do I have to catch all errors/warnings and demote them into messages?
library(httr)
tryCatch({
GET("http://httpbin.org/delay/11", timeout(10))
},
error = function(x) message(conditionMessage(x))
)
#> Timeout was reached: [httpbin.org] Operation timed out after 10001 milliseconds with 0 bytes received
Does it only affect the testing side of the package? Use something like that :
I'm not (yet) a package guy, but given the general unfamiliarity with HTTP protocol errors, it's sometimes helpful to expand on 404s 413s, etc. Had a question the other day about a forbidden that was traceable to exceeding API allowance.
Following @technocrat advice, there is some helpers with http error code to give helpul message
In httr, you have httr::stop_for_status(), httr::warn_for_status() and httr::message_for_status(). You also have this helpful package fauxpas which also help to choose the behavior. This is helpful for your code as CHECK for tests or vignettes
About the checks, I think this is a demand to not have some WARNINGS or ERRORS notes from you CRAN check just because a resource is not found.
You could indeed skip the test on CRAN with skip_on_cran or use a trycatch to ends more gracefully.
My understanding is also that the aim is to avoid false alarms in CI workflows, both on CRAN and other repositories.
I have a package that serves a largish object that has to be downloaded from the internet (the package would not fit CRAN size limit if the object were to be internal).
I have resolved the requirement by first testing via httr::http_error() - it tries only HEAD, so very little overhead - and only if the HEAD is OK I proceed to download & serve the object via curl::curl_download().
In past versions (the CRAN requirement is a newish one, I believe it was introduced a year or so ago) I had a stop() in place; I have demoted it to message() and return(NULL) after I learned about the policy change.
if (httr::http_error(remote_file)) { # network is down = message (not an error anymore)
message("No internet connection or data source broken.")
return(NULL)
} else { # network is up = proceed to download via curl
message("RCzechia: downloading remote dataset.")
curl::curl_download(url = remote_file, destfile = local_file, quiet = T)
} # /if - network up or down
It is also my understanding the code should fail gracefully and not the tests because this would create a disincentive for more tests.
Very helpful code chunk. So instead of a tryCatch you are using a http_error, so testing and particularly expect_error(fun) would make much more sense now.
However, timeout problem may still arise when you are accessing the whole file, and not just the head. Do you have any insight on that?
To elaborate a little on my post: I actually have the if clause a bit more complex, as I test for either
the http_error (= network actually being down) or
an environment variable simulating the network being down.
This way I can have a test_that expectation that tests the evaluation of the condition of a network failure (it was sort of difficult to fake a network being down in a remote CI workflow, such as Travis or CRAN, and this was the best I could think of).
This is the code that I actually use, including the part from unit tests (it is not an expect_error(), but expect_message() - an error would not be considered as a graceful fail :))
As for the timeout error I have left this to {curl} to handle, which may not be an option in your case.
# in my function init:
network <- as.logical(Sys.getenv("NETWORK_UP", unset = TRUE)) # dummy variable to allow testing of network
# actual evaluation
if (httr::http_error(remote_file) | !network) { # network is down = message (not an error anymore)
message("No internet connection or data source broken.")
return(NULL)
} else { # network is up = proceed to download via curl
message("RCzechia: downloading remote dataset.")
curl::curl_download(url = remote_file, destfile = local_file, quiet = T)
} # /if - network up or down
# in my tests
Sys.setenv("NETWORK_UP" = FALSE)
expect_message(chr_uzemi(), "internet") # message about faulty internet connection
Sys.setenv("NETWORK_UP" = TRUE)
Yes, this makes perfect sense - in fact more sense than my original solution, as httr::http_error() will throw an error (and not the desired message) in case of a non working internet (I have just checked).
You mind if I steal your approach? I promise to modify it slightly though, to check for failed internet (working is ok...)
Thanks a lot!
if (!curl::has_internet()) {
message("No internet connection")
return(NULL)
}
Well, I gave it some thought, and this is the best I come up with to conform with CRAN policy.
library(httr)
library(curl)
gracefully_fail <- function(remote_file) {
try_GET <- function(x, ...) {
tryCatch(
GET(url = x, timeout(1), ...),
error = function(e) conditionMessage(e),
warning = function(w) conditionMessage(w)
)
}
is_response <- function(x) {
class(x) == "response"
}
# First check internet connection
if (!curl::has_internet()) {
message("No internet connection.")
return(invisible(NULL))
}
# Then try for timeout problems
resp <- try_GET(remote_file)
if (!is_response(resp)) {
message(resp)
return(invisible(NULL))
}
# Then stop if status > 400
if (httr::http_error(resp)) {
message_for_status(resp)
return(invisible(NULL))
}
# If you are using rvest as I do you can easily read_html in the response
xml2::read_html(resp)
}
gracefully_fail("http://httpbin.org/status/404") # http >400
#> Not Found (HTTP 404).
gracefully_fail("http://httpbin.org/delay/2") # Timeout
#> Timeout was reached: [httpbin.org] Operation timed out after 1000 milliseconds with 0 bytes received
gracefully_fail("http://httpbin.org") #OK
#> {html_document}
#> <html lang="en">
#> [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
#> [2] <body>\n <a href="https://github.com/requests/httpbin" class="github-c ...
Of course, in a package I would change timeout(1) with timeout(10).