Internet resources should fail gracefully

Hello Everyone,

I have a question with the following CRAN policy:

Packages which use Internet resources should fail gracefully with an informative message if the resource is not available (and not give a check warning nor error).

What does this mean for a package?

  • Do I have to catch all errors/warnings and demote them into messages?
library(httr)
tryCatch({
  GET("http://httpbin.org/delay/11", timeout(10))
},
error = function(x) message(conditionMessage(x))
)
#> Timeout was reached: [httpbin.org] Operation timed out after 10001 milliseconds with 0 bytes received
  • Does it only affect the testing side of the package? Use something like that :
test_that("function works", {
  skip_on_cran()
 <expectation>
})

Thanks

I'm not (yet) a package guy, but given the general unfamiliarity with HTTP protocol errors, it's sometimes helpful to expand on 404s 413s, etc. Had a question the other day about a forbidden that was traceable to exceeding API allowance.

That said, it may be overly "dumbing down."

1 Like

Following @technocrat advice, there is some helpers with http error code to give helpul message

In httr, you have httr::stop_for_status(), httr::warn_for_status() and httr::message_for_status(). You also have this helpful package fauxpas which also help to choose the behavior. This is helpful for your code as CHECK for tests or vignettes

About the checks, I think this is a demand to not have some WARNINGS or ERRORS notes from you CRAN check just because a resource is not found.

You could indeed skip the test on CRAN with skip_on_cran or use a trycatch to ends more gracefully.

3 Likes

My understanding is also that the aim is to avoid false alarms in CI workflows, both on CRAN and other repositories.

I have a package that serves a largish object that has to be downloaded from the internet (the package would not fit CRAN size limit if the object were to be internal).

I have resolved the requirement by first testing via httr::http_error() - it tries only HEAD, so very little overhead - and only if the HEAD is OK I proceed to download & serve the object via curl::curl_download().

In past versions (the CRAN requirement is a newish one, I believe it was introduced a year or so ago) I had a stop() in place; I have demoted it to message() and return(NULL) after I learned about the policy change.

if (httr::http_error(remote_file)) { # network is down = message (not an error anymore)
      message("No internet connection or data source broken.")
      return(NULL)
    } else { # network is up = proceed to download via curl
      message("RCzechia: downloading remote dataset.")
      curl::curl_download(url = remote_file, destfile = local_file, quiet = T)
    } # /if - network up or down
4 Likes

Hi,

It is also my understanding the code should fail gracefully and not the tests because this would create a disincentive for more tests.

Very helpful code chunk. So instead of a tryCatch you are using a http_error, so testing and particularly expect_error(fun) would make much more sense now.

  • However, timeout problem may still arise when you are accessing the whole file, and not just the head. Do you have any insight on that?

Thanks a lot

1 Like

You are welcome!

To elaborate a little on my post: I actually have the if clause a bit more complex, as I test for either

  1. the http_error (= network actually being down) or
  2. an environment variable simulating the network being down.

This way I can have a test_that expectation that tests the evaluation of the condition of a network failure (it was sort of difficult to fake a network being down in a remote CI workflow, such as Travis or CRAN, and this was the best I could think of).

This is the code that I actually use, including the part from unit tests (it is not an expect_error(), but expect_message() - an error would not be considered as a graceful fail :))

As for the timeout error I have left this to {curl} to handle, which may not be an option in your case.

# in my function init:
network <- as.logical(Sys.getenv("NETWORK_UP", unset = TRUE)) # dummy variable to allow testing of network

# actual evaluation 
if (httr::http_error(remote_file) | !network) { # network is down = message (not an error anymore)
   message("No internet connection or data source broken.")
   return(NULL)
} else { # network is up = proceed to download via curl
   message("RCzechia: downloading remote dataset.")
   curl::curl_download(url = remote_file, destfile = local_file, quiet = T)
} # /if - network up or down
 
# in my tests
Sys.setenv("NETWORK_UP" = FALSE)
expect_message(chr_uzemi(), "internet") # message about faulty internet connection
Sys.setenv("NETWORK_UP" = TRUE)
2 Likes

Thank you, very helpful to know that you can simulate a no-network situation for testing.

I was thinking something similar like curl::has_internet() to check if the network is down.
However, I would split them like that.

if (curl::has_internet()) {
   message("No internet connection")
   return(NULL)
}
if (httr::http_error(remote_file)) {
   message("Data source broken.")
   return(NULL)
}

I think httr::http_error() would throw an error if there is not internet connection, right?

Food for thought here!

Yes, this makes perfect sense - in fact more sense than my original solution, as httr::http_error() will throw an error (and not the desired message) in case of a non working internet (I have just checked).

You mind if I steal your approach? I promise to modify it slightly though, to check for failed internet (working is ok...)

Thanks a lot!

if (!curl::has_internet()) {
   message("No internet connection")
   return(NULL)
}

Yeah definitely, take the whole thing.

OK, copied & pasted :slight_smile:

For testting, you have a testthat helper to skip the test if offline. See testthat::skip_if_offline(host = "r-project.org")

I use it to check connectivity for a host before running the test code.

3 Likes

Well, I gave it some thought, and this is the best I come up with to conform with CRAN policy.

library(httr)
library(curl)

gracefully_fail <- function(remote_file) {
  try_GET <- function(x, ...) {
    tryCatch(
      GET(url = x, timeout(1), ...),
      error = function(e) conditionMessage(e),
      warning = function(w) conditionMessage(w)
    )
  }
  is_response <- function(x) {
    class(x) == "response"
  }
  
  # First check internet connection
  if (!curl::has_internet()) {
    message("No internet connection.")
    return(invisible(NULL))
  }
  # Then try for timeout problems
  resp <- try_GET(remote_file)
  if (!is_response(resp)) {
    message(resp)
    return(invisible(NULL))
  }
  # Then stop if status > 400
  if (httr::http_error(resp)) { 
    message_for_status(resp)
    return(invisible(NULL))
  }
  
  # If you are using rvest as I do you can easily read_html in the response
  xml2::read_html(resp)
}

gracefully_fail("http://httpbin.org/status/404") # http >400
#> Not Found (HTTP 404).
gracefully_fail("http://httpbin.org/delay/2") # Timeout
#> Timeout was reached: [httpbin.org] Operation timed out after 1000 milliseconds with 0 bytes received
gracefully_fail("http://httpbin.org") #OK
#> {html_document}
#> <html lang="en">
#> [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
#> [2] <body>\n    <a href="https://github.com/requests/httpbin" class="github-c ...

Of course, in a package I would change timeout(1) with timeout(10).

7 Likes

This is fantastic. I need to revamp the way some of my packages work and this is just what I needed.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.