Is it allowed/okay practice to make API requests on package load?

I'm starting work on my first "real" R package, which is meant to be a client for a website's API. One problem with the API is that the allowable set of query parameters is not super obvious; to deal with this, the API exposes endpoints that tell you the allowable set of parameters (in case they change, for example). I'd like to fetch the allowable set of parameters both so that the user can call a function to see what they can enter and so that I can do some input checking in the function that actually sends the query.
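The input-checking half of this could look roughly like the sketch below. It assumes the parameter endpoint returns a named list mapping each parameter name to its allowed values; the `allowed` list here is hard-coded and hypothetical so that the example runs offline, and `check_param()` is an illustrative helper, not part of any existing package.

```r
# Hypothetical stand-in for what the API's parameter endpoint returns:
# a named list of parameter name -> allowed values.
allowed <- list(
  format = c("json", "csv"),
  region = c("north", "south", "east", "west")
)

# Stop with an informative message if the parameter name is unknown or any
# supplied value is not among the values the API currently allows.
check_param <- function(name, value, allowed) {
  if (!name %in% names(allowed)) {
    stop("Unknown parameter '", name, "'. Allowed parameters: ",
         paste(names(allowed), collapse = ", "), call. = FALSE)
  }
  bad <- setdiff(value, allowed[[name]])
  if (length(bad) > 0) {
    stop("Invalid value(s) for '", name, "': ", paste(bad, collapse = ", "),
         ". Allowed: ", paste(allowed[[name]], collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}
```

The query function would call `check_param()` on each argument before building the request, so a typo fails locally with a readable message instead of as an opaque API error.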

My first thought was to provide a function something like this:

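show_params <- function() {
  # `url` is the API endpoint that lists the allowable parameters
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}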

but this would call the API every time it was used. Another option I thought about was to do something like this:

params <- jsonlite::fromJSON(
  httr::content(httr::GET(url), as = "text", encoding = "UTF-8")
)
show_params <- function() {
  params
}

but then, as I understand it, the API would be hit when the package is attached (or loaded?), and I am not sure whether that would be a bad idea.
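One detail worth knowing here: top-level code in a package's R files, such as the `params <- ...` assignment above, is evaluated when the package is built/installed, not each time it is loaded, so the stored parameters would be frozen at install time. If the goal really is to query the API on load, the usual hook is `.onLoad()`, typically placed in a file such as `zzz.R`. A minimal sketch follows; `fetch_params()` and `.pkg_cache` are hypothetical names, and the request is stubbed so the example runs offline. One drawback of this approach is that every load of the package then depends on the network.

```r
# Hypothetical helper that would wrap the httr::GET() + jsonlite::fromJSON()
# request; stubbed here with a fixed result.
fetch_params <- function() {
  list(format = c("json", "csv"))
}

# Package-level environment; its binding is created when the namespace is
# built, but its contents can still be modified at run time.
.pkg_cache <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
  # Runs each time the namespace is loaded (library(), or pkg::fn()).
  # tryCatch() keeps a failed request from stopping the package loading;
  # the cached value is simply NULL in that case.
  .pkg_cache$params <- tryCatch(fetch_params(), error = function(e) NULL)
  invisible()
}
```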

I am not sure how to weigh making more API requests against caching the result of a single request for use by the rest of the package and by the user. Any thoughts?
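A common middle ground between these two options is a lazy cache: nothing is requested at install or load time, the first call hits the API, and later calls in the same R session reuse the stored result. The sketch below assumes a hypothetical `fetch_params()` wrapper around the real request; the `n_requests` counter exists only to make the caching visible and would not belong in a real package.

```r
# Hypothetical stand-in for the real request, e.g.
#   resp <- httr::GET(url)
#   jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
# The counter is for illustration only.
n_requests <- 0
fetch_params <- function() {
  n_requests <<- n_requests + 1
  list(format = c("json", "csv"), region = c("north", "south"))
}

# An environment is used because bindings in a package namespace are locked
# after loading, while the contents of an environment remain modifiable.
.param_cache <- new.env(parent = emptyenv())

show_params <- function() {
  # Only hit the API on the first call; `$` returns NULL for a missing name.
  if (is.null(.param_cache$params)) {
    .param_cache$params <- fetch_params()
  }
  .param_cache$params
}
```

The query function can call `show_params()` internally for its input checks, so the API is contacted at most once per session. The memoise package (`memoise::memoise()`) provides a general-purpose version of the same idea.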

Take a look at tidycensus::load_variables(), which lets a user load the available variables from the Census API into a data frame. It has a cache option so that users only have to download them once.


You can have a refresh_params = FALSE argument. When TRUE, the function fetches the parameters from the API and stores them with options(); when FALSE, it uses the stored options, but if the relevant options are NULL it refreshes anyway.
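The refresh_params approach might be sketched as follows. The option name `mypkg.params` and the `fetch_params()` stub are hypothetical; prefixing the option with the package name is a common way to avoid clashing with other packages' options.

```r
# Hypothetical stand-in for the API call (httr::GET() + jsonlite::fromJSON()).
fetch_params <- function() {
  list(format = c("json", "csv"), region = c("north", "south"))
}

show_params <- function(refresh_params = FALSE) {
  params <- getOption("mypkg.params")
  # Refresh when explicitly asked, or when nothing has been stored yet.
  if (refresh_params || is.null(params)) {
    params <- fetch_params()
    options(mypkg.params = params)
  }
  params
}
```

With this design the default call is cheap after the first request, and a user who suspects the API's parameters have changed can call `show_params(refresh_params = TRUE)`.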

If you aim to release your package to CRAN eventually, you should be aware that the CRAN Repository Policy discourages caching to the end user's file system; TMPDIR is OK, but by definition temporary.
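A `cache` argument in the spirit of `load_variables()` (a sketch, not tidycensus's actual implementation) that stays within the temporary directory could write the result to an `.rds` file under `tempdir()`, the per-session directory R creates under TMPDIR and normally removes when the session ends. The file name `mypkg_params.rds` and the `fetch_params()` stub are hypothetical.

```r
# Hypothetical stand-in for the API call.
fetch_params <- function() {
  list(format = c("json", "csv"), region = c("north", "south"))
}

show_params <- function(cache = TRUE) {
  # The file lives in the session's temporary directory only.
  cache_file <- file.path(tempdir(), "mypkg_params.rds")
  if (cache && file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  params <- fetch_params()
  if (cache) saveRDS(params, cache_file)
  params
}
```

Because the cache is gone after the session ends, this behaves much like the in-memory approaches; its main use is sharing the downloaded result with other code in the same session that reads the file.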

The policy also says you should use only secure (https / ftps) connections.