Vignettes that require locally stored data

I am working on a package where much of the functionality involves interacting with a SQLite database. For testing, I have a small version of the database that is contained directly within the package. But when creating the vignettes, I used the full SQLite database so that I can fully illustrate how the functions work. From the vignettes chapter of Hadley's R Packages book:

Note that since you build vignettes locally, CRAN only receives the html/pdf and the source code. However, CRAN does not re-build the vignette. It only checks that the code is runnable (by running it). This means that any packages used by the vignette must be declared in the DESCRIPTION. But this also means that you can use Rmarkdown (which uses pandoc) even though CRAN doesn’t have pandoc installed.

Does this mean that for my vignette to be accepted by CRAN it would need access to the full SQLite database, or does "runnable" by chance mean that the code is somehow clean (doubtful)? If the vignette does need to be completely buildable (which in this case requires the full SQLite database) to be accepted by CRAN, what are some strategies for this type of situation? I can think of two possibilities:

  1. Run the vignette off the small internal SQLite database, with that chunk set to echo=FALSE, eval=TRUE, and then have another chunk with echo=TRUE, eval=FALSE that the user actually sees, so that the user sees the code that accesses the full database (which they would have downloaded locally).
  2. Wrap all the function calls in the vignette in something like if (file.exists(<path to sqlite database>)), and only then run the function.

Are there any other strategies here, or am I misunderstanding how CRAN accepts vignettes?
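As a rough illustration of option 2, a guard along these lines might work. It is only a sketch: the path `full_db` and the helper `with_full_db()` are made-up names for illustration, not part of any package.

```r
# Hypothetical location of the full database; the file name is an assumption.
full_db <- file.path(tempdir(), "full_database.sqlite3")

# TRUE only on machines where the full database has been downloaded.
have_full_db <- file.exists(full_db)

# Evaluate `expr` only when the full database exists; otherwise skip it quietly.
# Because R evaluates arguments lazily, `expr` is never run when
# have_full_db is FALSE.
with_full_db <- function(expr) {
  if (have_full_db) expr else invisible(NULL)
}

# The stop() call stands in for code that needs the full database.
result <- with_full_db(stop("this would only run against the full database"))
```

The same logical value could also be passed as a chunk option (eval = have_full_db) instead of wrapping each call.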

Thanks in advance,

Sam

I would also be interested to see what approaches would work for your question.
Just an idea, and I'm not sure it would suit you, but you could in principle save the data as an .rds file with high compression, include it as internal data in your package, and then load it into the database for the vignettes. I'm not sure if this is a good way; it's just an idea.

I think this prior thread is relevant to your question:

Basically, it looks like the "best" option is a hybrid of your two options, where you use an environment variable to determine whether eval is TRUE or FALSE.
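A minimal sketch of that idea, assuming a variable called MY_ENV_VARIABLE (a placeholder name) and a made-up helper `full_build()`:

```r
# MY_ENV_VARIABLE is a placeholder name; full_build() is a hypothetical helper.
full_build <- function() nzchar(Sys.getenv("MY_ENV_VARIABLE"))

# Variable not set (as it would be on a machine without the data): FALSE.
Sys.unsetenv("MY_ENV_VARIABLE")
eval_when_unset <- full_build()

# Variable set to any non-empty value (as on the author's machine): TRUE.
Sys.setenv(MY_ENV_VARIABLE = "true")
eval_when_set <- full_build()

# Clean up so the sketch leaves the session as it found it.
Sys.unsetenv("MY_ENV_VARIABLE")

# In a vignette's setup chunk the value would then drive evaluation, e.g.:
# knitr::opts_chunk$set(eval = full_build())
```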

Yep, sounds very similar to what I wanted. Here's what I ended up with:

These are great solutions. Curious @gergness - you didn't end up using the opts_chunk$set(eval = nzchar(Sys.getenv("MY_ENV_VARIABLE"))) approach but rather a conditional with file.exists. Any reason why?

Mostly because I'm not very familiar with the environment variable approach, so possibly everything I wanted could have worked with that method.

I wanted users/developers of the package (and Travis CI) to be able to get the data for the vignettes using only a devtools::install_github() command. However, I didn't want to add the package to the Suggests field (to avoid a note from CRAN), so I didn't want to use require(). By checking for the system files of the data package directly, I avoid notes about missing dependencies, but can also be sure that the code is only run when the data is available.
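A check of that kind might look roughly like the following. This is a sketch, not the code from the post: the package name "hypotheticalDataPkg" and the file "extdata/example.csv" are placeholders.

```r
# Placeholder name for a separately installed data package (an assumption).
data_pkg <- "hypotheticalDataPkg"

# system.file() returns "" when the package (or the file) cannot be found,
# so no library() or require() call is needed.
pkg_dir <- system.file(package = data_pkg)
have_data_pkg <- nzchar(pkg_dir)

# The same idea for one specific file inside the package (placeholder path).
example_file <- system.file("extdata", "example.csv", package = data_pkg)
have_example_file <- file.exists(example_file)

# Either value could then be used as the `eval` chunk option in the vignette.
```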

I'm curious to see if there's a better approach though, let me know what you end up doing.

@gergness Apologies for the late reply.

I ended up using this approach:

  • Define a variable in my .Renviron file like this: hydat_eval = TRUE
  • Then as part of my global options I can check for the existence of hydat_eval along with my other global variables:
knitr::opts_chunk$set(echo = TRUE, 
                      warning = FALSE, 
                      message = FALSE, 
                      eval = nzchar(Sys.getenv("hydat_eval")),
                      fig.width=7, fig.height=7)

I think this will work because CRAN only looks to see if the vignettes will run. The actual .html files are generated locally by me with that variable set in my .Renviron file. When CRAN tries to run the vignette, it will just skip all the code because it does not have that variable set.

Here it is in action: https://github.com/bcgov/tidyhydat/blob/master/vignettes/tidyhydat_an_introduction.Rmd
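One caveat with an nzchar() test is that any non-empty value switches evaluation on, including the string "FALSE". A stricter check could be sketched as below; `flag_is_true()` is a hypothetical helper, not something from knitr or tidyhydat.

```r
# Hypothetical helper: TRUE only when the variable parses as a logical TRUE.
# An unset variable gives "", as.logical("") is NA, and isTRUE(NA) is FALSE.
flag_is_true <- function(name) {
  isTRUE(as.logical(Sys.getenv(name, unset = "")))
}

# Remember the current value so it can be restored afterwards.
old_value <- Sys.getenv("hydat_eval", unset = NA)

Sys.setenv(hydat_eval = "FALSE")
loose_check  <- nzchar(Sys.getenv("hydat_eval"))  # non-empty string, so TRUE
strict_check <- flag_is_true("hydat_eval")        # parses as FALSE

Sys.setenv(hydat_eval = "TRUE")
strict_check_on <- flag_is_true("hydat_eval")

# Restore the original state of the variable.
if (is.na(old_value)) Sys.unsetenv("hydat_eval") else Sys.setenv(hydat_eval = old_value)
```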

Thank you for the input.

Is this still a viable option? A package of mine that followed this pattern was just pulled from CRAN with the message:

Vignettes are supposed to be self-contained but for these packages it 
seems at least one vignette depends on packages loaded or objects 
generated in others.

This can be seen in the clang-Fedora checks, done with a 
currently-optional customization of R CMD check (see the NEWS file in 
R-devel).

I'm having some issues installing a new version of R-devel to reproduce the error, and thought in the meantime someone here might know the new work-around.

Hi Laura :wave:,

My use case still works (for now), but that might be because the vignette depends on data being downloaded rather than on other packages. Which package got pulled?

Sam

EGRET... unfortunately the note from CRAN came right when the shutdown started, so I wasn't able to do anything to stop it. I'll try a few things today. Worst case, I just take down the offending vignette for now and leave it in the pkgdown site.

Not sure whether this is applicable in your situation, but I got around the issue of external data in a vignette by downloading a file from a network location to a tempfile() at build time.

Obviously this works only for small files (tens of kB in my case), but it seems to be OK with CRAN.

Taken from a vignette in the RCzechia package:

library(httr)    # provides GET() and write_disk()
library(readxl)  # provides read_excel()

# Download the spreadsheet to a temporary file at build time
GET("https://raw.githubusercontent.com/jlacko/RCzechia/master/data-raw/zvcr034.xls", 
    write_disk(tf <- tempfile(fileext = ".xls")))
## Response [https://raw.githubusercontent.com/jlacko/RCzechia/master/data-raw/zvcr034.xls]
##   Date: 2019-01-03 21:14
##   Status: 200
##   Content-Type: application/octet-stream
##   Size: 44.5 kB
## <ON DISK>  /tmp/Rtmp7QWxv8/file322b22dec355.xls

src <- read_excel(tf, range = "Data!B5:C97") # read in with original column names
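Since a download can fail at build or check time, it may be worth guarding it so the vignette skips the dependent chunks instead of erroring. The following is only a sketch using base R's download.file() rather than httr; `fetch_or_null()` is a hypothetical helper and the URL is deliberately one that is expected not to resolve.

```r
# Hypothetical helper: download `url` to a temporary file and return its path,
# or NULL if the download fails for any reason (offline, moved file, ...).
fetch_or_null <- function(url, fileext = "") {
  dest <- tempfile(fileext = fileext)
  ok <- tryCatch({
    utils::download.file(url, dest, mode = "wb", quiet = TRUE)
    TRUE
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)
  if (ok && file.exists(dest)) dest else NULL
}

# A local file URL that is expected not to exist, to exercise the failure path.
tf <- fetch_or_null("file:///this/path/does/not/exist.xls", fileext = ".xls")

# Later chunks could use this as their `eval` option.
data_available <- !is.null(tf)
```

With a guard like this, chunks that read the file would only be evaluated when data_available is TRUE.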