Vignettes that require locally stored data

cran
vignette

#1

I am working on a package where much of the functionality involves interacting with an SQLite database. For testing, I have a small version of the database that is contained directly within the package. But when creating the vignettes I have used the full SQLite database so that I can fully illustrate how the functions work. From Hadley’s R Packages book in the vignettes chapter:

Note that since you build vignettes locally, CRAN only receives the html/pdf and the source code. However, CRAN does not re-build the vignette. It only checks that the code is runnable (by running it). This means that any packages used by the vignette must be declared in the DESCRIPTION. But this also means that you can use Rmarkdown (which uses pandoc) even though CRAN doesn’t have pandoc installed.

Does this mean that for my vignette to be accepted by CRAN it would need access to the full SQLite database, or does “runnable” by chance mean only that the code is somehow clean (doubtful)? If the vignette does need to be completely buildable (which in this case requires the full SQLite database) to be accepted by CRAN, what are some strategies for this type of thing? I can think of two possibilities:

  1. Run the vignette off the small internal SQLite database, but set that chunk to echo=FALSE, eval=TRUE, then have another chunk that the user actually sees, with echo=TRUE, eval=FALSE, so that the user sees the code that accesses the full database (which they would have downloaded locally).
  2. Wrap all the function calls in the vignette in something like if (file.exists(sqlite database)) then run the function…
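
For illustration, the second possibility might look roughly like the sketch below, placed in the vignette's setup chunk. The helper name full_db_available() and the path in db_path are assumptions made up for this example, not part of any package; the only real calls are file.exists() and knitr::opts_chunk$set().

```r
# Hypothetical helper: TRUE only when the full database file is present.
full_db_available <- function(path) {
  length(path) == 1 && !is.na(path) && file.exists(path)
}

# Assumed location of the full database; adjust to wherever users download it.
db_path <- file.path(path.expand("~"), "full_database.sqlite")

# Evaluate chunks only when the full database exists on this machine.
# The guard keeps the sketch runnable even where knitr is not installed.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = full_db_available(db_path))
}
```

With this in place, a machine without the file would still show the code in the rendered vignette (echo is unaffected) but would not try to run it.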

Are there any other strategies here, or am I misunderstanding how CRAN accepts vignettes?

Thanks in advance,

Sam


#2

I would also be interested to see what approaches would work for your question.
Just an idea, and I'm not sure if this would work for you, but you could in principle save the data as an .rds file with high compression, include it as internal data in your pkg, then load it into the database for the vignettes. Not sure if this is a good way, just an idea…
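
A rough sketch of that idea, under stated assumptions: the data frame small_db, the file name, and the table name "flows" are all stand-ins invented for this example, and in a real package the .rds file would ship with the package rather than sit in tempdir(). The database step is guarded because it assumes the DBI and RSQLite packages are installed.

```r
# Stand-in for data extracted from the real database (hypothetical values).
small_db <- data.frame(station = c("A", "B"), flow = c(1.5, 2.25))

# Save with xz compression, which usually gives the smallest .rds files.
rds_path <- file.path(tempdir(), "small_db.rds")
saveRDS(small_db, rds_path, compress = "xz")

# In the vignette: read the data back ...
restored <- readRDS(rds_path)

# ... and load it into a temporary in-memory SQLite database.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "flows", restored)
  print(DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM flows"))
  DBI::dbDisconnect(con)
}
```

Whether this is practical depends on how small the compressed data ends up being.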


#3

I think this prior thread is relevant to your question:

Basically, it looks like the “best” option is a hybrid of your two options, where you use an environment variable to determine whether eval is TRUE or FALSE.


#4

Yep, sounds very similar to what I wanted. Here’s what I ended up with:


#5

These are great solutions. Curious @gergness - you didn’t end up using the opts_chunk$set(eval = nzchar(Sys.getenv("MY_ENV_VARIABLE"))) approach but rather a conditional with file.exists. Any reason why?


#6

Mostly because I’m not very familiar with the environment variable approach, so possibly everything I wanted could have worked with that method.

I wanted users/developers of the package (and Travis CI) to be able to get the data for the vignettes using only a devtools::install_github() command. But I didn’t want to add the data package to the Suggests field, to avoid a note from CRAN, so I didn’t want to use require(). By checking for the system files of the data package directly, I avoid notes about missing dependencies, but can also be sure that the code is only run when the data is available.
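
One way such a check could be written is sketched below; it is not necessarily the exact code used in that package. The helper name data_pkg_installed() and the package name "myDataPkg" are hypothetical. The sketch relies on system.file() returning an empty string when the named package is not installed.

```r
# Hypothetical helper: TRUE when the named package is installed,
# without calling library(), require(), or requireNamespace() on it.
data_pkg_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

# "myDataPkg" is a made-up name standing in for the GitHub-only data package.
have_data <- data_pkg_installed("myDataPkg")

# Use the result to decide whether vignette chunks are evaluated.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = have_data)
}
```

Files inside the data package could then be located with system.file("extdata", "some_file", package = "myDataPkg"), where the file path is again only an illustration.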

I’m curious to see if there’s a better approach though, let me know what you end up doing.


#7

@gergness Apologies for the late reply.

I ended up using this approach:

  • Define a variable in my .Renviron file like this: hydat_eval = TRUE
  • Then as part of my global options I can check for the existence of hydat_eval along with my other global variables:
knitr::opts_chunk$set(echo = TRUE, 
                      warning = FALSE, 
                      message = FALSE, 
                      eval = nzchar(Sys.getenv("hydat_eval")),
                      fig.width=7, fig.height=7)

I think this will work because CRAN only checks whether the vignette code will run. The actual .html files are generated locally by me, with that variable set in my .Renviron file. When CRAN tries to run the vignette, it will simply skip all the code because it does not have that variable set.
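
One caveat with the nzchar() check above: it is TRUE for any non-empty value, so setting hydat_eval to FALSE would still turn evaluation on. A stricter variant, sketched here with a hypothetical helper named env_flag(), parses the value as a logical instead.

```r
# Hypothetical helper: TRUE only when the variable holds a value that
# as.logical() reads as TRUE (e.g. "TRUE", "true", "T").
# Unset, empty, "FALSE", or unparseable values all give FALSE.
env_flag <- function(name) {
  isTRUE(as.logical(Sys.getenv(name, unset = "")))
}

run_chunks <- env_flag("hydat_eval")

# Same global options as above, with the stricter check for eval.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE,
                        warning = FALSE,
                        message = FALSE,
                        eval = run_chunks,
                        fig.width = 7, fig.height = 7)
}
```

Either check behaves the same when the variable is simply unset, which is the case that matters on CRAN's machines.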

Here it is in action: https://github.com/bcgov/tidyhydat/blob/master/vignettes/tidyhydat_an_introduction.Rmd

Thank you for the input.