renv lockfile per rmarkdown doc?

nyoungblut · January 6, 2020, 8:18am

AFAIK, renv creates one lockfile for an entire project, but a user's package versions can change over the life of a project (e.g., older Rmd files might have been knitted with older package versions). Wouldn't it help if there were a lockfile per Rmd file that the user could update or keep "locked" at certain package versions if needed? Then, if the user needs to go back and re-knit an older Rmd file, the Rmd-associated lockfile would show that the currently installed package versions differ from those in use when that Rmd was last knitted. The user could then decide whether to use the new versions or revert to the old versions for knitting this file (and then run renv::restore afterwards). I believe this could be done manually now by comparing the sessionInfo output against the project lockfile, but that seems laborious. A per-Rmd lockfile could semi-automate this process. Is there anything like this, or is package management in R always at the project level instead of the per-notebook (per-Rmd) level? I'm used to Jupyter Notebooks, where each notebook can be run with a separate conda env if the user wants.
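For illustration, the manual comparison described above (installed versions versus lockfile versions) might look roughly like the sketch below. The helper names `lockfile_drift` and `read_locked_versions` are hypothetical, the use of jsonlite to parse the lockfile is an assumption, and the toy vectors stand in for a real lockfile and library.

```r
# Hypothetical helper: report packages whose installed version differs
# from the version recorded in a lockfile.
# `locked` and `installed` are named character vectors (names = packages).
lockfile_drift <- function(locked, installed) {
  pkgs <- names(locked)
  current <- unname(installed[pkgs])  # NA when a package is not installed
  changed <- is.na(current) | current != unname(locked)
  data.frame(
    package = pkgs[changed],
    locked = unname(locked)[changed],
    installed = current[changed],
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: pull the locked versions out of an renv lockfile,
# which is a JSON file with a "Packages" entry (assumes jsonlite is installed).
read_locked_versions <- function(path) {
  lock <- jsonlite::fromJSON(path, simplifyVector = FALSE)
  vapply(lock$Packages, function(p) p$Version, character(1))
}

# Toy data standing in for a real lockfile and a real library
locked <- c(glue = "1.3.1", knitr = "1.26", yaml = "2.2.0")
installed <- c(glue = "1.3.1", knitr = "1.27")
drift <- lockfile_drift(locked, installed)
print(drift)
```

With a real project, `locked` would come from `read_locked_versions("renv.lock")` and `installed` from `installed.packages()[, "Version"]`.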

cderv · January 6, 2020, 11:01am

This is a very interesting idea!

Currently, I think you can do that using renv: either by treating each Rmd as a project with its own local library, or by taking advantage of the renv cache to create a temporary local library at render time, with one renv.lock file per document.

After all, this amounts to treating each notebook as a project: you would then have one renv.lock and one renv library per document, so one renv virtual environment per R Notebook, like one conda virtual environment per Jupyter Notebook.

I don't think a tool for that exists yet, but there is a close one built upon renv that is called {capsule}.

The idea is to be able to execute something (e.g., render an Rmd document) inside its capsule, with specified package versions and in its own process. I think it is project-oriented at first (but again, an Rmd analysis is a project), and the tools available should make this easier. Some changes could also be suggested to capsule, I guess.

These are just my thoughts for now, but I find the idea very interesting and I think I will look into how it could work.

cderv · January 6, 2020, 1:34pm

Here is what it could look like, using renv to create a virtual environment for rendering an Rmd document.

Create a lockfile associated with an Rmd file.
Render the Rmd using the lockfile, with renv providing a virtual environment for the rendering process.

# reproducible example 
# create a dummy project folder
dir.create(proj_dir <- tempfile("dummy-reprex-proj"))
old <- setwd(proj_dir)
# with a dummy Rmd file
Rmd_file <- "test.Rmd"
xfun::write_utf8(c(
  "---",
  "  title: test",
  "---",
  "",  
  "```{r}",
  "library(glue)",
  "```"
), Rmd_file)

# Creating the lockfile
# using a dummy project dir
dir.create(temp_dir <- tempfile("temp-renv-proj"))
# copy the Rmd file
file.copy(Rmd_file, temp_dir)
#> [1] TRUE
# generate a lockfile associated with this Rmd file
lockfile <- file.path(temp_dir, paste0("renv-", xfun::with_ext(basename(Rmd_file), "lock")))
renv::snapshot(project = temp_dir, lockfile = lockfile, confirm = FALSE)
#> The following package(s) will be updated in the lockfile:
#> 
#> # CRAN ===============================
#> - Rcpp        [* -> 1.0.3]
#> - base64enc   [* -> 0.1-3]
#> - digest      [* -> 0.6.23]
#> - evaluate    [* -> 0.14]
#> - glue        [* -> 1.3.1]
#> - highr       [* -> 0.8]
#> - htmltools   [* -> 0.4.0]
#> - jsonlite    [* -> 1.6]
#> - knitr       [* -> 1.26]
#> - magrittr    [* -> 1.5]
#> - markdown    [* -> 1.1]
#> - mime        [* -> 0.8]
#> - rlang       [* -> 0.4.2]
#> - rmarkdown   [* -> 2.0]
#> - stringi     [* -> 1.4.3]
#> - stringr     [* -> 1.4.0]
#> - tinytex     [* -> 0.18]
#> - xfun        [* -> 0.11]
#> - yaml        [* -> 2.2.0]
#> 
#> # GitHub =============================
#> - renv        [* -> rstudio/renv]
#> 
#> * Lockfile written to 'C:/Users/DERVIE~1/AppData/Local/Temp/RtmpuIH8qz/temp-renv-proj3d54626a4e8a/renv-test.lock'.
# get back the lockfile
file.copy(lockfile, proj_dir)
#> [1] TRUE
list.files(proj_dir)
#> [1] "renv-test.lock" "test.Rmd"
# remove temp renv dir
unlink(temp_dir, recursive = TRUE)

# rendering the Rmd file using the lockfile
dir.create(temp_dir <- tempfile("temp-renv-proj"))
# copy the rmd file and the lockfile
to_copy <- c(rmd = "test.Rmd", lockfile = "renv-test.lock")
file.copy(to_copy, temp_dir)
#> [1] TRUE TRUE
# restore Rmd environment and render
html_file <- withr::with_dir(temp_dir, {
  callr::r(function() {
    renv::init(bare = TRUE, restart = FALSE)
    renv::deactivate()
    renv::restore(lockfile = "renv-test.lock")
    rmarkdown::render("test.Rmd")
  })
})
# get the rendered file
file.copy(html_file, proj_dir)
#> [1] TRUE
# remove temp renv dir
unlink(temp_dir, recursive = TRUE)
# The file has been rendered using its associated lockfile
list.files(proj_dir)
#> [1] "renv-test.lock" "test.html"      "test.Rmd"

# remove reprex stuff
setwd(old)
unlink(proj_dir, recursive = TRUE)

Created on 2020-01-06 by the reprex package (v0.3.0)

This is one way of doing it. I think there are others (using renv::dependencies and renv::hydrate, for example).
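As a rough, untested sketch of that alternative: `renv::dependencies()` can list the packages a single Rmd appears to use, and `renv::hydrate()` can copy or link already-installed packages into a chosen library. The helper names `doc_packages` and `hydrate_doc_library` are hypothetical, and whether this fits the per-document workflow is an assumption.

```r
# Hypothetical helper: packages a single Rmd file appears to use,
# as discovered by renv's static dependency scan.
doc_packages <- function(rmd) {
  deps <- renv::dependencies(rmd)
  sort(unique(deps$Package))
}

# Hypothetical helper: fill a document-specific library with the packages
# found above, taken from what is already installed for the user.
hydrate_doc_library <- function(rmd, library) {
  dir.create(library, recursive = TRUE, showWarnings = FALSE)
  renv::hydrate(packages = doc_packages(rmd), library = library)
  invisible(library)
}
```

A call might look like `hydrate_doc_library("test.Rmd", "renv-test-lib")`. Nothing here pins versions by itself, so a lockfile would still be needed to record what was used.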

Also, I think {capsule} could be a good place to host this kind of feature. I'll look into how it could fit and see what the maintainer thinks about this.

I think this kind of function could be useful to have in a package. I am thinking of building a prototype to work out how best to do that.
Is this something that could be useful? What do you think of this example?
I am curious about your thoughts on this. Thank you!

nyoungblut · January 6, 2020, 4:09pm

Thanks @cderv for the great idea on how to do it! Your example seems like a good way to do it with existing tools. How much time do you think that this would add to the knitting process, even when the package versions don't differ between Rmd files?

One downside of my idea of having a lockfile per Rmd is that it adds to the growing list of files associated with an Rmd. I'm currently struggling with how to organize & name all of the files that I'm producing. I render each Rmd as html, and I keep each md file (for rendering on GitHub). So, for 1 Rmd, I get:

rmardown_files/
├── notebook_files
│   └── figure-html
│       └── unnamed-chunk-7-1.svg
├── notebook.html
├── notebook.md
└── notebook.Rmd

The issues are:

The total number of files could get really large if I have many rmd files (which I will for each research project; a few dozen at least)
The hassle of renaming files if needed (I often have to "refactor" multiple times during the course of a project)

To help with the first issue, I could put each Rmd in a separate directory with the same name as the Rmd file, but then if I want to rename the Rmd, I'd need to rename the directory too and either re-knit the Rmd to recreate the associated files under the new name or rename them all manually. I could just use "notebook" or something else generic for the Rmd file name and keep the informative name on the directory (just one thing to rename when needed), but then every tab in RStudio would just be labeled "notebook.Rmd".

What do most people do? There doesn't seem to be any batch renaming method built into RStudio. Another nice thing about Jupyter: 1 file per notebook.
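One possible workaround, sketched in base R under stated assumptions: a small helper that renames an Rmd together with the files knitting produced. The name `rename_rmd_family` is hypothetical, and the suffix list simply mirrors the layout shown above plus a `renv-<name>.lock` file as proposed in this thread.

```r
# Hypothetical helper: rename an Rmd together with its associated files.
rename_rmd_family <- function(dir, old, new) {
  suffixes <- c(".Rmd", ".md", ".html", "_files")
  from <- c(
    file.path(dir, paste0(old, suffixes)),
    file.path(dir, paste0("renv-", old, ".lock"))
  )
  to <- c(
    file.path(dir, paste0(new, suffixes)),
    file.path(dir, paste0("renv-", new, ".lock"))
  )
  present <- file.exists(from)  # only touch what actually exists
  if (any(file.exists(to[present]))) stop("a target name already exists")
  ok <- file.rename(from[present], to[present])
  data.frame(
    from = basename(from[present]),
    to = basename(to[present]),
    renamed = ok,
    stringsAsFactors = FALSE
  )
}

# Demo in a throwaway directory
demo_dir <- tempfile("rename-demo")
dir.create(file.path(demo_dir, "notebook_files", "figure-html"), recursive = TRUE)
invisible(file.create(file.path(demo_dir, c("notebook.Rmd", "notebook.md", "notebook.html"))))
result <- rename_rmd_family(demo_dir, "notebook", "01-import")
print(result)
```

Note that the .md file refers to figures by path under `notebook_files`, so those links would still point at the old directory name until the Rmd is re-knitted or the paths are edited.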
