A workflow for research based on renv and docker

I'm currently supporting a small team of university biologists using R.

The hope is to improve their analysis workflow to make it reproducible and ensure their key packages work consistently. Currently, they each work on their own machine, installing RStudio Desktop, installing packages as needed, with files dispersed across their operating systems ... it's a mess.

They have a major dependency on Seurat: various scripts they've accumulated require different versions of Seurat, with additional dependencies on plain R packages (e.g. 'tidyverse'), reticulate, and Python.

My goal is to move them to RStudio running in Docker for reproducibility. I have recently been reading up on renv as a means to ensure package consistency when using Rocker Docker images.

The documentation on renv with Docker is a great start, but frankly there are some gaps in turning it into a usable workflow for my team. My goal is to make this as fool-proof and simple as possible for the team. They are biologists, not IT experts. Most are only passingly familiar with the command line.

My hope is to create a workflow where:

  1. Users can load a Docker container with RStudio and the version of Seurat they require as a basis for their work.
  2. Users can add new packages as needed.
  3. Users can create an RStudio project for their specific analysis.
  4. Users can create a new renv.lock and Dockerfile for that project which, when used later, allows their analysis to run without a hitch.

To date I have the following:

A docker-compose.yml to make loading as painless as possible:

version: '3'

services:
  rstudio:
    image: "aforsythe/r_seurat:dev"
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - "~/r_data/:/home/rstudio/"
      - "~/.renv_docker/cache:/.renv/cache"
    ports:
      - "8787:8787"
    environment:
      - 'DISABLE_AUTH=true'
    restart: always

A Dockerfile:

FROM rocker/verse:4.0.4

ENV PATH=/root/miniconda3/bin:${PATH}
ENV RETICULATE_PYTHON=/root/miniconda3/bin/python

RUN wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
	bash Miniconda3-latest-Linux-x86_64.sh -b && \
	rm Miniconda3-latest-Linux-x86_64.sh && \
	conda update -y conda && \
	conda list && \
	conda install -y numpy matplotlib

RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
# RENV_VERSION was referenced but never defined; declare it as a build
# argument (override at build time, e.g. --build-arg RENV_VERSION=0.13.2)
ARG RENV_VERSION=0.13.2
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"

COPY ./renv.lock /renv/tmp/renv.lock

WORKDIR /renv/tmp

RUN R -e "renv::restore()"

WORKDIR /home/rstudio

And a renv.lock file ... (in a gist due to space constraints, and not linked due to link limitations in the forum).
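For readers unfamiliar with the format: a renv.lock is just a JSON file pinning the R version, the repositories, and every package version. A heavily abbreviated sketch of the shape (the version numbers here are placeholders for illustration, not the contents of the actual lockfile in the gist) looks like:

```json
{
  "R": {
    "Version": "4.0.4",
    "Repositories": [
      { "Name": "CRAN", "URL": "https://cloud.r-project.org" }
    ]
  },
  "Packages": {
    "Seurat": {
      "Package": "Seurat",
      "Version": "4.0.0",
      "Source": "Repository",
      "Repository": "CRAN"
    }
  }
}
```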

My question is: how can I use or modify what I have so far to accomplish the workflow I've outlined above?

I imagine the workflow going something like this:

  1. user clones a repo with the Dockerfile, docker-compose.yml, and renv.lock
  2. user runs docker-compose up -d
  3. user visits http://localhost:8787 in a browser
  4. user runs a small script (yet to be developed) to generate a template of directories and subdirectories (e.g. analysis_code, data, data_clean, figures, etc.) which would be created in a project named subdirectory of /home/rstudio/
  5. user writes code, loading packages (e.g. library(ggplot2)) as necessary, with the option to install new packages as needed
  6. user "saves" the project by running renv::init() and a script to create a Dockerfile, such that when they revisit the project they load a container based on that project-specific Dockerfile and renv.lock
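Steps 4 and 6 could be sketched as one small POSIX shell helper. Everything below is an assumption for illustration, not a finished implementation: the directory names are taken from the examples above, the generated Dockerfile simply pins the same rocker base image and restores the project's lockfile, and the renv.lock itself would still be written from the R console via renv::init() / renv::snapshot().

```shell
#!/bin/sh
# Hypothetical sketch of the helper scripts from steps 4 and 6.
set -eu

# Step 4: scaffold a standard project skeleton under the given path.
new_project() {
    base="$1"
    for d in analysis_code data data_clean figures; do
        mkdir -p "$base/$d"
    done
}

# Step 6: after renv::snapshot() has written renv.lock, emit a
# project-specific Dockerfile that restores exactly that lockfile.
freeze_project() {
    base="$1"
    cat > "$base/Dockerfile" <<'EOF'
FROM rocker/verse:4.0.4
RUN R -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
COPY renv.lock /home/rstudio/renv.lock
WORKDIR /home/rstudio
RUN R -e "renv::restore()"
EOF
}

# Demo in a temporary directory.
root=$(mktemp -d)
new_project "$root/my_analysis"
touch "$root/my_analysis/renv.lock"   # normally written by renv::snapshot()
freeze_project "$root/my_analysis"
ls "$root/my_analysis"
```

In practice the team would never see these internals; they would call something like `new_project my_analysis` from the RStudio terminal and a "freeze" wrapper when they finish.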

Perhaps I'm off base in my expectations. I'm just looking for the easiest and most straightforward way to create a workflow for people whose job isn't managing their R environments. They just need tools that work, and work with minimal interaction.

Any help would be greatly appreciated.

Hoping @kevinushey may have some insight.

Sorry for taking so long to respond! Here are my thoughts...

If I understand correctly, you want to create some Docker images with a pre-fabricated environment + RStudio Server, and allow users to connect to and run those containers, preferably with users running Docker on their own machines.

My main concern is that asking users to run Docker containers on their own machines may be a big ask. Is it possible to have those images + containers managed separately, so that users only need to connect to RSP via some URL in their browser?

I believe it would be best to have the R + RStudio infrastructure managed independently of the user's machines; ideally, users should only need to connect to RStudio Server via the browser and everything would be handled appropriately.

It's worth saying that the Job Launcher in RStudio Server Pro makes it possible to launch R sessions backed by arbitrary Docker images / containers: Launcher Overview - RStudio :: Solutions. Those containers can either be local, or running on some other external container infrastructure. I'm not sure if that's an option in your case, though.


Thanks ... We unfortunately don't have infrastructure to host this remotely. I'm not really looking to be a sysadmin for R per se; rather, my primary goal is to get users into a well-controlled environment so they can reproduce and share their analyses. Right now they are using RStudio Desktop, primarily on their Macs, with files on the Desktop, packages changing on a daily basis, etc. It's a mess. I'm the guy who "knows how to fix things," so I get constant calls to "help make things work."

My thought in using Docker was to introduce some consistency to their environment. Obviously renv is an important part of that, but it's only part of the solution. I've moved substantially past where I was when I originally wrote this post. I'm documenting the workflow now and will share the GitHub repo soon for thoughts, if you wouldn't mind giving them.

@kevinushey I'm sure there are improvements that could be made, but here's what I've got so far ...

Happy to accept suggestions and/or pull requests :wink: