I'm currently supporting a small team of university biologists using R. The hope is to improve their analysis workflow to make it reproducible and ensure their key packages work consistently. Currently they each work on their own machine, installing RStudio Desktop, installing packages as needed, with files dispersed across their OS... it's a mess.

They have a major dependency on Seurat, with various accumulated scripts requiring different versions of Seurat, plus additional dependencies on plain R packages (e.g. tidyverse), reticulate, and Python.

My goal is to move them to RStudio running in Docker for reproducibility. I've recently been reading up on renv as a means to ensure package consistency when using rocker Docker images.

The documentation on renv with Docker is a great start, but frankly there are some gaps in turning it into a usable workflow for my team. My goal is to make this as foolproof and simple as possible for the team. They are biologists, not IT experts; most are only passingly familiar with the command line.
My hope is to create a workflow where:

- Users can load a Docker container with RStudio and a usable version of Seurat that they require as a basis for their work.
- They can add new packages as needed.
- They can create an RStudio project for their specific analysis.
- Each project gets its own renv.lock and Dockerfile, so that when it is revisited later the analysis runs without a hitch.
To date I have the following:
A docker-compose.yml to make loading as painless as possible (note: the cache volume is mounted at the same path as `RENV_PATHS_CACHE` in the Dockerfile):

```yaml
version: '3'
services:
  r_seurat:
    image: "aforsythe/r_seurat:dev"
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - "~/r_data/:/home/rstudio/"
      # must match RENV_PATHS_CACHE in the Dockerfile
      - "~/.renv_docker/cache:/renv/cache"
    ports:
      - "8787:8787"
    environment:
      - 'DISABLE_AUTH=true'
    restart: always
```
A Dockerfile (the Miniconda installer now lives at repo.anaconda.com; the old repo.continuum.io URL only redirects there):

```dockerfile
FROM rocker/verse:4.0.4

ENV PATH=/root/miniconda3/bin:${PATH}
ENV RENV_PATHS_CACHE=/renv/cache
ENV RETICULATE_PYTHON=/root/miniconda3/bin/python
ENV RENV_VERSION=0.13.1

RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -b && \
    rm Miniconda3-latest-Linux-x86_64.sh && \
    conda update -y conda && \
    conda install -y numpy matplotlib pandas

RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"

COPY ./renv.lock /renv/tmp/renv.lock
WORKDIR /renv/tmp
RUN R -e "renv::restore()"
WORKDIR /home/rstudio
```
And a renv.lock file (in a gist due to space constraints, and not linked due to link limitations in the forum):
https://gist.github.com/aforsythe/71fd5981d3d50066605b585fdc021b74
My question is: how can I use or modify what I have so far to accomplish the workflow I've outlined above?

I imagine the workflow going something like this:
- User clones a repo with the Dockerfile, docker-compose.yml, and renv.lock.
- User runs `docker-compose up -d`.
- User visits http://localhost:8787 in a browser (rocker images serve RStudio over plain HTTP on that port).
- User runs a small script (yet to be developed) to generate a template of directories and subdirectories (e.g. `analysis_code`, `data`, `data_clean`, `figures`), created in a project-named subdirectory of /home/rstudio/.
- User writes code, loading packages (e.g. `library(ggplot2)`) as necessary, with the option to install new packages as needed.
- User "saves" the project by running `renv::init()` plus a script that creates a Dockerfile, such that when they revisit the project they load a container based on that project-specific Dockerfile and renv.lock.
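For the template step, here is a minimal sketch of what that yet-to-be-developed script could look like. The function name `new_project` and the directory names are my own placeholders; it just builds the skeleton described above under the container's /home/rstudio:

```shell
#!/bin/sh
# Sketch of the project template script (names are placeholders).
set -eu

new_project() {
    # $1 = project name, $2 = parent directory
    # (defaults to $HOME, i.e. /home/rstudio inside the container)
    base="${2:-$HOME}/$1"
    for dir in analysis_code data data_clean figures; do
        mkdir -p "$base/$dir"
    done
    # An .Rproj file makes RStudio open the folder as a project.
    printf 'Version: 1.0\n' > "$base/$1.Rproj"
    echo "Created project skeleton in $base"
}

# Example: new_project my_analysis
```

Users would only ever type `new_project my_analysis` in the RStudio terminal, which keeps the command-line exposure to a single memorable command.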
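And for the "save" step, a sketch of how the project-specific Dockerfile could be generated. Assumptions: the base image name is the one from my compose file, `renv::init()` has already been run once at project creation (so the save step reruns `renv::snapshot()` to refresh the lockfile), and the function names are placeholders:

```shell
#!/bin/sh
# Sketch of the "save project" step (function names are placeholders).
set -eu

# Refresh renv.lock from the packages the project actually uses.
# Run inside the container, where R and renv are available.
snapshot_project() {
    R -e "renv::snapshot(project = '$1', prompt = FALSE)"
}

# Write a project-specific Dockerfile next to the lockfile, mirroring
# the restore steps of the base Dockerfile above.
write_dockerfile() {
    cat > "$1/Dockerfile" <<'EOF'
FROM aforsythe/r_seurat:dev
COPY renv.lock /renv/tmp/renv.lock
WORKDIR /renv/tmp
RUN R -e "renv::restore()"
WORKDIR /home/rstudio
EOF
    echo "Wrote $1/Dockerfile"
}

# Example: snapshot_project /home/rstudio/my_analysis
#          write_dockerfile /home/rstudio/my_analysis
```

Rebuilding from that Dockerfile later should reproduce the project's library exactly, and the shared renv cache volume keeps the restore fast.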
Perhaps I'm off base in my expectations. I'm just looking for the easiest and most straightforward way to create a workflow for people whose job isn't managing their R worlds. They just need tools that work, with minimal interaction.
Any help would be greatly appreciated.
Hoping @kevinushey may have some insight.