Dependency installs happening twice with Docker

Because R packages all have to install from source and do a lot of compilation, I find testing and deploying my internal package in Docker is sometimes really slow. A few months ago I realized I can just make a "base" R Docker container that does all of the work of installing the common dependencies that our internal packages have. Then the packages themselves can pull from an existing built container and just have to copy code over.

This was working great, but I noticed in the last few weeks that installing my package within the Docker container is reinstalling R packages.

I'm hoping for some help here: it makes testing and deploying go from under 5 minutes to over 20. I'm wondering whether, because Docker runs as root, the packages are being installed for the wrong user or something?

Here's some sample code to get a feel for what's happening. The Dockerfile of the base container:

FROM rocker/r-ver:3.4.1

ARG BUILD_DATE
ENV BUILD_DATE=2017-07-15
ENV MRAN https://mran.microsoft.com/snapshot/$BUILD_DATE

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
  libxml2-dev \
  libcairo2-dev \
  libsqlite-dev \
  libpq-dev \
  libicu-dev \
  libbz2-dev \
  liblzma-dev \
  default-jdk \
  libssl-dev \
  libcurl4-openssl-dev \
  vim \
  unixodbc \
  unixodbc-dev \
  odbc-postgresql \
  && R CMD javareconf \
  && . /etc/environment

RUN  Rscript -e "install.packages(c('dplyr', 'tidyr', 'dbplyr', 'devtools', 'openxlsx', 'RJDBC', 'data.table', 'dtplyr', 'yaml', 'knitr', 'rmarkdown', 'ggplot2', 'data.tree', 'slackr', 'testthat', 'roxygen2', 'assertr', 'purrr', 'futile.logger', 'magrittr', 'odbc', 'feather'), repos = Sys.getenv('MRAN'), Ncpus = 2)"

Then to run my tests I run a script in my container:

Rscript -e "library(devtools);library(methods);library(testthat);install();library($MY_PACKAGE);test_package('$MY_PACKAGE', reporter = 'Summary')"

The install step here re-installs all of the dependencies. I also cannot exclude the install step, even though creating the actual docker container should have installed my package:

FROM allovue/rbase:development

RUN mkdir -p /var/$MY_PACKAGE


# Copy my code to the docker container
ADD . /var/$MY_PACKAGE

# Build and install package
RUN cd /var/$MY_PACKAGE && \
  R CMD build . --no-build-vignettes && \
  R CMD INSTALL *tar.gz

WORKDIR /var/$MY_PACKAGE

Any ideas?

Have you tried doing simply something like this, but with your MRAN setting:

RUN R -e "install.packages('gapminder', repos = 'http://cran.us.r-project.org')"

Maybe setting the dependencies = FALSE argument in the code above could do the trick?

I pulled this code from: http://ropenscilabs.github.io/r-docker-tutorial/05-dockerfiles.html, so apologies if this is too basic of an answer.

I'm not exactly sure how this differs from your code, but I would presume that it may be a much simpler way of installing the packages into your container? What is the purpose of your test scripts?

I'm very interested in trying to create an RStudio container using makefiles, and I'm not sure why you have an install on the following line:

Not 100% positive what the problem is, but order of instructions is important.

For example, if you place an ADD instruction near the top, above a few RUN instructions, and anything in the added file or directory has changed, the cache for every layer after that ADD is invalidated, whether or not those later instructions depend on the added files. Sometimes a RUN instruction has to go after an ADD, but that is not what I’m referring to.
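One way to apply this ordering idea to an R package, as a rough sketch rather than a tested recipe: copy only DESCRIPTION first, install dependencies from it, and copy the rest of the source last, so that code edits do not invalidate the dependency layer. The snippet below just writes such a Dockerfile to disk. The file name Dockerfile.cached and the path /var/pkg are made up for illustration, and it assumes devtools is already in the base image.

```shell
# Write a Dockerfile whose dependency layer is rebuilt only when DESCRIPTION changes.
# Dockerfile.cached and /var/pkg are illustrative names, not from the original setup.
cat > Dockerfile.cached <<'EOF'
FROM allovue/rbase:development

WORKDIR /var/pkg

# Copy only the dependency manifest first; this layer and the next one stay
# cached until DESCRIPTION itself changes.
COPY DESCRIPTION /var/pkg/DESCRIPTION
RUN Rscript -e "devtools::install_deps('.', upgrade = FALSE)"

# Copy the rest of the source last, so code edits invalidate only these layers.
COPY . /var/pkg
RUN R CMD build . --no-build-vignettes && R CMD INSTALL --install-tests *.tar.gz
EOF
```

It could then be built with something like docker build -f Dockerfile.cached . from the package directory.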

Check out this post for more details:

I did not believe install() would be needed, but when I run without it I get an error that tests aren't found. I’m assuming that’s because R CMD INSTALL removes tests.

But that does give me an idea that possibly I could use load_all instead of install.
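A minimal sketch of that idea, assuming devtools and testthat are already in the image and the package source is available inside the container: devtools::test() loads the package with load_all() and runs the files under tests/testthat, so nothing gets installed. The function name run_tests_without_install is made up for illustration.

```shell
# Hypothetical helper: run a package's testthat tests straight from source.
# devtools::test() calls load_all() internally, so no install step is needed.
run_tests_without_install() {
  pkg_dir="${1:-.}"   # path to the package source; defaults to the current directory
  Rscript -e "devtools::test('${pkg_dir}')"
}

# Usage (inside the container): run_tests_without_install /var/$MY_PACKAGE
```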

It would be useful to see the output log when building the container. Also the output of library() without any arguments in the container, as well as .libPaths().

Building the second container looks like this:

Status: Downloaded newer image for allovue/rbase:development
 ---> 2734d88fc76b
Step 2 : RUN mkdir -p /var/$MY_PACKAGE
 ---> Running in bb95d5689702
 ---> 6922a2ab5ed8
Step 3 : VOLUME /var/importer/files
 ---> Running in 9fb799fff6f1
 ---> e452546d7ddc
Step 4 : ADD . /var/$MY_PACKAGE
 ---> af9c752e64dc
Step 5 : RUN cd /var/$MY_PACKAGE &&   R CMD build . --no-build-vignettes &&   R CMD INSTALL *tar.gz
 ---> Running in af957fdab0ec
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘extractor’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building ‘$MY_PACKAGE_0.6.tar.gz’

* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘$MY_PACKAGE’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE ($MY_PACKAGE)
 ---> 4b9ccbd1bb55
Step 6 : WORKDIR /var/$MY_PACKAGE
 ---> Running in 9c929742a4be
 ---> f770b502837a
Successfully built f770b502837a

DockerHub is a crummy website that doesn't let me download the logs for the base image, but it all works successfully and looks like normal package installations. The packages appear to be installing in /usr/local/lib/R/site-library/ when that is specified.

Afterwards (in the base container), this is the result of library() and .libPaths():

> .libPaths()
[1] "/usr/local/lib/R/site-library" "/usr/local/lib/R/library"
> library()
Packages in library ‘/usr/local/lib/R/site-library’:

assertr                 Assertive Programming for R Analysis Pipelines
assertthat              Easy Pre and Post Assertions
backports               Reimplementations of Functions Introduced Since
                        R-3.0.0
base64enc               Tools for base64 encoding
BH                      Boost C++ Header Files
bindr                   Parametrized Active Bindings
bindrcpp                An 'Rcpp' Interface to Active Bindings
bit                     A class for vectors of 1-bit booleans
bit64                   A S3 Class for Vectors of 64bit Integers
bitops                  Bitwise Operations
blob                    A Simple S3 Class for Representing Vectors of
                        Binary Data ('BLOBS')
brew                    Templating Framework for Report Generation
caTools                 Tools: moving window statistics, GIF, Base64,
                        ROC AUC, etc.
cluster                 "Finding Groups in Data": Cluster Analysis
                        Extended Rousseeuw et al.
codetools               Code Analysis Tools for R
colorspace              Color Space Manipulation
commonmark              High Performance CommonMark and Github Markdown
                        Rendering in R
crayon                  Colored Terminal Output
curl                    A Modern and Flexible Web Client for R
data.table              Extension of `data.frame`
data.tree               General Purpose Hierarchical Data Structure
DBI                     R Database Interface
dbplyr                  A 'dplyr' Back End for Databases
desc                    Manipulate DESCRIPTION Files
devtools                Tools to Make Developing R Packages Easier
DiagrammeR              Create Graph Diagrams and Flowcharts Using R
dichromat               Color Schemes for Dichromats
digest                  Create Compact Hash Digests of R Objects
docopt                  Command-Line Interface Specification Language
doParallel              Foreach Parallel Adaptor for the 'parallel'
                        Package
dplyr                   A Grammar of Data Manipulation
dtplyr                  Data Table Back-End for 'dplyr'
evaluate                Parsing and Evaluation Tools that Provide More
                        Details than the Default
feather                 R Bindings to the Feather 'API'
foreach                 Provides Foreach Looping Construct for R
futile.logger           A Logging Utility for R
futile.options          Futile options management
ggplot2                 Create Elegant Data Visualisations Using the
                        Grammar of Graphics
git2r                   Provides Access to Git Repositories
glue                    Interpreted String Literals
gridBase                Integration of base and grid graphics
gridExtra               Miscellaneous Functions for "Grid" Graphics
gtable                  Arrange 'Grobs' in Tables
highr                   Syntax Highlighting for R Source Code
hms                     Pretty Time of Day
htmltools               Tools for HTML
htmlwidgets             HTML Widgets for R
httr                    Tools for Working with URLs and HTTP
igraph                  Network Analysis and Visualization
influenceR              Software Tools to Quantify Structural
                        Importance of Nodes in a Network
irlba                   Fast Truncated Singular Value Decomposition and
                        Principal Components Analysis for Large Dense
                        and Sparse Matrices
iterators               Provides Iterator Construct for R
jsonlite                A Robust, High Performance JSON Parser and
                        Generator for R
knitr                   A General-Purpose Package for Dynamic Report
                        Generation in R
labeling                Axis Labeling
lambda.r                Modeling Data with Functional Programming
lattice                 Trellis Graphics for R
lazyeval                Lazy (Non-Standard) Evaluation
littler                 R at the Command-Line via 'r'
magrittr                A Forward-Pipe Operator for R
markdown                'Markdown' Rendering for R
MASS                    Support Functions and Datasets for Venables and
                        Ripley's MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
memoise                 Memoisation of Functions
mime                    Map Filenames to MIME Types
munsell                 Utilities for Using Munsell Colours
NMF                     Algorithms and Framework for Nonnegative Matrix
                        Factorization (NMF)
odbc                    Connect to ODBC Compatible Databases (using the
                        DBI Interface)
openssl                 Toolkit for Encryption, Signatures and
                        Certificates Based on OpenSSL
openxlsx                Read, Write and Edit XLSX Files
pkgconfig               Private Configuration for 'R' Packages
pkgmaker                Package development utilities
plogr                   The 'plog' C++ Logging Library
plyr                    Tools for Splitting, Applying and Combining
                        Data
praise                  Praise Users
purrr                   Functional Programming Tools
R6                      Classes with Reference Semantics
RColorBrewer            ColorBrewer Palettes
Rcpp                    Seamless R and C++ Integration
registry                Infrastructure for R Package Registries
reshape2                Flexibly Reshape Data: A Reboot of the Reshape
                        Package
rgexf                   Build, Import and Export GEXF Graph Files
rJava                   Low-Level R to Java Interface
RJDBC                   Provides access to databases through the JDBC
                        interface
rlang                   Functions for Base Types and Core R and
                        'Tidyverse' Features
rmarkdown               Dynamic Documents for R
rngtools                Utility functions for working with Random
                        Number Generators
Rook                    Rook - a web server interface for R
roxygen2                In-Line Documentation for R
rprojroot               Finding Files in Project Subdirectories
rstudioapi              Safely Access the RStudio API
scales                  Scale Functions for Visualization
slackr                  Send Messages, Images, R Objects and Files to
                        'Slack' Channels/Users
stringi                 Character String Processing Facilities
stringr                 Simple, Consistent Wrappers for Common String
                        Operations
testthat                Unit Testing for R
tibble                  Simple Data Frames
tidyr                   Easily Tidy Data with 'spread()' and 'gather()'
                        Functions
viridis                 Default Color Maps from 'matplotlib'
viridisLite             Default Color Maps from 'matplotlib' (Lite
                        Version)
visNetwork              Network Visualization using 'vis.js' Library
whisker                 {{mustache}} for R, logicless templating
withr                   Run Code 'With' Temporarily Modified Global
                        State
XML                     Tools for Parsing and Generating XML Within R
                        and S-Plus
xml2                    Parse XML
xtable                  Export Tables to LaTeX or HTML
yaml                    Methods to Convert R Data to YAML and Back

Packages in library ‘/usr/local/lib/R/library’:

base                    The R Base Package
compiler                The R Compiler Package
datasets                The R Datasets Package
graphics                The R Graphics Package
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
grid                    The Grid Graphics Package
methods                 Formal Methods and Classes
parallel                Support for Parallel computation in R
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package

This output is identical after I run my package's build except my package is listed as well in library.

Ah, the reason that tests are not found is that they are not installed by default by R CMD INSTALL, you need

R CMD INSTALL --install-tests

To install with tests.

As for why install() is re-installing packages which are already installed, the only thing I can think of is that the user you are running the script under does not have write permission on the site library directory where the packages are installed.
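A quick way to check that guess from inside the container might look like the sketch below. The helper name check_lib_writable is hypothetical; it only reports the current user and whether the given directory is writable, and does not explain why install() reinstalls anything.

```shell
# Hypothetical helper: report who is running and whether a library dir is writable.
check_lib_writable() {
  lib_dir="$1"
  echo "user: $(id -un)"
  if [ -w "$lib_dir" ]; then
    echo "writable: $lib_dir"
  else
    echo "NOT writable: $lib_dir"
    return 1
  fi
}

# Usage: check_lib_writable /usr/local/lib/R/site-library
```

Run it as the same user that runs the test script, since that is the user whose permissions matter.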

Using --install-tests allowed me to remove install() from my test run script and cut down time significantly.

Thanks!
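For anyone landing here later, the change described above might look roughly like this. It is a sketch under the assumption that the package tarball was already produced by R CMD build; the function name install_and_test and its arguments are made up for illustration.

```shell
# Hypothetical helper: install a built tarball together with its tests,
# then run the installed tests without calling devtools::install().
install_and_test() {
  pkg="$1"       # package name, e.g. the value of $MY_PACKAGE
  tarball="$2"   # tarball produced by `R CMD build`
  R CMD INSTALL --install-tests "$tarball" &&
    Rscript -e "library(testthat); library(${pkg}); test_package('${pkg}', reporter = 'Summary')"
}

# Usage: install_and_test "$MY_PACKAGE" "${MY_PACKAGE}_0.6.tar.gz"
```

In a Dockerfile the install half would normally sit in the RUN step that builds the package, leaving only the Rscript call for the test run.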

Is it best practice to use an Rscript call as above when creating a Docker container, rather than simply something like the below? I am currently trying to create a container, and it seems that when compiling I tend to get many more errors than when I simply run install.packages from my local R session.


FROM rocker/tidyverse:latest

MAINTAINER Pete

RUN apt-get update -qq \
  && apt-get -y --no-install-recommends install \
    libicu-dev \
    libbz2-dev \
    liblzma-dev \
    default-jdk \
    default-jre \
  && R CMD javareconf \
  && install2.r --error \
    --repos 'http://cran.rstudio.com' \
    h2o \
    tidytext \
    janitor

Does anyone have any best practices for creating dockerfiles in R?

I’ve done it both ways in the past. I couldn’t convince myself there was a good reason not to use Rscript. I’d guess the preference is largely aesthetic. I don’t love depending on how things work upstream that might change and in that sense, things like littler felt like a dependency I could walk away from without losing much.

Not entirely sure I've followed things here, but I generally go for putting the dependencies in a DESCRIPTION file and just calling devtools::install(), rather than listing them in the Dockerfile.

It may be too soon to declare what are "Best Practices", but I think doing things that are more platform agnostic (dependencies the standard R way in DESCRIPTION rather than Docker-specific way in Dockerfile -- same goes for travis CI etc) is "better practice" since it is more standard/portable.

As long as there are no additional apt-dependencies (which I avoid by usually relying on a heavier base image like verse or geospatial), this means things can run without much of a custom Dockerfile.