One of the most common questions we’ve gotten while talking to admins about R packages is whether or not RStudio Package Manager includes a virus scanner.
The answer is no.
R code is executed by the R interpreter at runtime; R programs are not pre-compiled. This feature of R is one of the reasons it is such a popular tool for interactive data science. However, because R programs are not compiled into executables, many techniques employed by virus scanners are not very effective against R packages. (R packages can contain external code, such as C++ code, either as source or in compiled form; this is the main difference between a “source package” and a “binary package.”)
So how can we trust R packages?
This is a fascinating question that deserves a detailed response. Many in the R community are actively working on it, just as people in other open source ecosystems are tackling similar challenges.
While the list is not exhaustive, I offer these five considerations for users or admins wondering about package security:
RStudio provides R packages to RStudio Package Manager through an upstream RStudio service designed specifically for this task. The connection between this service and RStudio Package Manager is encrypted. Daily updates to CRAN are reviewed by our team before they are made available through this service. The review process checks for consistent package metadata and also updates the package checksum file, used by the R client to ensure downloaded package files are correct. We highly recommend that the connection between your R clients and RStudio Package Manager be encrypted by hosting your RStudio Package Manager instance over HTTPS.
CRAN requires all submitted R packages to pass a series of checks prior to accepting them into the CRAN repository. These checks include installing the package alongside other CRAN packages and running package unit tests. While these tests do not specifically target malicious code, the tests provide a significant hurdle to uploading malicious packages to CRAN.
R code is almost always executed as a non-privileged user. The majority of R code, especially code run in RStudio Server Pro or RStudio Connect, is executed on behalf of restricted service or user accounts. RStudio Server Pro, for example, runs under an AppArmor profile that is inherited by the R processes it invokes on behalf of non-privileged users. Similarly, RStudio Connect provides an extensive sandboxing process to run user code in an isolated environment. Additionally, while RStudio Package Manager provides a means for users to download packages originating on the internet, most R code is executed in offline environments, often dedicated analytic sandboxes. These measures not only guard against malicious code, but also keep analysts from accidentally interfering with one another. Learn more about RStudio’s security policy or common security FAQs.
RStudio Package Manager allows you to control exactly what packages are brought into your organization through curated sources.
Our security team is working on an Incident Response Plan that will document what actions would be taken in the event that a malicious package were to be discovered on CRAN. Because RStudio Package Manager provides a central entry point for R packages into your organization, it is easy for admins to audit their risk exposure.
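To make the checksum step from the first consideration a little more concrete, here is a minimal sketch of how a client could compare a downloaded file against a checksum listed in a repository index. It is written in Python purely for illustration and is not how the R client or RStudio Package Manager is implemented. It assumes a simplified index made of blank-line-separated "Field: value" records with Package, Version, and MD5sum fields. The function names, the package name "example", and the sample bytes are all hypothetical.

```python
import hashlib


def parse_packages_index(text):
    """Parse a simplified index of blank-line-separated 'Field: value' records
    into a dict mapping (package, version) -> MD5 checksum (assumed layout)."""
    checksums = {}
    for record in text.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            # Continuation lines start with whitespace; this sketch skips them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if {"Package", "Version", "MD5sum"} <= fields.keys():
            checksums[(fields["Package"], fields["Version"])] = fields["MD5sum"]
    return checksums


def verify_download(data, package, version, checksums):
    """Return True only if the bytes match the checksum listed in the index."""
    expected = checksums.get((package, version))
    if expected is None:
        return False  # Unknown package or version: treat as unverified.
    return hashlib.md5(data).hexdigest() == expected.lower()


# Hypothetical usage: build a one-record index for some stand-in bytes.
tarball = b"pretend these are the bytes of example_1.0.0.tar.gz"
index = (
    "Package: example\n"
    "Version: 1.0.0\n"
    f"MD5sum: {hashlib.md5(tarball).hexdigest()}\n"
)
checksums = parse_packages_index(index)
print(verify_download(tarball, "example", "1.0.0", checksums))         # True
print(verify_download(tarball + b"x", "example", "1.0.0", checksums))  # False
```

A check like this only confirms that the file matches the index, so it is only as trustworthy as the channel the index arrives over. That is one reason the HTTPS recommendation above matters.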
We’re excited to hear your feedback and any suggestions.