Allow install.packages but not install_github

rstudio
rstudioserverpro
install_github

#1

I am setting up a new instance of RStudio Server Pro in a secure environment. I would like users to be able to install any packages from CRAN but not be able to install anything else (e.g. packages on GitHub).

Does this seem reasonable from a security standpoint?

What is the best way to implement this?


#2

Honestly, I think the best way to do this is to remove RStudio Server Pro's access to the internet. It is definitely reasonable from a security standpoint - many organizations want to restrict what open source code is being executed on their servers.

One note - installing a package is basically just executing a command on a tarball. In this sense, users can install whatever they want, as long as they have access to the server. However, taking the server offline makes this more painful and therefore less likely. Plus, these are not malicious actors, so you would hope they will honor the policy. If you need to monitor these sorts of things, there is code auditing you can turn on in RStudio Server Pro.

The goal, then, is to lead your developers with a "carrot" (and not with a stick) by making certain things easy. For instance, if you allow access to CRAN through an HTTP proxy (and nothing else), users will find it very appealing to install the thousands of packages available there, and much less appealing to go through the more cumbersome work of getting a tarball onto the machine for some other random package.
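
As a minimal sketch of the "easy path", a site-wide startup file (Rprofile.site) could point every session at CRAN through the proxy. The proxy host and port below are hypothetical placeholders, and this assumes R's default libcurl download method, which reads the `https_proxy` environment variable:

```r
# Sketch for Rprofile.site, which R sources at the start of every session.
local({
  # Hypothetical proxy address: substitute the proxy that is allowed to reach CRAN.
  Sys.setenv(https_proxy = "http://proxy.example.internal:3128")

  # Make install.packages() use a CRAN mirror by default.
  repos <- getOption("repos")
  repos["CRAN"] <- "https://cloud.r-project.org"
  options(repos = repos)
})
```

With this in place, a plain `install.packages("somepkg")` goes to CRAN without the user configuring anything. It does not by itself stop other install routes.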

You might also think about setting up an in-house CRAN mirror. This has the advantage of allowing you to serve select GitHub packages (maybe you define an approval process that users can go through to request packages) or internal packages developed for only your team's use. Your customer success rep would also be happy to help talk you through this process. It is definitely a common one - I think you're on the right track!


#3

Is there a way to prevent certain commands from running on the server? I'm wondering if I could somehow just prevent the devtools::install_github function from running. Maybe overwrite the function in the default user profile?

An in-house CRAN mirror would be the best solution (or RStudio's package manager). However, we need a solution in the short term that would let admins install any packages but prevent or at least strongly discourage R programmers from installing packages from non-CRAN sources. I think the setup we are moving toward is to have the admin install all packages into a global shared library, one for each version of R on the server, and then discourage users from installing anything. The carrot would be that all packages are already installed.

Does that setup make sense to you, and is there an easy way to prevent users from running install.packages or install_github without removing internet access for RStudio Server Pro?
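
For the shared-library setup described above, a quick check like the following shows which library trees a session searches and which of them the current user can write to. This is only a diagnostic sketch: it assumes the admin-managed library is exposed as a site library (for example through the `R_LIBS_SITE` environment variable), and it does not itself block anything.

```r
# Library trees this session searches, in priority order.
libs <- .libPaths()

# file.access(mode = 2) returns 0 for paths the current user can write to.
writable <- unname(file.access(libs, mode = 2) == 0)

print(data.frame(library = libs, writable = writable))
```

If the shared library shows up as not writable for ordinary users, they cannot change its contents, although R can still install into a personal library if one exists or is created.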


#4

I think you can make life difficult for those who try, for example by replacing the function in the user profile or deactivating the install button in the IDE. However, it will always be possible for an advanced user to install with the right command.

In our setup, we have an offline environment and we provide an internal CRAN mirror. So install_github doesn't work directly, but installing manually is still technically possible.

What surprises me is that you want to prevent GitHub installation but leave internet access. Often, you don't want a user to install something unknown found on the web, like a GitHub repo, so preventing this while leaving access to the web is surprising to me. The best way to achieve this would be a firewall or proxy setup where downloads from github.com are blocked. That way internet access still exists but installing from GitHub is not possible.

But there is always a way! A user can download a source bundle from GitHub somewhere online, upload it to RStudio, and install it locally from the tar.gz. So a GitHub package will be installed anyway. Even in an offline environment it's possible; it just costs time and manual steps.
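
To make the manual route concrete, here is a small sketch of a local install from an uploaded source bundle. The file name is a hypothetical placeholder, and the install only runs if that file actually exists:

```r
# Hypothetical file name: a source bundle uploaded through the IDE's file pane.
tarball <- "somepkg_0.1.0.tar.gz"

# repos = NULL makes install.packages() treat `tarball` as a file path
# instead of a package name to look up in a repository.
if (file.exists(tarball)) {
  install.packages(tarball, repos = NULL, type = "source")
}
```

No network access is involved at install time, which is why blocking github.com or taking the server offline only raises the effort required.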

R is built around packages, and you can't prevent that; you can only make installation easier or harder. :sweat_smile:


#5

I agree with @cderv. As he suggested, there is very little you can do to prevent a user from installing something from GitHub. Remember, this is programming, and the packages are open source, so at the very extreme a user could just copy all of the function text and never download anything.

The best approach is to figure out what you want. Security? Validation? And then figure out how you work with your data scientists to make that happen.

The most successful installations we see are typically a repository managed by system admins, with a path to validation or something like it for users to request packages that they want to have added to the repository.

It ends up being the least cumbersome to admins, the most enabling to users, and allows for a free flow of communication about what features/packages are missing as well as what pain points there are in the process. The "shadow installations" become much less appealing if there is some feasible path to validated installation. On the flip side, as a data scientist, a "shadow installation" is a no-brainer if using a package that someone else has already written (but that may not be on CRAN, for a variety of reasons) will save me 3 days of work or gain 10% accuracy in my model.

Our hope is that RStudio Package Manager will increasingly fill this void and facilitate that conversation between data scientists, IT, and security.

One final note, since I did not make it explicit - admin management of installed libraries on the server is usually complicated, time consuming, and insufficient. Remember, there are 12,000+ packages on CRAN, 4+ minor versions of R in active usage, and multiple package versions for each package. The number of combinations that could be installed is enormous, and the likelihood that a developer may want some combination that you did not install is nonzero. I promise - you are much better off spending your time on a repository than on installing packages into a library on the server.

The R user is used to installing packages, anyway, so it's no problem for them to have to install some stuff. Feel free to pre-load some popular packages, but don't try to install all packages anyone will need, because you will likely miss one (and waste a bunch of time/storage in the process).


#6

I'm not sure what the ramifications of this would be, but you could add this to the startup Rprofile:

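# Replacement that fails with a clear message instead of installing anything.
my_install_github <- function(...) {
  stop("install_github may not be called here.")
}

# When the devtools namespace is loaded, swap its install_github for the stub.
setHook(packageEvent("devtools", "onLoad"),
        function(...) {
          assignInNamespace("install_github", ns = "devtools",  # or maybe install_remotes
                            value = my_install_github)
        })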

As you've acknowledged, this is merely a reminder that the function can't be used.
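
One gap in the hook above: recent devtools versions delegate to the remotes package, so a user can call remotes::install_github directly and bypass it. A hedged sketch of the same idea applied to both packages follows; `blocked_install` and `make_blocker_hook` are hypothetical helper names, and this is still only a reminder rather than real enforcement:

```r
# Stub that replaces install_github; the message wording is illustrative.
blocked_install <- function(...) {
  stop("Installing from GitHub is disabled on this server; ask an admin to add the package.",
       call. = FALSE)
}

# Build an onLoad hook for one package that swaps in the stub,
# but only if that package really defines install_github.
make_blocker_hook <- function(pkg) {
  force(pkg)
  function(...) {
    ns <- asNamespace(pkg)
    if (exists("install_github", envir = ns, inherits = FALSE)) {
      assignInNamespace("install_github", value = blocked_install, ns = pkg)
    }
  }
}

# Register the hook for both packages; it fires when each namespace loads.
for (pkg in c("devtools", "remotes")) {
  setHook(packageEvent(pkg, "onLoad"), make_blocker_hook(pkg))
}
```

A user who starts R without the site profile, or who redefines the function, still gets around this, so it complements a managed repository rather than replacing one.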


#7

Thanks for the advice. We only have a few R programmers and two R versions at this point, so manually managing packages seems doable, but I take your point. As we grow this will quickly become impractical. I should work on setting up an internal CRAN mirror or look into purchasing RStudio's package manager.