R / RStudio needing restart after package installation


#1

I have a few questions related to the occasional need to restart RStudio after installing packages:

  • Is it R or RStudio that needs the restart?
  • What is it that creates this need?
  • Perhaps this will be answered by the above questions, but for R in production, say via Rserve, is this needed?

Thanks!


#2

I don't know the exact answer to your first two questions (I would suspect that it is R that needs to restart, since RStudio is an IDE rather than a language, but that's just a guess). Can you elaborate on your last point, though? Why would you need to install packages inside an Rserve session?


#3

We're trying to figure out whether we can have users installing packages inside an Rserve session or whether we need another avenue for package installation. I don't mean to be cheeky, but why do you see it as possibly unnecessary to install packages in an Rserve session?


#4

I may be wrong about this, but I'm fairly certain it is not a good idea. Every time you open a new connection to Rserve (which can be multiple times a minute), it starts from a fresh session. That would mean you would need to install the package every time.

I don't know your architecture, but, again, from my vantage point Rserve is mostly used in connection with Java, so I'm not sure who your users will be and why they need to use Rserve instead of, for example, Shiny.


#5

Usually, it is R that needs to restart for a package install (not RStudio). The reason is that the package is already loaded or attached in the running session, and "detaching" it cannot always be done cleanly. Restarting the session lets you install the package into an environment where it is not in use. (This is not unusual in software: upgrading or overwriting something while it is running is often a cause of trouble.)
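A minimal sketch of how one might check this from inside a session, using only base R. The helper names `package_in_use` and `try_unload` are made up for illustration, and a successful unload is no guarantee that a reinstall will behave cleanly.

```r
# TRUE when the package's namespace is loaded in this session,
# i.e. when overwriting it on disk is most likely to cause trouble.
package_in_use <- function(pkg) {
  pkg %in% loadedNamespaces()
}

# Attempt to unload a package's namespace.
# Returns TRUE if the namespace is not loaded afterwards, FALSE if
# unloading failed (for example because another package imports it).
try_unload <- function(pkg) {
  if (!package_in_use(pkg)) {
    return(TRUE)
  }
  tryCatch(
    {
      unloadNamespace(pkg)
      !package_in_use(pkg)
    },
    error = function(e) FALSE
  )
}
```

If `try_unload()` returns FALSE for a package you want to reinstall, restarting R before calling install.packages() is the safer route.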

As for upgrading with R in production, I echo @mishabalyasin's sentiment: it would be helpful to have more clarification about what you mean by "R in production" and why you are looking for a way to install packages within an Rserve session.

I am not super familiar with Rserve, but Rserve is a "server" responding to client requests. For comparison, think of a Node.js server responding to clients. Usually, in "production," you would seek to minimize package installations and similar time-consuming, potentially failure-prone processes. That would definitely mean not installing packages within an Rserve session.

Rather, you would usually be aiming for reproducible processes, and would therefore have a stable environment already set up to serve Rserve client requests. Another package you might think about for an API framework is plumber, which sets up a REST API.

These threads might also be interesting to peruse:


And the following article on reproducible package management (disclaimer: I'm the author):


#6

Thanks for the reply @cole. We need to be able to run R code as steps in an ETL process, and the engineering team thinks that Rserve would be a good way to do that. So we're talking about a handful of calls to Rserve per day. I'm out of my depth there, but when I told them that restarting R after package installs was sometimes needed, they wanted more info about when that's the case and how to reproduce it so they can test. From what you said, it sounds like as long as install.packages comes at the beginning of the session, there shouldn't be problems, but I'm (genuinely) not sure that matches my experience. I think I have had instances of installing packages in a fresh session and not being able to attach them in the same session, but I'm not certain.
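For testing purposes, the "install at the very start of the session" idea could be sketched roughly as below. This is only a starting point: `missing_packages` is a hypothetical helper, the package names in the commented usage are placeholders, and nothing here settles whether installing and attaching in the same session always works.

```r
# Return the subset of `pkgs` that is not installed in any library on
# .libPaths(). system.file() returns "" when a package cannot be found,
# and it does not load the package's namespace while checking.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

# Usage at the top of a fresh session, before anything is attached
# (placeholder package names; left commented out so nothing is installed
# just by sourcing this file):
# to_install <- missing_packages(c("jsonlite", "data.table"))
# if (length(to_install) > 0) install.packages(to_install)
# library(jsonlite)
```

Running the commented steps in a fresh session and then calling library() would be one way to try to reproduce the attach failure described above.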


#7

From the Rserve website (emphasis mine):

Rserve is a TCP/IP server which allows other programs to use facilities of R (see www.r-project.org) from various languages without the need to initialize R or link against R library. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++, PHP and Java. Rserve supports remote connection, authentication and file transfer. Typical use is to integrate R backend for computation of statistical models, plots etc. in other applications.

So, I would say that install.packages shouldn't be used in those calls. You can, of course, prepare the machine with all the packages that need to be there and then link the library with, e.g., .libPaths().
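As a rough illustration of the .libPaths() part, assuming the packages have already been installed into some directory on the machine (`add_library` is a hypothetical helper, and the path in the commented usage is a placeholder):

```r
# Put a prepared package library at the front of the search path so that
# library() and requireNamespace() look there first.
add_library <- function(lib_dir) {
  stopifnot(dir.exists(lib_dir))
  .libPaths(c(lib_dir, .libPaths()))
  invisible(.libPaths())
}

# Usage (placeholder path):
# add_library("/opt/etl/r-library")
# library(somePackage)  # now resolved from the prepared library first
```

This only affects the current session, so it would have to run at the start of each connection or in a startup file.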

This is the approach that we use currently and it works fairly well. The basic workflow is that you prepare the packages you need in those calls, put them in a zip archive, and then put it on the machine that will actually host the Rserve server. Adding them inside of R is straightforward -- all you need to do is modify the R_LIBS environment variable to include the folder with the packages you need for ETL. One tool that I've learned about fairly recently is called RSuite. I haven't used it myself, but from talking to one of the developers, it sounds like it can help you with all of the steps I've described above.
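The unpack-and-point-R_LIBS steps could look roughly like this. It is a sketch under assumptions: the helper names and paths are hypothetical, and the archive is assumed to contain already-installed packages built for the same platform and R version as the host. R reads R_LIBS at startup, so the variable has to be set before the Rserve process is launched (for example in an Renviron file), not from inside a running session.

```r
# Unpack a bundle of installed packages into a library directory.
unpack_bundle <- function(zip_path, lib_dir) {
  dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
  utils::unzip(zip_path, exdir = lib_dir)
  invisible(lib_dir)
}

# Build the line to put in an Renviron file. R_LIBS takes a list of
# directories separated by ":" on Unix-alikes and ";" on Windows, which is
# what .Platform$path.sep provides.
r_libs_entry <- function(dirs) {
  paste0("R_LIBS=", paste(dirs, collapse = .Platform$path.sep))
}

# Usage (placeholder paths; commented out so sourcing has no side effects):
# unpack_bundle("etl-packages.zip", "/opt/etl/r-library")
# cat(r_libs_entry("/opt/etl/r-library"), "\n")
```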