How do I configure R so programs developed in RStudio run successfully in R?

rstudio
dependencies

#1

RStudio has more resources than R. How do I configure R so programs developed in RStudio run successfully in R?

I lead the data analysis group in a public health dept that is shifting toward R from SAS. We want to be able to develop programs in RStudio, and then run them from a command line (i.e., in batch mode in R).

I just found that pandoc is available in RStudio but not R. I am guessing that we may find other differences in the resources available or the default configurations, as we get deeper into this.

Is there some guide or trick to setting up R so that it will run any R program that runs successfully in RStudio? I am hoping there might be something we can put in the command line that runs an R script (i.e., with the Rscript command), like maybe a pointer to some config file that RStudio uses.

Ideally, the solution would be a process we can use each time we install new versions of R and RStudio, rather than a fix that only lasts until the next version of RStudio.

NOTE:

  • I know that R can run anything that I run in RStudio, if only I add the right additional tools to R (like pandoc). I know that R can run pandoc, after I install pandoc somewhere that R can find it.
  • We run all of this in Windows, both on our laptops and on the server. That is what our IT dept. supports.
  • I have a group of SAS users who I am trying to transition to R. I need to:
  1. minimize the frustration and extra work that comes from the "surprises" they get when a script that ran fine in RStudio does not run when they call it with R from a command line.
  2. minimize the times we have to install new things (like pandoc) on everyone's computer and on the server where R runs. To do that, we need to involve our corporation's IT group, who will run out of patience quickly if we pepper them with change requests at random intervals.

Reflections on RStudio transparency
#2

Pandoc is not available for R? Pandoc can be executed from the command line, so I'm sure you can use it with R and not just in RStudio. Perhaps you're thinking of the 'knit' button in the IDE? (It's just running rmarkdown::render().)
To get back to your main question though, the vast majority of things you do with code in RStudio, you do with R. The only function I can think of right now that is just for RStudio is View()...

Anyway, nothing beats testing your programs on the command line the old fashioned way if you're unsure -- lots of print(class()) etc.!


#3

Pandoc definitely works with R! RStudio just comes packed-in with a bunch of stuff (like pandoc).

Pandoc is an external dependency for RMarkdown (which, as @RobertMyles points out, is what is called when you hit the Knit button in RStudio). RStudio bundles the rmarkdown package and pandoc in when you install it; if you want to install R but not RStudio, you'll need to also install pandoc and rmarkdown yourself.
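A quick way to see whether a particular R session can reach pandoc is to ask from inside that session. This is only a sketch in base R: the helper name find_pandoc_candidates is made up for illustration, and it assumes the usual behaviour that RStudio sets the RSTUDIO_PANDOC environment variable for its own sessions, so that variable will normally be empty under plain Rscript.

```r
# Hypothetical helper: report where this R session could find pandoc.
find_pandoc_candidates <- function() {
  on_path <- unname(Sys.which("pandoc"))       # "" if pandoc is not on the PATH
  rstudio_dir <- Sys.getenv("RSTUDIO_PANDOC")  # usually "" outside RStudio
  list(
    on_path = on_path,
    rstudio_dir = rstudio_dir,
    found = nzchar(on_path) ||
      (nzchar(rstudio_dir) && dir.exists(rstudio_dir))
  )
}

str(find_pandoc_candidates())
```

Running the same lines in the RStudio console and with Rscript should show whether the two sessions differ on this point.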

The version of R that comes with RStudio is totally vanilla, as far as I'm aware, and it can run independently. So you can just install RStudio but then call R from the command line, and all of those dependencies like pandoc should be fine. (This is what I do!)

EDIT: for example, here's a screenshot of RStudio running a script on my computer:

And here's me running that script directly from the command line (note, above, that my script starts with a shebang so that I can run it with ./test.r, not Rscript test.r):


#4

Yes, RStudio ships with a version of pandoc, but R does not. Pandoc is needed by rmarkdown. However, as with other R packages that have system requirements, you can install pandoc yourself and everything will work fine (as @rensa explained).

RStudio is an IDE that ships with a lot of features, some of them interactive, that help users do analysis and develop programs. In the end, everything is run by R. Moreover, apart from the IDE options, the options for R and the R-specific environment variables are configured in R, not in RStudio (via the .Rprofile and .Renviron files).

About the command line, to complete @rensa's example, you'll find a very interesting series of posts here

Hope it helps.


#5

Thanks for your reply. Command line utilities are precisely the direction we are headed.

I do understand that scripts run in RStudio are actually running in R, in the end. As I understand it, RStudio is essentially running the R package I already have installed, but with what amounts to an altered configuration (e.g., additional or different folders in the PATH). Ideally, what I'd like is something analogous to dumping the implicit RStudio configuration into a file that I can then call when I run R, so that R runs under the same configuration.
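As far as I know there is no single RStudio config file to point Rscript at, but the idea of "dumping the configuration" can be approximated by writing out the session details that most often differ. The sketch below is base R only; the function name dump_session_config and the output file name are invented for this example, and the list of things it records is a guess at what matters, not a complete inventory.

```r
# Hypothetical helper: write the session details that most often differ
# between RStudio and Rscript to a plain-text file.
dump_session_config <- function(file) {
  path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep,
                           fixed = TRUE)[[1]]
  lines <- c(
    paste("R version:", R.version.string),
    paste("R home:", R.home()),
    paste("RSTUDIO_PANDOC:", Sys.getenv("RSTUDIO_PANDOC")),
    "Library paths:", paste0("  ", .libPaths()),
    "PATH entries:", paste0("  ", path_entries)
  )
  writeLines(lines, file)
  invisible(lines)
}

# Assumed output location; change it to somewhere you can find again.
out <- file.path(tempdir(), "session-config.txt")
dump_session_config(out)
```

Run it once from the RStudio console and once through Rscript (writing to two different files), then compare the files to see what the RStudio session has that the batch session lacks.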


#6

Yes, RStudio runs scripts with R, using the R packages configured in the R session; that is not an RStudio option. It is difficult to know what RStudio brings that would be so different that you can't run a script outside of RStudio. I have never encountered this until now. Most (if not all) of the configuration for R sessions comes from R configuration files, not from RStudio.

Are you trying to anticipate problems, or have you already encountered an issue?


#7

I've encountered the issue already, with the absence of pandoc in the default R installation, while it is in the RStudio installation. I submitted a script that included outputting a plotly interactive graph with htmlwidgets::saveWidget(...). That failed for lack of access to pandoc. I know how to solve that specific problem, but I'm hoping to find a process for avoiding similar issues in the future.
I guess you could say that I'm trying to get my dev (RStudio) and prod (command line Rscript) environments to match.
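For the specific saveWidget() failure, one hedged workaround is to request a self-contained file only when pandoc looks reachable. The wrapper name save_widget_safely is hypothetical, and the pandoc check (PATH plus the RSTUDIO_PANDOC variable) is my assumption about where htmlwidgets looks; with selfcontained = FALSE the widget is written as an HTML file plus a folder of supporting files, which to my knowledge does not need pandoc.

```r
# Hypothetical wrapper: only ask for a single self-contained HTML file
# when pandoc appears to be reachable from this session.
save_widget_safely <- function(widget, file) {
  # Assumption: pandoc is found via the PATH or via RSTUDIO_PANDOC.
  has_pandoc <- nzchar(Sys.which("pandoc")) ||
    nzchar(Sys.getenv("RSTUDIO_PANDOC"))
  htmlwidgets::saveWidget(widget, file, selfcontained = has_pandoc)
  invisible(has_pandoc)
}

# Usage (requires the plotly and htmlwidgets packages to be installed):
# p <- plotly::plot_ly(x = 1:10, y = (1:10)^2, type = "scatter", mode = "lines")
# save_widget_safely(p, "plot.html")
```

This does not make dev and prod match; it only keeps the batch run from failing outright when pandoc is missing.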


#8

I have never encountered this kind of configuration comparison...
When I want to reproduce the dev environment (RStudio) in the prod environment, I use packrat to isolate package dependencies and the R version. Then I install all these packages in the new environment along with their system requirements. As pandoc is a requirement for the rmarkdown package, I install it too.

If you put together a list of things to be aware of, I would be interested in seeing it.


#9

Thanks again for your replies. I initially thought that when my script successfully ran library(plotly) (after installing it, if necessary), it meant that whatever plotly might need (like pandoc) was available. I saw that plotly imports htmlwidgets, which is what uses pandoc.

Your mention of packrat and dependencies has me wondering: Did my packagename::functionname() syntax do some sort of end-run around the full htmlwidgets installation? Could I avoid this "runs in RStudio but not R" problem by avoiding that packagename::functionname() syntax?

You mention installing packages with their System Requirement. If I install a package (e.g., htmlwidgets), are the system requirements automatically installed, or do I need to do that separately? Does something tell me what a package's system requirements are? Maybe that's a path I can travel toward a solution - I could define the set of packages my group uses, and make sure all the server and laptops have all of their system requirements.

Still, it seems a shortcoming of RStudio to include, by default, resources that are not in the default R installation yet might be used in R scripts, and not document those differences. If a dev tool is so user-friendly that it lets things run that won't run in prod, it should list the differences someplace obvious.


#10

When you look up a package that's on CRAN, it should have a field called SystemRequirements that describes external dependencies. For example, the CRAN listing for rmarkdown includes:

SystemRequirements: pandoc (>= 1.12.3) - http://pandoc.

But this field isn't checked or acted on in any way, so normally it's up to you to ensure those requirements are met.
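Since nothing acts on the field automatically, one way to get an overview is to list it for everything already installed. A minimal sketch using base R (the variable names are just for illustration); it reads the SystemRequirements field from each installed package's DESCRIPTION and keeps the packages that declare one.

```r
# List installed packages that declare a SystemRequirements field.
pkgs <- utils::installed.packages(fields = "SystemRequirements")

# Keep only the rows where the field is present.
has_req <- !is.na(pkgs[, "SystemRequirements"])
sysreqs <- pkgs[has_req, c("Package", "SystemRequirements"), drop = FALSE]

# Drop the row names (package names, possibly duplicated across libraries)
# before converting the character matrix to a data frame.
rownames(sysreqs) <- NULL
sysreqs <- as.data.frame(sysreqs, stringsAsFactors = FALSE)

print(sysreqs, right = FALSE)
```

The field is free text, so the result is a checklist for a human (or for an IT change request) rather than something that can be installed mechanically.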

I know some packages, like tinytex, provide additional installation functions to install the external stuff that can't be handled by the usual package installation function. Others just look for the external dependency in places where they would expect to find it on a system, like $PATH on *nix systems, and they let you know if it isn't there.

I'm not sure of a way to automate this, unfortunately. The reason RStudio can include additional external dependencies is because it has its own installer; I have to assume the R package installation process is intentionally limited for security reasons. I found one (albeit WIP) attempt to automate it, but I imagine it's a massive task to handle any conceivable external dependency. Much easier when you know which ones you're including, as RStudio does :confused:

I think the best you can do here is check the CRAN listings (or GitHub READMEs) of the packages you need in production and ensure that you can manually install any external dependencies beforehand (if the package doesn't provide a way to do it for you).


#11

@RobertMyles

View() is not just for RStudio, and it totally predates the IDE. The only thing that function needs is a window system, which is present by default on Windows and macOS, and on Linux as long as you have a desktop environment (e.g. X11).
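For scripts that are developed interactively and later run in batch mode, a small guard avoids depending on a window system at all. This is just a sketch; the helper name peek is made up, and falling back to head() is one possible choice among many.

```r
# Hypothetical helper: open the data viewer in an interactive session,
# print the first rows when running non-interactively (e.g. under Rscript).
peek <- function(x, n = 6) {
  if (interactive()) {
    # Unqualified call, so whichever View() the session provides is used.
    View(x)
  } else {
    print(utils::head(x, n))
  }
  invisible(x)
}

peek(mtcars)
```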


#12

You just blew my mind.

(it is ugly af on X11 tho :laughing:)

EDIT: holy moly, and there's data.entry too!


#13

It really is lol. I am using DT to have a quick peek at my data (I am not an RStudio user) because it looks too ugly. But eh! it works :slight_smile: And you can even scroll up/down, left/right with Pgup/Pgdown and Home/End :smile:


#14

And of course, you need to load rmarkdown (which I believe is not necessary if you just click Knit in RStudio)


#15

Ah ah ah :laughing: Who needs Excel/Libreoffice anymore, between the old school data.entry and the new school tribble?


#16

I regularly develop in RStudio and then run the scripts from the R command line. Perhaps you are running into an issue where the R packages are being installed in different locations? That would be related to your .libPaths() setting: run .libPaths() in both RStudio and command-line R and check that they return the same locations.

If running on the same computer, all instances of R should point at the same default R package location, which would include any packages installed from within RStudio. (Pandoc itself is an external program rather than an R package, so it is not found through .libPaths().) What sometimes happens is that you instead install R packages locally to the user, so when a global R session tries to run the same script, the packages are not available to it. Perhaps your .Rprofile or .Renviron is set to save packages to a local user's folder, and when base R is run it does not pick up the same environment settings. See ?Startup for how base R reads its startup configuration.
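The comparison can be scripted so the same report is produced in both places. A rough sketch in base R, with a made-up function name (report_library_setup); it only checks the site-wide and home-directory startup files, whereas R also looks in the working directory, as described in ?Startup.

```r
# Hypothetical helper: report where this session looks for packages
# and which of the common startup files exist.
report_library_setup <- function() {
  startup_files <- c(
    site_profile = file.path(R.home("etc"), "Rprofile.site"),
    user_profile = path.expand("~/.Rprofile"),
    user_environ = path.expand("~/.Renviron")
  )
  present <- file.exists(startup_files)
  names(present) <- names(startup_files)
  list(
    lib_paths = .libPaths(),
    r_libs_user = Sys.getenv("R_LIBS_USER"),
    startup_files_present = present
  )
}

print(report_library_setup())
```

If the lib_paths entries differ between the RStudio console and an Rscript run, that points to a library-location problem rather than a missing external tool.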

But all in all, AFAIK RStudio installs nothing that shouldn't run in R command, it just has extra UI tools such as Shiny gadgets and buttons in the UI which are easier to view in RStudio, but underneath are shortcuts to executing R code.


#17

Good to know! I read somewhere before that it was RStudio only, happy to learn that it's not the case :slight_smile:


#18

Supposedly RStudio replaces View() with their own View() (see https://stackoverflow.com/questions/48234850/how-to-use-r-studio-view-function-programatically-in-a-package), which I also didn't know...


#19

So I guess we were both right, depending on the "point of View" :wink:


#20

A great answer was posted on another thread by @nutterb that seems relevant here: