Base R vs. Microsoft R Open

backend
memory
performance

#1

I have never been all that impressed with Microsoft products; however, Microsoft R Open (MRO) seems like an interesting idea. I have not had a chance to try it yet, but I have read bits and pieces about it and was wondering what this community thinks. Most of what is written seems to be propaganda from Microsoft itself, so I am really interested in people’s experiences. Have people used it? Pros or cons? Compatibility with RStudio?


#2

I’ve used it off and on for almost 4 years now (before Microsoft acquired Revolution Analytics and it was called Revo R Open). There are things I liked and disliked about it.

Pros:

  • easy to install
  • easy to install Intel Math Kernel libraries to take advantage of multiple core CPUs
  • matrix computation benchmarks do show performance improvements over base R
  • MRO uses MRAN to install snapshots of packages from CRAN which allows for better reproducibility (in theory)
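To check the matrix-benchmark point on your own machine, a rough sketch like the one below could be run once under base R and once under MRO, then compared. The matrix size and the choice of operations are arbitrary assumptions for illustration; it is not a rigorous benchmark.

```r
# Rough timing of a few dense linear-algebra operations.
# Run the same script under each R build and compare the elapsed times.
set.seed(42)
n <- 500                          # arbitrary size; increase for a clearer gap
a <- matrix(rnorm(n * n), n, n)
spd <- crossprod(a) + diag(n)     # symmetric positive definite, so chol() works

timings <- c(
  crossprod = system.time(crossprod(a))[["elapsed"]],
  cholesky  = system.time(chol(spd))[["elapsed"]],
  qr        = system.time(qr(a, LAPACK = TRUE))[["elapsed"]],
  solve     = system.time(solve(spd))[["elapsed"]]
)
print(timings)
```

Single runs are noisy, so repeating each call a few times (or using a benchmarking package) would give a fairer picture.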

Cons:

  • sometimes you can’t upgrade or install packages because they depend on R #.#.# and your MRO is still behind the latest release. This would be less of a problem if package maintainers were careful about which R version they put in the Depends: section of their DESCRIPTION file instead of just relying on RStudio or devtools to build the package (which automatically sets it to the version of R it was last built on).
  • upgrading between R versions is harder than just running a package management tool upgrade or update command
  • the reproducibility aspect only works if you fully buy in and use the checkpoint feature to lock package versions… And so does everyone else on your team.
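As an illustration of the Depends: problem, here is a minimal sketch that checks whether the running R satisfies an "R (>= x.y.z)" entry. The helper name r_requirement_met is hypothetical, and it only handles the >= operator; it is not how install.packages() itself performs the check.

```r
# Hypothetical helper: does an R version satisfy the "R (>= x.y.z)" entry
# of a DESCRIPTION Depends: field? Only the ">=" operator is handled.
r_requirement_met <- function(depends, current = getRversion()) {
  pattern <- "\\bR *\\(>= *([0-9.]+) *\\)"
  m <- regmatches(depends, regexec(pattern, depends))[[1]]
  if (length(m) < 2) return(TRUE)   # no R version constraint declared
  current >= package_version(m[2])
}

# An R 3.4.1 installation cannot satisfy a package that demands 3.4.2.
r_requirement_met("R (>= 3.4.2), methods", current = package_version("3.4.1"))
```

Called without the second argument, the helper compares against the R version that is currently running.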

In my opinion, if you are doing mostly solo analysis on Mac or Windows, I would recommend giving MRO a try as the benefits are pretty good.

If you are a sysadmin (or a data scientist who often gets to wear that hat as well, like me), and you can fully control the development server/cluster that your colleagues will be working on, then it’s very possibly a worthwhile experiment. It’s really nice to require people to be responsible for using a specific version of a package instead of just knowing it by name (which MRO does automatically with checkpoint) and to know that your setup will use the available computing power more efficiently without necessarily needing analysts to change their R code. You just have to be careful to communicate when doing upgrades, updates, installs, etc.

However, if most of your development is done on Linux and/or you typically want to deploy R code to Linux servers or Docker containers, I would not recommend MRO. The extra setup and maintenance effort is hard to justify when Linux package managers like apt and yum can deal with that stuff much more easily. Also, OpenBLAS is super easy to install on Linux (e.g. apt install libopenblas-dev on Debian/Ubuntu) and gets you 90% of the performance boost of the Intel MKL most of the time.


#3

Thanks for your thoughts! I am surprised no one else has an opinion to share? Surely others have experimented with MRO?

I have not yet tried it, but your comment, @raybuhr, suggests I should.


#4

I’m not sure I have a whole lot to add, but: Like @raybuhr, I’ve used Microsoft R Open off and on for a couple of years, and I’d say my experience is largely the same. The speed increase is nice, and the reproducibility aspect is neat (though I believe you can use packrat to get the same behavior), but every time I use MRO it feels like I eventually run into errors updating packages or sharing work between my Linux and Windows machines.


#5

I would concur with @ben-e. There are some situations in which it is beneficial, but the random errors always push me back towards the standard CRAN implementation of R. Since getting into package development, I’ve noticed it takes work to maintain support for MRO. I imagine many other developers find it difficult supporting Linux, Mac and Windows for their CRAN package. MRO doubles that workload (MRO on Linux, Mac and Windows) and not everyone has time to provide the extra support.


#6

Important to note that checkpoint and MRO are not explicitly linked, it’s just that checkpoint is installed and enabled by default with MRO.

We actually use checkpoint with regular R and it works pretty well for operational processes where we want to lock down the set of packages being used once we release the scripts. It’s easy enough to have separate processes using a different set of packages, so we don’t have to upgrade everything at once or anything like that. I tend to use latest packages from CRAN while developing and then as I get near to release switch to MRAN and lock down the date so I don’t hit any gotchas with new package releases.
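The "develop against latest CRAN, then lock down the date" workflow can be sketched with plain repository settings. This assumes MRAN serves daily snapshots under /snapshot/<YYYY-MM-DD>; the helper name snapshot_repo, the use_snapshot flag, and the date are hypothetical. The checkpoint package automates this (and more), so treat this only as an illustration of the idea.

```r
# Assumption: MRAN exposes daily CRAN snapshots under /snapshot/<YYYY-MM-DD>.
# snapshot_repo() is a hypothetical helper, not part of checkpoint.
snapshot_repo <- function(date) {
  date <- as.Date(date)   # errors early on a malformed date string
  paste0("https://mran.microsoft.com/snapshot/", format(date, "%Y-%m-%d"))
}

use_snapshot <- FALSE     # flip to TRUE when approaching a release
repo <- if (use_snapshot) {
  snapshot_repo("2018-01-15")        # example date, chosen arbitrarily
} else {
  "https://cloud.r-project.org/"     # latest packages while developing
}
options(repos = c(CRAN = repo))
```

Keeping the date in one place in the script makes it easy for separate processes to pin different package sets.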


#7

Nice! I actually didn’t know that was an option. We use packrat on my team for production R code, but it’s not without warts. I’m going to try checkpoint on the Ubuntu apt r-base install and see how it goes.


#8

@abram I have a Windows 10 laptop at work, so I can’t say anything about MRO on other operating systems, but on Windows I’ve been using MRO without interruption for years now (heck, the Microsoft guys should give me a :name_badge: :smile:).

I find MRO is great if you do matrix-intensive work (for example, Gaussian Process Regression or Deep Neural Networks) and in general Bayesian inference for non-conjugate models: QR decomposition, Cholesky, etc., are all matrix operations which, for some Bayesian models, you need to repeat at each step of the MCMC chain, and in that case you will see a substantial speedup. This is mostly due to the Intel MKL libraries, which currently are installed by default with MRO (in older versions, you would have to install them separately, which frankly didn’t make a lot of sense, because installing MRO without Intel MKL is kinda missing the whole point).

  • Multi-core support: well, I have a laptop now, so I don’t have a lot of cores, but I used to have a Xeon workstation with a lot of cores, and I noticed a big difference between some packages and others. Even among those which weren’t linear algebra-intensive, some would get a large speedup, which would depend on the number of cores I dedicated to MRO (by default, it uses all your cores). Others wouldn’t get a visible speedup (maybe there was some, but I didn’t use microbenchmark to know for sure). I guess the R code must be written in a way that makes use of available cores.

  • Package installation: if you don’t like the fact that the MRAN mirror doesn’t update as regularly as the CRAN mirror, all you need to do is chooseCRANmirror() and voilà! You get to use the CRAN mirror of your choice (there are other options, I just showed the easiest one, which tampers least with your config files). This will only fail when your MRO is at version 3.4.1, say, and the package update requires 3.4.2. However, in my experience this happens only for major updates, and 3.4.1 and 3.4.2 were minor updates (as opposed to 3.4.0, for example).

  • RStudio compatibility: MRO is touted as 100% compatible with CRAN R, so everything CRAN R is compatible with should be compatible with MRO too. Personally, I’ve never had any issues with RStudio so far.
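For the multi-core point, a small sketch of how the number of MKL threads might be inspected and limited. It assumes the RevoUtilsMath package bundled with MRO and its getMKLthreads()/setMKLthreads() functions; the calls are guarded so the snippet also runs on plain CRAN R, where that package is normally absent. The thread count of 2 is an arbitrary example.

```r
# Number of logical cores the machine reports (may be NA on some platforms).
cores <- parallel::detectCores()

# RevoUtilsMath ships with MRO; on CRAN R it is usually not installed,
# so only touch the MKL settings when the package is available.
if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
  RevoUtilsMath::setMKLthreads(2)            # arbitrary example value
  mkl_threads <- RevoUtilsMath::getMKLthreads()
} else {
  mkl_threads <- NA_integer_                 # no MKL thread control here
}

cat("cores:", cores, " MKL threads:", mkl_threads, "\n")
```

Limiting the threads can be useful on a shared server so that one session does not claim every core.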

Disclaimer: I’m not affiliated with Microsoft in any way.


#9

POSITIVE POINTS OF MRO
Microsoft R Open uses multi-threaded math libraries (Intel MKL), which can yield considerable speed gains for complex matrix operations.

NEGATIVE POINTS OF MRO
MRAN (a fixed daily snapshot of CRAN) is used instead of CRAN, which can be good in terms of reproducibility, but means you do not always get the latest package versions.

However, you can override such behaviour in your .Rprofile, by setting the following configuration option:

options(repos = c(CRAN = "https://cloud.r-project.org/"))


#10

Performance comparison: Base R vs R Open

The Benefits of Multithreaded Performance with Microsoft R Open
https://mran.microsoft.com/documents/rro/multithread



#11

Just to chime in with my (less than) two cents: I really recommend MRO with MKL if you're on Windows, just for the speed improvements. However, to retain your sanity, I would strongly suggest immediately changing the default CRAN mirror from MRAN to something else.