Microsoft R Open Speed


#1

I was wondering if Microsoft R Open would speed up functions like filter, mutate, summarise, group_by, inner_join in dplyr package? Also, will MRO improve the speed of regression functions like lm, glm?

My daily work involves intensive use of these functions. If MRO can offer some speed advantage, I would like to have a try.


#2

I'm not sure about dplyr, but from what I know, MRO/Revolution Analytics provides sped-up versions of most regression functions.
Another point is that they use a different linear algebra library, so linear-algebra-heavy operations will almost always be faster than with the default BLAS that R uses (you can find more info here, for example, but there are many benchmarks that all show the same thing).
In the end, there is no harm in trying, so I would go for it :slight_smile:. Just make sure that you measure performance so that you know whether for your workflow you get any measurable improvement.
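One way to do that measurement is sketched below. It assumes simulated data as a stand-in for a real workload, and `time_median` is a hypothetical helper name, not something shipped with R or MRO. Run the same script under CRAN R and under MRO and compare the printed numbers.

```r
# Hypothetical helper: median elapsed time (seconds) of f() over several runs.
time_median <- function(f, reps = 5) {
  elapsed <- vapply(
    seq_len(reps),
    function(i) system.time(f())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Simulated data; the sizes are arbitrary and only for illustration.
set.seed(1)
n <- 20000
p <- 10
X <- matrix(rnorm(n * p), n, p)
dat <- data.frame(y = rnorm(n), X)      # predictor columns are named X1..X10
dat$z <- rbinom(n, 1, 0.5)              # binary response for the glm example

timings <- c(
  lm  = time_median(function() lm(y ~ . - z, data = dat)),
  glm = time_median(function() glm(z ~ . - y, family = binomial, data = dat))
)
print(timings)
```

Replacing the two model calls with your own dplyr pipeline or regression code gives a like-for-like comparison on your actual workflow.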


#3

I used to try to stay away from any Microsoft products, but since I started using VS Code, I guess I should change my mind, as Microsoft has changed. I'll give it a try!


#4

If you have a Mac, good luck with getting MRO to work! Instead, if you have a Windows machine, it will run flawlessly. No idea about Linux.

Clarification: after installing all the prerequisites, MRO does install on OSX. However, I can't get a lot of packages to install: for example, I couldn't get R Markdown to install, which was a show-stopper for me. What's more, unlike on Windows, you can't have the same versions of CRAN R and MRO installed on your machine, which means that every time you try to install MRO, you need to uninstall CRAN R (reaaally user-friendly). If someone manages to install MRO 3.5.1 on OSX and still be able to install packages, I'd love to know how... Later I'll ask a related question.


#5

I installed it on my Fedora machine and everything works well for me. The only issue is that I need to have both CRAN R and MRO installed to make it work: when I uninstall CRAN R, RStudio cannot link to MRO.


#6

Good for you! Clearly installation on Fedora is easier than on OSX (Darwin). Can you install packages effortlessly? Please let us know if you find a significant speedup. I guess if you do a lot of Gaussian Process Regression (lots of Cholesky decompositions) or Deep Learning on the CPU (lots of linear algebra operations) you should see a difference, but if you mostly fit lm or glm models it shouldn't make a big difference... well, actually, since it's multithreaded, if you run code which makes use of multiple threads, you could get a speedup even for operations that aren't linear-algebra intensive.
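A rough way to check the linear algebra part of this is to time a matrix product and a Cholesky decomposition directly. The sketch below uses only base R; the matrix size is an arbitrary assumption, and a larger `n` will make differences between BLAS libraries easier to see.

```r
set.seed(42)
n <- 500                                # arbitrary size; increase for a clearer signal
A <- matrix(rnorm(n * n), n, n)
S <- crossprod(A) + diag(n)             # t(A) %*% A + I is symmetric positive definite

# Both operations are handed off to the BLAS/LAPACK library R is linked against.
t_mult <- system.time(B <- A %*% A)[["elapsed"]]
t_chol <- system.time(R <- chol(S))[["elapsed"]]

cat(sprintf("matrix product: %.3f s\n", t_mult))
cat(sprintf("Cholesky:       %.3f s\n", t_chol))
```

Running this under each R installation shows whether the BLAS-bound operations actually get faster on your machine.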

Do I recall correctly that you were using Renjin some time ago? Surely MRO won't be as fast; however, it's fully compatible with GNU R, unlike the current version of Renjin. Well, keep us posted, I'm curious about how it goes!


#7

It seems my adventurous search for R performance has been noticed.....


#8

Well, you know, it's a topic I'm also interested in, plus your avatar is easy to remember :wink:


#9

I think as a general rule, you're going to have the best experience with the least fussing by using MRO and adding a line to your .Rprofile that sets your CRAN repository to RStudio's.

options(repos = c(CRAN = "https://cran.rstudio.com/"))

This makes sure you get the latest package updates, as opposed to the MRO default behavior, which installs the versions of packages that were available on a specific snapshot date. The snapshot is good for some people, but not for most.
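To see which repository a given session will actually install from, you can inspect the `repos` option. The sketch below also shows a slightly more defensive variant of the .Rprofile line, which changes only the `CRAN` entry and leaves any other configured repositories alone; that this is preferable for your setup is an assumption.

```r
# Show what install.packages() would currently use.
print(getOption("repos"))

# Variant for .Rprofile: update only the CRAN entry, keep any other repos.
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.rstudio.com/"
  options(repos = r)
})

# Confirm the change took effect.
print(getOption("repos")[["CRAN"]])
```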

If you want to test the speed difference, open up R and run

source("http://r.research.att.com/benchmarks/R-benchmark-25.R")

in each version you want to compare and check out the results.

With all of this said, there's another way to get a roughly equivalent speed boost on Mac. See this link for some command line instructions for linking macOS's built-in fast BLAS library to your R installation: https://statistics.berkeley.edu/computing/blas
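After relinking, it is worth confirming which BLAS the running R session actually picked up. A minimal check, assuming a reasonably recent R (as far as I know, `sessionInfo()` has reported the BLAS and LAPACK library paths since R 3.4.0; on older versions these fields will simply be empty):

```r
si <- sessionInfo()

# Paths of the linear algebra libraries this R session is linked against.
cat("BLAS:  ", si$BLAS, "\n")
cat("LAPACK:", si$LAPACK, "\n")
```

If the BLAS path still points at R's own reference library, the relinking did not take effect for that installation.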

I always make sure I have a fast BLAS with my install (MRO on Windows/Linux, the aforementioned method on macOS) because there are times when you'll notice the difference if you run the same code without the faster BLAS. I noticed it the most on a series of regressions that required a big-ish set of matrix algebra calculations, something like 5000 observations and 15 predictors. There are some things where it either makes no difference or such a small one that you won't notice it, but it's nice to have.