I was waiting for R 4.1. and native Apple silicon support to dome some benchmarks against other platforms. The results on my MacBook Pro with the M1 chip look disturbing to me. Let's start with the Mac:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.0 tools_4.1.0
The results from the benchmark are:
> N <- 20000
> M <- 2000
> X <- matrix(rnorm(N*M),N)
> system.time(crossprod(X))
user system elapsed
49.954 0.109 50.056
Interestingly, the sessionInfo has different output in R Console:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.0
It seems to me that R uses Apple Acclerate framework's BLAS libraries, but the benchmarks are similar:
> system.time(crossprod(X))
user system elapsed
49.909 0.117 50.015
Under Windows using my Thinkpad E 580 it is a whole different story:
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] microbenchmark_1.4-7 RevoUtils_11.0.2 RevoUtilsMath_11.0.0
loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2 grid_4.0.2 lattice_0.20-41
The computations are much quicker:
> system.time(crossprod(X))
user system elapsed
2.60 0.03 0.70
Windows uses Microsoft R Open and that may explain the difference. On Ubuntu or Fedora, using OpenBlas on theThinkpad, the results are similar to Windows. I don't know if this is to be expected. I'm worried that the macOS R is inexplicably slow.