I am aware of BLIS and libflame but have not tested them extensively myself so far.
While some of the performance results for BLIS/libflame shown on the various GitHub pages look rather impressive, I should also point out that a performant BLAS/LAPACK implementation, while helpful, will only speed up code if the majority of the run time is spent in BLAS/LAPACK routines and the data sizes are sufficiently large.
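The data-size argument can be illustrated with a back-of-the-envelope count for a dense matrix multiplication C = AB of two n × n matrices. This is a textbook illustration, not something taken from the benchmarks discussed here. Optimized BLAS implementations get much of their speed from reusing data held in caches and registers, and the amount of work available per matrix element grows with n:

```latex
\[
\text{flops}(n) \approx 2n^{3}, \qquad
\text{elements read/written}(n) \ge 3n^{2}, \qquad
\frac{\text{flops}(n)}{\text{elements}(n)} \lesssim \frac{2n^{3}}{3n^{2}} = \frac{2n}{3}
\]
```

For n = 10 that is only about 7 operations per element, for n = 1000 about 667, so small problems leave an optimized library little room to shine.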
Benchmarks like https://mac.r-project.org/benchmarks/R-benchmark-25.R show speed-ups of 10x and more with OpenBLAS and Intel MKL, but in real-world code the average speed-up I have seen is more in the 20-30 % range (if any). Also, OpenBLAS is packaged in most if not all Linux distributions today, which makes it easier to integrate with R. A possible alternative with regard to ease of integration would be to look into FlexiBLAS.
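One way to make the "majority of time in BLAS/LAPACK" point concrete is Amdahl's law (my framing, not something measured in these tests): if a fraction f of the run time is spent in BLAS/LAPACK and that part becomes s times faster, the overall speed-up is 1 / ((1 - f) + f / s). A minimal Python sketch; the fractions in the example loop are hypothetical and a 10x faster BLAS is assumed throughout.

```python
def amdahl_speedup(blas_fraction: float, blas_speedup: float) -> float:
    """Overall speed-up when only the BLAS/LAPACK share of run time gets faster.

    blas_fraction: share of the original run time spent in BLAS/LAPACK (0..1).
    blas_speedup:  factor by which the BLAS/LAPACK part is accelerated.
    """
    if not 0.0 <= blas_fraction <= 1.0:
        raise ValueError("blas_fraction must be between 0 and 1")
    if blas_speedup <= 0:
        raise ValueError("blas_speedup must be positive")
    # The non-BLAS part (1 - f) runs unchanged; the BLAS part shrinks by a factor s.
    return 1.0 / ((1.0 - blas_fraction) + blas_fraction / blas_speedup)


if __name__ == "__main__":
    # Hypothetical shares of run time spent in BLAS/LAPACK, assuming a 10x faster BLAS.
    for fraction in (0.1, 0.25, 0.5, 0.9, 1.0):
        overall = amdahl_speedup(fraction, 10.0)
        print(f"{fraction:4.0%} of time in BLAS -> {overall:.2f}x overall")
```

With s = 10, a program that spends a quarter of its time in BLAS/LAPACK ends up only about 1.29x faster overall, and even f = 0.9 gives about 5.3x; the full 10x only shows up when practically all of the time is spent in BLAS/LAPACK.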
In my own tests comparing Intel MKL, OpenBLAS and vanilla LAPACK/BLAS, the R benchmark mentioned above took 5.2 s, 6.6 s and 38.4 s, respectively.
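To put those timings in relative terms, the speed-up of each build is simply the baseline time divided by its own time. A small Python sketch; the function name `speedups` is my own and only the three timings come from the tests above.

```python
def speedups(timings: dict[str, float], baseline: str) -> dict[str, float]:
    """Speed-up of each configuration relative to the baseline (baseline time / time)."""
    base = timings[baseline]
    return {name: base / seconds for name, seconds in timings.items()}


if __name__ == "__main__":
    # Wall-clock times in seconds for R-benchmark-25.R, taken from the figures above.
    times = {"Intel MKL": 5.2, "OpenBLAS": 6.6, "vanilla BLAS/LAPACK": 38.4}
    for name, factor in speedups(times, "vanilla BLAS/LAPACK").items():
        print(f"{name:>20}: {times[name]:5.1f} s -> {factor:4.1f}x")
    # Relative gap between the two optimized libraries on this one benchmark.
    gap = times["OpenBLAS"] / times["Intel MKL"] - 1.0
    print(f"OpenBLAS needs {gap:.0%} more time than Intel MKL here")
```

For these numbers this works out to roughly 7.4x (MKL) and 5.8x (OpenBLAS) over the vanilla build, with OpenBLAS taking about 27 % longer than MKL on this particular benchmark.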
While in the past I have been a strong proponent of Intel MKL and have pushed the Intel toolkit (Intel Compilers + MKL) to its limits, I have come to the conclusion that for most workloads GNU Compilers + OpenBLAS come close enough to Intel MKL performance that the extra overhead and the potential trouble with reproducibility (MKL_CBWR) and stability are just not worth it. With R being an open-source product, it also does not feel right to combine it with a free but closed-source product such as Intel MKL.
But don't get me wrong: if you and your colleagues have code that calls R functions which make efficient use of BLAS/LAPACK (i.e. push enough data into those BLAS/LAPACK functions) and that spends the majority of its execution time in BLAS/LAPACK, you really should continue to optimise for BLAS/LAPACK performance.
I am far from trying to quench your desire to optimise R performance via BLAS/LAPACK, but having the R developers write efficient code still goes a long way compared to tuning the R installation.