Assessing R performance with optimized BLAS across three operating systems
Increasing R performance using an optimized BLAS
To execute many of its low-level computations efficiently, R relies on a BLAS (Basic Linear Algebra Subprograms) library. The default BLAS that ships with R is optimized for stability, not for speed. Switching to an alternative BLAS can speed up several basic operations that underlie many of the higher-level statistics and data science routines common in academic and industry research. Part of the gain comes from better use (or any use) of the multiple processor cores present in most modern computers.
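To get a feel for which operations are BLAS-bound, here is a minimal sketch that times a few dense linear algebra calls in base R. The matrix size and the `elapsed()` helper are my own choices for illustration and are not part of the benchmark script used in this post.

```r
# Time a few operations that are handed off to BLAS/LAPACK.
# n is arbitrary; increase it to make differences between BLAS libraries more visible.
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), nrow = n, ncol = n)

# Hypothetical helper: wall-clock seconds taken by a zero-argument function.
elapsed <- function(f) unname(system.time(f())["elapsed"])

timings <- c(
  matmul    = elapsed(function() a %*% a),      # matrix multiplication
  crossprod = elapsed(function() crossprod(a)), # t(a) %*% a
  solve     = elapsed(function() solve(a))      # matrix inverse via LAPACK
)
print(timings)
```

Running the same snippet before and after switching BLAS gives a rough first impression, although single runs like this are noisy.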
My aim with this blog post was to document the performance of R using the standard vs. an optimized BLAS in three different operating systems.
Anyone attempting to replicate the following results should proceed cautiously: alternative BLAS implementations reportedly have the potential to affect the stability or precision of calculations. I have not been able to find information about how often this happens or how large the effect can be. If you are doing anything more than tinkering, it may be best to avoid experimental BLAS implementations unless you know what you’re doing.
I used a standard benchmark script to assess the performance of optimized vs. default BLAS on three operating systems: Windows 10 (21H1), macOS (11.1), and Ubuntu (21.04). It is possible to use an optimized BLAS on most operating systems, although it is more difficult on some than on others. The benchmark script was a slightly modified version of R benchmark 2.5 found at this R-project webpage.
R 4.1 was used for each test except for MRO in Windows, described below.
Installing the alternate BLAS was easiest on Windows, because Microsoft maintains a version of R (Microsoft R Open) that includes an optimized BLAS from Intel and some other tweaks that allow it to run stably (and presumably precisely) out of the box, with no extra steps and a standard install process. Next easiest was macOS, because Apple ships it with an optimized BLAS implementation (Apple’s vecLib) that can replace the default BLAS with a few lines entered into the Terminal. Ubuntu was slightly more difficult, if only because I’m still less familiar with the Linux command line and there were several steps. But it was also more flexible once installed, because it is easier to move back and forth between the default BLAS and OpenBLAS using a simple Terminal command. On macOS, one must move, or at least rename, a few files, and R does not detect that an alternate BLAS is being used (at least the way I implemented it), because after the changes made to the R directory, R sees all the same file names and library links as before.
Windows (using Microsoft R Open)
I installed Microsoft R Open (MRO) and then opened RStudio using MRO instead of the default R install. MRO uses the RevoScaleR library for optimization alongside the optimized BLAS from Intel’s Math Kernel Library (MKL). At the time of testing, MRO was an optimized version of R 4.0.2. Unfortunately, this will be the last version of MRO, but Microsoft will be open-sourcing the RevoScaleR package that enabled some of the speed improvements; it is unclear how well the open-source community will maintain the combination of packages and Intel MKL implementation that makes MRO so easy to use.
macOS (using Apple’s vecLib library)
Getting vecLib working with R before macOS 11 Big Sur was fairly straightforward; it took just a couple of Terminal commands. This is detailed at the top of this blog post.
But for my use case, getting vecLib working in place of the default R BLAS on macOS Big Sur (or later) required downloading a file and linking it to R, as described here by Simon Urbanek. As he warns, this is an untested setup that may result in less precision, less stability when running already-parallel R functions, and a potential loss in performance. Again, these implementations should be used cautiously. Note that running sessionInfo() shows that R does not detect that a different BLAS is in use. Pulling from Urbanek’s reply, the following commands worked for me to use vecLib with R 4.1 on macOS Big Sur:
curl -O https://mac.r-project.org/libs-4/libRblas-vecLib-signed.tar.gz
tar fxzP libRblas-vecLib-signed.tar.gz -C /
cd /Library/Frameworks/R.framework/Resources/lib
mv libRblas.dylib libRblas.0.dylib
ln -s libRblas.vecLib.dylib libRblas.dylib
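Because the file names R sees stay the same after relinking, it can help to ask R where its BLAS actually lives and then resolve any symlink. The sketch below uses only base R functions (`extSoftVersion()`, `La_library()`, `La_version()`, `normalizePath()`); the `blas_info()` wrapper is a hypothetical helper of mine, and I am assuming that resolving the reported path exposes the symlink target on a setup like the one above, which I have not verified on every platform.

```r
# Hypothetical helper: collect what R reports about its linear algebra libraries.
blas_info <- function() {
  # Path of the BLAS shared object in use, or "" if R cannot determine it.
  blas_path <- extSoftVersion()[["BLAS"]]
  list(
    blas           = blas_path,
    # Follow symlinks (on Unix-alikes) so a relinked libRblas may show its real target.
    blas_resolved  = if (nzchar(blas_path)) {
      normalizePath(blas_path, mustWork = FALSE)
    } else {
      NA_character_
    },
    lapack         = La_library(),
    lapack_version = La_version()
  )
}

str(blas_info())
```

If `blas` and `blas_resolved` differ, the library R reports is a link to something else, which is one hint that the swap took effect. Comparing timings before and after remains the more direct check.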
Linux (Ubuntu 21.04; using OpenBLAS)
Linux is easily the most flexible OS for using an alternate BLAS: a few commands entered into the Terminal allow quick switching of the default BLAS for other BLAS implementations. I followed the guide here.
Summary of results
To break it all down, for each OS we have the overall mean of the trimmed means across the three operation categories, as well as the total time to run all tests.
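As an illustration of how such a summary can be computed, here is a minimal sketch in R. The timings are made-up numbers and the category names are placeholders, not results from this post. I also use a plain arithmetic trimmed mean, which is an assumption; the benchmark script's own summary statistic may be computed differently.

```r
# Hypothetical per-test timings in seconds for one OS/BLAS combination.
# These values are invented for illustration only.
timings <- list(
  matrix_calculation = c(0.52, 0.31, 0.44, 0.95, 0.28),
  matrix_functions   = c(0.21, 1.10, 0.35, 0.40, 0.30),
  programming        = c(0.45, 0.22, 0.60, 0.18, 0.25)
)

# With five values, trim = 0.2 drops the single fastest and single slowest test.
trimmed <- vapply(timings, mean, numeric(1), trim = 0.2)

overall_mean <- mean(trimmed)         # mean of the three trimmed means
total_time   <- sum(unlist(timings))  # total time across all tests

print(trimmed)
print(c(overall_mean = overall_mean, total_time = total_time))
```

Trimming keeps one unusually slow or fast test from dominating a category, while the total time still reflects every test that ran.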
Using an optimized BLAS increased performance in several areas, regardless of OS. Whether this translates into improvements in real-world performance is less clear; code is slow for many more reasons than non-optimized linear algebra routines. Furthermore, the operations in this benchmark may not apply to the variety of complex functions utilized in statistics and data science.
Next steps are to test real-world scripts covering a variety of problems in statistics and data science that should be affected by faster linear algebra functions, and to see whether they actually are. And, as mentioned before, even though performance may increase, precision may suffer. This is worth testing, because the costs of that tradeoff matter and may vary depending on a user’s or organization’s context and goals.
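One simple starting point for the precision question is to solve a problem whose answer is known and measure the error under each BLAS. The following is only a sketch under my own assumptions: the matrix size, the diagonal shift used to keep the system well conditioned, and the error measures are all illustrative choices, and a well-conditioned test like this will not reveal every way results could differ.

```r
# Build a linear system with a known solution, then see how closely R recovers it.
set.seed(42)
n <- 200
# Shift the diagonal so the matrix is comfortably well conditioned.
a <- matrix(rnorm(n * n), nrow = n, ncol = n) + n * diag(n)
x_true <- rnorm(n)
b <- a %*% x_true

x_hat <- solve(a, b)

# Largest absolute deviation from the known solution, relative to its scale.
rel_err <- max(abs(x_hat - x_true)) / max(abs(x_true))

# Two mathematically equivalent ways of forming t(a) %*% a should agree closely.
sym_err <- max(abs(crossprod(a) - t(a) %*% a))

cat(sprintf("relative solve error: %.3e\n", rel_err))
cat(sprintf("crossprod discrepancy: %.3e\n", sym_err))
```

Running this once with the default BLAS and once with the optimized one, then comparing the two error values, gives a first rough indication of whether precision changes on a given machine.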