I would like to compile R with OpenBLAS and its integrated LAPACK implementation. However, when I configure R, it always reports that it can't find dgemm_ and dpstrf_:
checking for dgemm_ in -lopenblas -I/opt/OpenBLAS-0.3.23/include -L/opt/OpenBLAS-0.3.23/lib... yes
checking whether double complex BLAS can be used... yes
checking whether the BLAS is complete... yes
checking for dgemm_ in -llapack... no
checking for dpstrf_ in -llapack... no
checking if LAPACK version >= 3.10.0... no
configure: using internal LAPACK sources
I can find these two functions in libopenblas.a, for example by running nm /opt/OpenBLAS-0.3.23/lib/libopenblas.a | grep "dpstrf_", which prints the following:
0000000000000000 T dpstrf_
U LAPACKE_dpstrf_work
lapacke_dpstrf_work.o:
U dpstrf_
0000000000000000 T LAPACKE_dpstrf_work
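The same check can be scripted. This is only a minimal sketch: the helper name defines_symbol is made up for illustration, and it assumes the three-column address/type/name layout of the nm output above, where "T" marks a symbol defined in the text section and "U" an undefined reference.

```shell
# Hypothetical helper: succeeds if nm-style input on stdin contains a
# line that defines SYMBOL in the text section (type "T").
# Lines of type "U" (undefined references) have only two fields and are ignored.
defines_symbol() {
    awk -v sym="$1" '$2 == "T" && $3 == sym { found = 1 } END { exit !found }'
}

# Usage with the archive path from this post:
# nm /opt/OpenBLAS-0.3.23/lib/libopenblas.a | defines_symbol dpstrf_ && echo "dpstrf_ is defined"
```

A successful exit status only shows that the archive defines the symbol. It says nothing about which library flags configure actually tests.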
...
checking for dgemm_ in -L/usr/lib/x86_64-linux-gnu/ -lopenblas... yes
checking whether double complex BLAS can be used... yes
checking whether the BLAS is complete... yes
checking for dpstrf_... yes
...
I agree with that: R figures out how best to link against OpenBLAS. I ran a few tests comparing various compilation methods, and the two approaches in this ticket (despite the different configure messages) lead to the same compute performance.
I am aware of BLIS and libflame but have not extensively tested them myself so far.
While some of the performance results for BLIS/libflame shown on the various GitHub pages look rather impressive, a performant BLAS/LAPACK implementation only speeds up code if the majority of the run time is spent in BLAS/LAPACK routines and the data size is sufficiently large.
Benchmarks like https://mac.r-project.org/benchmarks/R-benchmark-25.R show speed-ups of 10x and more when used with OpenBLAS and Intel MKL, but in real-world code the average speed-up I have seen is more in the 20-30 % range (if there is any at all). Also, OpenBLAS is packaged in most if not all Linux distributions today, which makes it easier to integrate with R. A possible alternative with regard to ease of integration would be to look into FlexiBLAS.
While in the past I have been a strong proponent of Intel MKL and have pushed the Intel toolkit (Intel Compilers + MKL) to its limits, I have come to the conclusion that GNU Compilers + OpenBLAS is close enough to Intel MKL performance for most workloads that the extra overhead and the potential troubles with reproducibility (MKL_CBWR) and stability are just not worth it. With R being an open source product, it also does not feel right to combine it with a free but closed-source product such as Intel MKL.
But don't get me wrong: if you and your colleagues have codes that call R functions that make efficient use of BLAS/LAPACK (i.e. push enough data into those BLAS/LAPACK functions) and also spend the majority of their execution time in BLAS/LAPACK, you really should continue to optimise for BLAS/LAPACK performance.
I am far from trying to quench your desire to optimise the performance of R via BLAS/LAPACK, but having the R developers write efficient code still goes a long way compared to tuning the R installation.
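The gap between the benchmark and real-world figures is roughly what Amdahl's law would predict. As a rough illustration only (the function name and the BLAS/LAPACK shares of 25 % and 90 % are assumed values for the sake of the example, not measurements):

```shell
# Amdahl's law: overall speed-up when a fraction p of the run time
# is accelerated by a factor s and the rest is unchanged.
# p and s are illustrative inputs, not measured values.
overall_speedup() {
    awk -v p="$1" -v s="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / s) }'
}

overall_speedup 0.25 10   # 25 % of run time in BLAS/LAPACK, library 10x faster
overall_speedup 0.90 10   # 90 % of run time in BLAS/LAPACK, library 10x faster
```

Under these assumptions the first call prints 1.29 and the second 5.26. A code that spends a quarter of its time in BLAS/LAPACK gains about 29 % from a 10x faster library, while a BLAS-dominated benchmark gets much closer to the headline number.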