R speeds up when the Basic Linear Algebra Subprograms (BLAS) library it uses is well tuned. The reference BLAS that comes with R and Ubuntu isn't very fast. On my machine, it takes 9 minutes to run a well-known R benchmarking script. If I use ATLAS, an optimized BLAS that can be easily installed, the same script takes 3.5 minutes. If I use OpenBLAS, yet another optimized BLAS that is equally easy to install, the same script takes 2 minutes. That's a pretty big improvement!

In this post, I'll show you how to install ATLAS and OpenBLAS, demonstrate how you can switch between them, and let you pick which you would like to use based on benchmark results. Before we get started, one quick shout out to Felix Riedel: thanks for encouraging me to look at OpenBLAS instead of ATLAS in your comment on my previous post.

**Update for Mac OS X users:** Zachary Mayer's comment gives bare-bones details for how to accomplish a similar BLAS switch. He has a few more details on his blog. Thanks Zachary!

**Update for R multicore users:** According to this comment and this comment, OpenBLAS does not play well with one of R's other multicore schemes. It appears to be a bug, so perhaps it will get fixed in the future. See the comment stream for further details.

**Update for the adventurous:** According to Joe Herman: "OpenBLAS isn't faster than ATLAS, but it is much easier to install OpenBLAS via apt-get than it is to compile ATLAS and R manually from source." See Joe's comment for details on the benefits of compiling ATLAS and R from scratch.

### Installing additional BLAS libraries on Ubuntu

For Ubuntu, there are currently three different BLAS options that can be easily chosen: "libblas" the reference BLAS, "libatlas" the ATLAS BLAS, and "libopenblas" the OpenBLAS. Their package names are

    $ apt-cache search libblas
    libblas-dev - Basic Linear Algebra Subroutines 3, static library
    libblas-doc - Basic Linear Algebra Subroutines 3, documentation
    libblas3gf - Basic Linear Algebra Reference implementations, shared library
    libatlas-base-dev - Automatically Tuned Linear Algebra Software, generic static
    libatlas3gf-base - Automatically Tuned Linear Algebra Software, generic shared
    libblas-test - Basic Linear Algebra Subroutines 3, testing programs
    libopenblas-base - Optimized BLAS (linear algebra) library based on GotoBLAS2
    libopenblas-dev - Optimized BLAS (linear algebra) library based on GotoBLAS2

Since libblas already comes with Ubuntu, we only need to install the other two for our tests. (Note: delete 'libatlas3gf-base' from the following command if you don't want to experiment with ATLAS.)

$ sudo apt-get install libopenblas-base libatlas3gf-base

### Switching between BLAS libraries

Now we can switch between the different BLAS options that are installed:

    $ sudo update-alternatives --config libblas.so.3gf
    There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

      Selection    Path                                       Priority   Status
    ------------------------------------------------------------
    * 0            /usr/lib/openblas-base/libopenblas.so.0     40        auto mode
      1            /usr/lib/atlas-base/atlas/libblas.so.3gf    35        manual mode
      2            /usr/lib/libblas/libblas.so.3gf             10        manual mode
      3            /usr/lib/openblas-base/libopenblas.so.0     40        manual mode

    Press enter to keep the current choice[*], or type selection number:

**Side note:** If `update-alternatives --config libblas.so.3gf` returned:

update-alternatives: error: no alternatives for libblas.so.3gf

Try

$ sudo update-alternatives --config libblas.so.3

instead. See the comments at the end of the post for further details.

From the selection menu, I picked 3, so it now shows that choice 3 (OpenBLAS) is selected:

    $ sudo update-alternatives --config libblas.so.3gf
    There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

      Selection    Path                                       Priority   Status
    ------------------------------------------------------------
      0            /usr/lib/openblas-base/libopenblas.so.0     40        auto mode
      1            /usr/lib/atlas-base/atlas/libblas.so.3gf    35        manual mode
      2            /usr/lib/libblas/libblas.so.3gf             10        manual mode
    * 3            /usr/lib/openblas-base/libopenblas.so.0     40        manual mode

And we can pull the same trick to choose between LAPACK implementations. From the output we can see that OpenBLAS does not provide a new LAPACK implementation, but ATLAS does:

    $ sudo update-alternatives --config liblapack.so.3gf
    There are 2 choices for the alternative liblapack.so.3gf (providing /usr/lib/liblapack.so.3gf).

      Selection    Path                                         Priority   Status
    ------------------------------------------------------------
    * 0            /usr/lib/atlas-base/atlas/liblapack.so.3gf    35        auto mode
      1            /usr/lib/atlas-base/atlas/liblapack.so.3gf    35        manual mode
      2            /usr/lib/lapack/liblapack.so.3gf              10        manual mode

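OpenBLAS is supposed to use the reference LAPACK implementation (/usr/lib/lapack/liblapack.so.3gf). On my machine that was already selected, so I did nothing in this case. If your menu shows ATLAS's LAPACK as the starred choice, as the sample listing above does, pick the reference entry instead: pairing ATLAS's LAPACK with OpenBLAS produces the `undefined symbol: ATL_chemv` error reported in the comments.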
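The pairing rule behind that choice can be written down as a small shell check. This is only a sketch: `lapack_matches_blas` is a hypothetical helper name, and the rule it encodes (ATLAS's LAPACK needs ATLAS as the selected BLAS, and every other combination is treated as acceptable) is an assumption drawn from the `undefined symbol: ATL_chemv` error reported in the comments, not from the package documentation.

```shell
# Hypothetical check: does this LAPACK choice fit this BLAS choice?
# Assumption: a LAPACK path containing "atlas" only works when the BLAS
# path also contains "atlas"; any other pairing is accepted.
lapack_matches_blas() {
    blas_path=$1
    lapack_path=$2
    case "$lapack_path" in
        *atlas*)
            case "$blas_path" in
                *atlas*) return 0 ;;
                *)       return 1 ;;
            esac
            ;;
        *)
            return 0
            ;;
    esac
}
```

For example, `lapack_matches_blas /usr/lib/openblas-base/libopenblas.so.0 /usr/lib/atlas-base/atlas/liblapack.so.3gf || echo "mismatch"` flags the combination that breaks R at startup.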

**Side note:** If `update-alternatives --config liblapack.so.3gf` returned:

update-alternatives: error: no alternatives for liblapack.so.3gf

Try

$ sudo update-alternatives --config liblapack.so.3

instead. See the comments at the end of the post for further details.
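When benchmarking several libraries in a row, it can be handy to switch without the interactive menu. The sketch below relies on the `--list` and `--set` subcommands of `update-alternatives`; the helper names `alt_name` and `use_alt` and the `UA` override variable are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for switching BLAS/LAPACK without the menu.
# UA names the command to call; it defaults to the real update-alternatives.
UA=${UA:-update-alternatives}

# Print whichever alternative name is registered on this system:
# "<lib>.so.3gf" (older releases) or "<lib>.so.3" (newer ones).
alt_name() {
    for name in "$1.so.3gf" "$1.so.3"; do
        if "$UA" --list "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    echo "no alternatives registered for $1" >&2
    return 1
}

# Point an alternative (libblas or liblapack) at a given library path.
use_alt() {
    name=$(alt_name "$1") || return 1
    sudo "$UA" --set "$name" "$2"
}
```

With these defined, `use_alt libblas /usr/lib/openblas-base/libopenblas.so.0` followed by `use_alt liblapack /usr/lib/lapack/liblapack.so.3gf` would select the OpenBLAS and reference LAPACK pairing used in this post, assuming those paths appear in your own menu.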

### Checking that R is using the right BLAS

Now we can check that everything is working by starting R in a new terminal:

    $ R

    R version 3.0.1 (2013-05-16) -- "Good Sport"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    ...snip...

    Type 'q()' to quit R.

    >

Great. Let's see if R is using the BLAS and LAPACK libraries we selected. To do so, we open another terminal so that we can run a few more shell commands. First, we find the PID of the R process we just started. Your output will look something like this:

    $ ps aux | grep exec/R
    1000     18065  0.4  1.0 200204 87568 pts/1    Sl+  09:00   0:00 /usr/lib/R/bin/exec/R
    root     19250  0.0  0.0   9396   916 pts/0    S+   09:03   0:00 grep --color=auto exec/R

The PID is the second number on the '/usr/lib/R/bin/exec/R' line. To see which BLAS and LAPACK libraries are loaded with that R session, we use the "list open files" command:

    $ lsof -p 18065 | grep 'blas\|lapack'
    R       18065 nathanvan  mem    REG    8,1  9342808 12857980 /usr/lib/lapack/liblapack.so.3gf.0
    R       18065 nathanvan  mem    REG    8,1 19493200 13640678 /usr/lib/openblas-base/libopenblas.so.0

As expected, the R session is using the reference LAPACK (/usr/lib/lapack/liblapack.so.3gf.0) and OpenBLAS (/usr/lib/openblas-base/libopenblas.so.0).
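If you only want the library paths, the `lsof` output can be trimmed to its last column. A minimal sketch, where `loaded_blas_lapack` is a hypothetical name and the filter assumes the path is always the final whitespace-separated field, as in the output above:

```shell
# Hypothetical filter: read `lsof -p PID` output on stdin and print the
# path (last column) of every line that mentions blas or lapack.
loaded_blas_lapack() {
    awk '/blas|lapack/ { print $NF }'
}
```

Usage would look like `lsof -p 18065 | loaded_blas_lapack`, with 18065 replaced by the PID of your own R session.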

### Testing the different BLAS/LAPACK combinations

I used Simon Urbanek's most recent benchmark script. To follow along, first download it to your current working directory:

$ curl http://r.research.att.com/benchmarks/R-benchmark-25.R -O

And then run it:

    $ cat R-benchmark-25.R | time R --slave
    Loading required package: Matrix
    Loading required package: lattice
    Loading required package: SuppDists
    Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
      there is no package called ‘SuppDists’
    ...snip...

Oops. I don't have the SuppDists package installed. I can easily install it via Michael Rutter's Ubuntu PPA:

$ sudo apt-get install r-cran-suppdists

Now Simon's script works wonderfully. Full output:

    $ cat R-benchmark-25.R | time R --slave
    Loading required package: Matrix
    Loading required package: lattice
    Loading required package: SuppDists
    Warning messages:
    1: In remove("a", "b") : object 'a' not found
    2: In remove("a", "b") : object 'b' not found

       R Benchmark 2.5
       ===============
    Number of times each test is run__________________________: 3

       I. Matrix calculation
       ---------------------
    Creation, transp., deformation of a 2500x2500 matrix (sec): 1.36566666666667
    2400x2400 normal distributed random matrix ^1000____ (sec): 0.959
    Sorting of 7,000,000 random values__________________ (sec): 1.061
    2800x2800 cross-product matrix (b = a' * a)_________ (sec): 1.777
    Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 1.00866666666667
                          --------------------------------------------
                     Trimmed geom. mean (2 extremes eliminated): 1.13484335940626

       II. Matrix functions
       --------------------
    FFT over 2,400,000 random values____________________ (sec): 0.566999999999998
    Eigenvalues of a 640x640 random matrix______________ (sec): 1.379
    Determinant of a 2500x2500 random matrix____________ (sec): 1.69
    Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.51366666666667
    Inverse of a 1600x1600 random matrix________________ (sec): 1.40766666666667
                          --------------------------------------------
                     Trimmed geom. mean (2 extremes eliminated): 1.43229160585452

       III. Programmation
       ------------------
    3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.10533333333333
    Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 1.169
    Grand common divisors of 400,000 pairs (recursion)__ (sec): 2.267
    Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 1.213
    Escoufier's method on a 45x45 matrix (mixed)________ (sec): 1.32600000000001
                          --------------------------------------------
                     Trimmed geom. mean (2 extremes eliminated): 1.23425893178325

    Total time for all 15 tests_________________________ (sec): 19.809
    Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.26122106386747
                          --- End of test ---

    134.75user 16.06system 1:50.08elapsed 137%CPU (0avgtext+0avgdata 1949744maxresident)k
    448inputs+0outputs (3major+1265968minor)pagefaults 0swaps

The elapsed time at the very bottom is the part that we care about. With OpenBLAS and the reference LAPACK, the script took 1 minute and 50 seconds to run. By changing the selections with update-alternatives, we can test R with ATLAS (3:21) or R with the reference BLAS (9:13). For my machine, OpenBLAS is a clear winner.
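To turn those elapsed times into speedup factors, a couple of small shell functions are enough. This is a sketch: `to_seconds` and `speedup` are hypothetical helper names, and the input is assumed to be in the `m:ss` or `h:mm:ss` form that `time` prints.

```shell
# Convert an elapsed time such as 1:50.08 or 9:13 into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Print how many times faster the second timing is than the first.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}
```

Running `speedup 9:13 1:50.08` compares the reference BLAS with OpenBLAS (553 seconds against 110.08 seconds, roughly a fivefold speedup), and `speedup 3:21 1:50.08` does the same for ATLAS.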

Give it a shot yourself. If you find something different, let me know.


**EDi:** Thanks for this post!

For me:

OpenBLAS: 1:30

BLAS: 4:15

**thiagogm:** Great post, VanHoudnos. I tried to quickly follow your instructions but got the following error when starting R after selecting option 3 in 'sudo update-alternatives --config libblas.so.3gf'. Any thoughts?

Error in dyn.load(file, DLLpath = DLLpath, ...) :

unable to load shared object '/usr/lib/R/library/stats/libs/stats.so':

/usr/lib/liblapack.so.3gf: undefined symbol: ATL_chemv

During startup - Warning message:

package ‘stats’ in options("defaultPackages") was not found

**nmv (post author):** The error that it's throwing is related to your LAPACK selection, not your BLAS selection. Do you have ATLAS selected for LAPACK and OpenBLAS selected for the BLAS?

**thiagogm:** You are right. Problem solved. Thanks again for the nice post.

**kav:** How did you solve it?

**nmv (post author):** @kav Make sure that you select the matching option for

$ sudo update-alternatives --config libblas.so.3gf

and

$ sudo update-alternatives --config liblapack.so.3gf

**kav:** Thanks!

**isomorphismes:** Thanks; I was getting the same error and thought I needed to `./config --with` some other options. But all it took to solve it was running `sudo update-alternatives --config liblapack.so.3gf` and choosing `/usr/lib/lapack/liblapack.so.3gf`. (This is now on R 3.1.1 "Sock it to Me".)

**felix:** Would something like this be possible on Windows machines?

**nmv (post author):** I don't know. Anyone who uses Windows care to comment?

**Zachary Mayer:** For reference, Mac users can use Apple's version of BLAS in the Accelerate framework using:

cd /Library/Frameworks/R.framework/Resources/lib

ln -sf /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib libRblas.dylib

You can go back to the default BLAS using:

cd /Library/Frameworks/R.framework/Resources/lib

ln -sf libRblas.0.dylib libRblas.dylib

For me (on R 3):

Regular BLAS: 141 seconds (2.35 minutes)

Apple's BLAS: 43 seconds (0.71 minutes)

For more info, read here:

http://r.research.att.com/man/RMacOSX-FAQ.html#Which-BLAS-is-used-and-how-can-it-be-changed_003f

and here:

https://groups.google.com/forum/#!topic/r-sig-mac/k4rDRRdtNwE

Note that R 3.0 no longer includes libRblas.vecLib.dylib, but you can still link against the system version of libBLAS.

**nmv (post author):** Thanks Zachary! I've added a link to this comment in the body of the post.

**svs:** How does your research correspond to the benchmarks from the gcbd R package?

**nmv (post author):** I don't think that I understand your question. Could you rephrase it?

**SVS:** Please see the gcbd reference manual, http://cran.r-project.org/web/packages/gcbd/index.html. They also compare several BLAS implementations. However, gcbd is several years old by now. It found that Goto BLAS is ahead of ATLAS, consistent with your findings (in my understanding, Goto BLAS is superseded by OpenBLAS). However, gcbd includes Intel MKL in the set of compared BLAS implementations. It would be interesting to know how OpenBLAS performance compares to the latest Intel MKL.

**nmv (post author):** I think you are correct.

The other take away from gcbd is that if you compile the BLAS yourself, you can likely get even better improvements.

To the best of my knowledge, since MKL needs to be compiled on Ubuntu, a fair comparison of ATLAS, OpenBLAS, and MKL would need to compile all three. That's a bit more work than I think most want to put in to squeeze a bit more performance out of R.

However, if you would like to do it, or know of anyone who has, let me know and I'll add a link to the body of the post.

**svs:** See this post

http://www.r-bloggers.com/compiling-r-3-0-1-with-mkl-support/

on how to compile R with Intel MKL; It seems there is no need to compile on Ubuntu, just download.

**caf:** Thanks for the post!

So little work, so big improvement.

Before: 2.30 min

After: 0.50 min


**Anirban Mukherjee:** Unfortunately OpenBLAS is not playing well with multicore/parallel. mcexit seems to cause a segfault. https://stat.ethz.ch/pipermail/r-sig-debian/2013-August/002147.html

**Robert Williams:** I also see segfaults when running R parallel when OpenBLAS is installed.

**Scott Locklin:** I don't get a segfault (Ubuntu 12.04 LTS), but all my cores go to 25% when I run with OpenBLAS and parallel. It works OK with ATLAS though.

**khm:** Can I use it with netlib clapack + Windows + Visual Studio? How?

**nmv (post author):** I don't know. I don't use Windows.

**philchalmers:** This seems really cool and extremely straightforward, but I'm having some issues getting it set up with Ubuntu 13.10. After installing libopenblas-base and libatlas3gf-base, when I try to set the alternatives I get

$ sudo update-alternatives --config libblas.so.3gf

update-alternatives: error: no alternatives for libblas.so.3gf

I've tried installing libopenblas-dev, but the same issue occurs. Any idea why this might be or how to fix it? Thanks so much.

**nmv (post author):** Unfortunately I haven't made the jump to 13.10; my plan is to wait until 14.04 LTS comes out.

My guess is that the 13.10 version of the packages isn't properly updating the symlinks in /etc/alternatives. What does

$ ls /etc/alternatives/lib*

give you?

On 12.04 I get (with extra things removed):

$ ls /etc/alternatives/lib*

/etc/alternatives/libblas.a

/etc/alternatives/liblapack.so.3gf

/etc/alternatives/libblas.so

/etc/alternatives/liblapack.a

/etc/alternatives/libblas.so.3gf

/etc/alternatives/liblapack.so

My assumption is that you won't see the '/etc/alternatives/libblas.so.3gf' line. At least I think that is why you are getting that error message.

You might try contacting the package maintainers. It seems like a bug (and one they would want to know about!) Please report back on what you find.

**rphilipchalmers:** Actually I do see it there. Here is what is in the directory:

$ ls /etc/alternatives/lib*

/etc/alternatives/libblas.a

/etc/alternatives/libblas.so.3gf

/etc/alternatives/liblapack.so

/etc/alternatives/libtxc-dxtn-i386-linux-gnu

/etc/alternatives/libblas.so

/etc/alternatives/libgksu-gconf-defaults

/etc/alternatives/liblapack.so.3

/etc/alternatives/libtxc-dxtn-x86_64-linux-gnu

/etc/alternatives/libblas.so.3

/etc/alternatives/liblapack.a

/etc/alternatives/liblapack.so.3gf

/etc/alternatives/libxnvctrl.a

So it's there. Something else might be going on then... Thanks for the quick reply, and let me know if you can think of anything else. Otherwise I'll look into filing a bug report. Cheers.

**nmv (post author):** That is strange! Unfortunately we have reached the limit of my expertise. Best of luck with getting it fixed.

**Sam B:** Ever find a solution to this? I'm having the same issue.

**Christian Schmiedecke:** It seems that the names have changed. Try

sudo update-alternatives --config libblas.so.3

and

sudo update-alternatives --config liblapack.so.3

**nmv (post author):** Thanks! I have updated the main post.

**safish:** Hi,

Thanks for this helpful post. I am not using OpenBLAS, but I directly use GotoBLAS2 which OpenBLAS is also based on, and I have been stuck for the last several days at an issue, and google search or asking on forums were not helpful at all. Hope you would have a response..

I am trying to use GotoBLAS2 on R 3.0 on Unix. I downloaded GotoBLAS2 source code from TACC web site, compiled it, and replaced libRblas.so with libgoto2.so, following the instructions at the link http://www.rochester.edu/college/gradstudents/jolmsted/files/computing/BLAS.pdf. The simple matrix operations in R like "determinant" are 20 times faster than before (I am using huge matrices), which is good. However, I cannot use many cores in parallel now.

You may tell that "You do not need to use multiple cores while using GotoBLAS2, it already uses multiple threads, and even multiple cores.". But I still need to use multiple cores not for simple matrix operations, but for performing some tasks on many different files independently, which is a great reason for parallelism. What's horrible is, after replacing libRblas.so with libgoto2.so, I cannot use %dopar% any more in any script. Operations using %dopar% takes forever.

Below is an example showing that GotoBLAS2 gets stuck when I use %dopar% (that's not my aim for using multiple cores, it is just an example). This code was still running after 24 hours, when I finally killed it. But if I use %do% instead of %dopar%, it takes just a second. When I was using R's default BLAS library, I could get the result from below code with %dopar% in a few seconds. (Btw, my machine has 24 cores)

    library("foreach")
    library("doParallel")

    # Register two worker processes for %dopar%
    registerDoParallel(cores=2)
    set.seed(100)

    foreach (i = 1:2) %dopar% {
      a = replicate(1000, rnorm(1000))
      d = determinant(a)
    }

So, is it possible to use many cores at the same time with GotoBLAS2, do you have any ideas?

Thanks a lot in advance.

**nmv (post author):** Hi safish,

Unfortunately, this seems to be a bug with GotoBLAS2 / OpenBLAS. This comment has a few more details.

My apologies that I cannot be of more use. Perhaps add a +1 to the bug tracking so that the developers would consider addressing the issue?

**safish:** Thanks, I think this is a bug. I ended up using BLAS single-threaded by setting the GOTO_NUM_THREADS environment variable whenever I use R multicore.
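A minimal sketch of that workaround as a shell wrapper follows. The name `with_single_threaded_blas` is hypothetical; GOTO_NUM_THREADS is the variable named in the comment, and OPENBLAS_NUM_THREADS is added on the assumption that the installed library is OpenBLAS rather than GotoBLAS2.

```shell
# Hypothetical wrapper: run any command with the BLAS limited to one thread,
# leaving the parallelism to R's own multicore machinery.
with_single_threaded_blas() {
    env GOTO_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 "$@"
}
```

It would be used as `with_single_threaded_blas Rscript my_parallel_script.R`, where the script name is a placeholder for whatever code calls %dopar% or mclapply.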

**Mauricio Zambrano-Bigiarini:** Thank you very much, Nathan, for this very useful post.

I tried it in LinuxMint 16 (MATE), but when I run:

sudo update-alternatives --config libblas.so.3gf

I got the following error message:

"update-alternatives: error: no alternatives for libblas.so.3gf"

However, thanks to one of your replies to a previous post, I was able to find a solution with:

sudo update-alternatives --config libblas.so.3

**nmv (post author):** Thanks! I have updated the main post.

**Mauricio Zambrano-Bigiarini:** For LinuxMint 16 (MATE), I forgot to include the command:

sudo update-alternatives --config liblapack.so.3

for choosing between LAPACK implementations


**Matthew Bromberg:** The parallel bug for OpenBLAS will bite you in Linux Mint 16 if you use Python and NumPy. I'm seeing the same nasty effect of all my cores pegged at 100% while linear algebra slows to a crawl.


**Micha M:** I've tried this, and have had no problem setting up the libs on Ubuntu 13.10. However, I'm quite surprised by the results I'm getting. I don't use R, so instead I wrote a very rudimentary Octave benchmark script:

    rand("seed",5)
    rm1 = rand(1000,1000);
    tic
    for k=1:200
      rm1 = rm1 * inv(rm1);
    end
    toc

Essentially I multiply the matrix by its inverse, store the result back in the original, and repeat this operation 200 times. Now here's what surprises me: when I try it with OpenBLAS, I see the CPU percentage go to ~400% (all cores utilized) and it takes some 42.5 seconds. When I use ATLAS, CPU usage is limited to 100% and it takes 109 seconds. BUT, when I use the default libblas I also get only 100% CPU, yet the script takes only 23.5 seconds! I ran the script several times with basically the same results. How come the optimized libs are getting spanked so thoroughly by the default implementation?

**nmv (post author):** Hi Micha,

My hunch is that it has something to do with rm1 being set to the identity matrix after the first iteration. Perhaps the standard lib is smarter with identity matrix inversion than the optimized libs.

Try the same script, but move the random number generation inside of the loop. Does the standard library still get spanked?

**Micha M:** Thanks for your suggestion. Of course, it was foolish of me to let the operations run on the identity rather than on a random matrix. However, I didn't actually want to move the random generation into the loop: I don't know how long it takes to run, so it might skew the results. So instead, I changed the operation inside the loop to:

rm1 = rm1 * (inv(rm1) - eye1000);

with eye1000 being an identity matrix of rank 1000. This has had a dramatic effect on results: the standard BLAS now runs the benchmark (using one core, as previously) in 554 seconds, while OpenBLAS takes just 66 (and utilizes 4 cores, or actually 2 hyperthreaded physical cores). Amazing.

One thing that still bothers me, though, is understanding what's going on. You've suggested that the standard BLAS is smarter with the identity matrix. While this could definitely be the case, in order to do that it would need to know in the first place that it IS an identity matrix that it's working on. Now, this would require two things: one is that the inversion of rm1 and the subsequent multiplication are both numerically accurate enough that ALL non-diagonal values can be ignored; the other is that a routine is run to check whether the matrix is an identity matrix before the calculation begins. I think that the chances of each of these happening are small. It seems more plausible to me that the matrices are checked not for being identity matrices but for being sparse. In the original implementation of the script, even with non-perfect numerical accuracy a large portion of values would be zero after a couple of iterations. In the new implementation, the matrices are never sparse. So, I now know that OpenBLAS performs way better on dense matrices but I still have to test it on sparse ones.

As I'm writing this, I've now made a small test to see how quickly we have numerical convergence towards the identity matrix. It seems to converge nicely. After one rm1=rm1*inv(rm1), there are only 949 values in the whole matrix that are actually 0 and only 1 is actually 1. After 2 iterations there are 42900 zeros in the matrix and all 1000 1's are correct, after 3 iterations there are 330,000 zeros, after 4 for some reason it's back to 87500, after 5 it's 287,000 and after the 6th iteration finally we have all 999,000 correctly identified. So, this leaves open the question whether matrices are checked by the standard lib only for sparsity or also for being an identity matrix.

**vishalbelsare:** While comparing ATLAS and OpenBLAS on Ubuntu, has ATLAS been compiled on the target machine or has it been installed with an 'apt-get install' incantation?

My experience is that the apt-get way gets us ATLAS which is not geared for exploiting multicore CPUs while compiling OpenBLAS one is likely to set the threads to machine specification.

While I have both OpenBLAS and ATLAS installed, I compiled ATLAS, by getting the source by 'apt-get source' and then building the package on a 12 core target machine.

**Pudding:** Thanks for sharing.

I wonder whether there is a way to specify the number of threads? My machine has 2 cores, and the thread info tells me I'm using 2 threads when I'm running an R command.

Can it be set to 4 threads for R computation?

**nmv (post author):** Unfortunately, I don't know the answer. Perhaps someone else will be able to shine light on this for us.

**Pudding:** I've tried this R package (https://github.com/simonfullernuim/OpenBlasThreads), but it doesn't seem to work for me. It can just change the thread number from 1 to 2.


**dan:** **Memory and CPU comparison**

Hi Nathan,

Many thanks for this extremely useful post. I would recommend anyone who is using lapack to spend 5 minutes reading this. A few observations:

(a) In the code you have posted in your article

sudo update-alternatives –config liblapack.so.3

should be

sudo update-alternatives --config liblapack.so.3

(b) I do not use R but call lapack from a Fortran program using Linux (Mint (mate) 16)

For linux users a simple way to time your programs is given by

/usr/bin/time -v [./program.exe]

where [./program.exe] is whatever you normally type on the command line to run your program (e.g. ./fort.exe, python program.py, ...)

(c) For generating and solving an Ax = b matrix equation for a matrix of size 10,000x10,000, my results were (min time of 3 runs, performed on a quiet Core i7-3930K, 64GB RAM):

| BLAS / LAPACK | Elapsed (wall clock) time | Memory |
|---|---|---|
| default blas, default lapack | 2:42.94 | 736MB |
| atlas blas, atlas lapack | 1:21.00 | 742MB |
| open blas, default lapack | 0:55.28 | 1200MB |

As you can see, openblas was the fastest of the three, almost 3x faster than the default. However, openblas also requires the largest amount of RAM, so if you have restricted memory atlas might be a good choice, giving a 2x speedup while only requiring slightly more memory than the default.
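The two numbers compared here can be pulled out of the `/usr/bin/time -v` report with a short filter. This is a sketch under assumptions: `summarize_time_v` is a hypothetical name, and it expects the GNU time labels "Elapsed (wall clock) time" and "Maximum resident set size (kbytes)".

```shell
# Hypothetical filter: read a `/usr/bin/time -v` report on stdin and print
# the elapsed wall-clock time and the peak memory in megabytes.
summarize_time_v() {
    awk -F': ' '
        /Elapsed \(wall clock\) time/ { elapsed = $NF }
        /Maximum resident set size/   { rss_kb = $NF }
        END { printf "elapsed=%s max_rss_mb=%d\n", elapsed, rss_kb / 1024 }
    '
}
```

Because the report is written to standard error, a call would look like `/usr/bin/time -v ./program.exe 2>&1 >/dev/null | summarize_time_v`.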

**nmv (post author):** Hi Dan,

Thanks for the memory comparison. Since my problems are usually CPU bound, it didn't occur to me to dig into memory usage.

I have one quick question about your comment. In part a) of your comment, the two update-alternatives lines are the same. Did you intend to point out the difference between liblapack.so.3 and liblapack.so.3gf that I (attempted to) document in the "Side Notes"?

**dan:** Hi Nathan,

Yes, apologies my correction has been rendered the same as the original. I was trying to say that you need a double hyphen (two minus signs) before the 'config' rather than just one. I discovered the problem by attempting to copy-paste from the code you have included in your original post above.

The code you have for the update-alternatives for the libblas.so.3 is correct (it has a double hyphen before the config), and since in your guide this comes before the lapack update-alternatives I think most people will work out what the problem is with the lapack line as I did, but anyway just to let you know. Thanks again for this extremely useful posting.

Cheers, Dan.


**Jeremy Duncan:** On Ubuntu 12.04 I am getting clashes with octave3.2 and R packages: R packages cannot find lblas.

I am using Rstudio Version 0.98.501 and R

R version 3.1.1 (2014-07-10) -- "Sock it to Me"

Copyright (C) 2014 The R Foundation for Statistical Computing

Platform: i686-pc-linux-gnu (32-bit)

So when I apt-get octave3.2 it removes libblas.

I have removed octave and built all my R packages which may help.

I am also going to try and build Octave 3.8 from sources.

I don't understand the update-alternatives which may be a good part of the problem.

Any suggestions greatly appreciated,

Jeremy

thanks, Jeremy

**nmv (post author):** I am not quite clear what the problem is. Could you give more detail?


**lindonslog:** I compiled openblas with NO_AFFINITY=1 but my R process is always at 100% CPU usage, whereas it should be at 800%. It's not working for me. Using cat /proc/PID/status I can see that R has 8 threads, and if I use lsof -p PID I can see that the openblas library is open, but still, only 100%.

**nmv (post author):** I don't quite understand. Why are you compiling openblas?

**lindonslog:** I'm using Red Hat on my office computer and I do not have root privileges to use yum, so I'm building R and openblas from source in my home directory. Don't worry though, I managed to get things working correctly since the last post.

**Joe Herman:** Very thorough post, thanks. However, I think the main conclusion (OpenBLAS is faster than ATLAS) is dependent on the specific way you set up ATLAS.

Firstly, let's focus on the key part of the R benchmark from the perspective of testing the BLAS & LAPACK libraries, which is the 'Matrix functions' section. When I run the test on my system (running R 3.1.1) using OpenBLAS and the apt-get version of ATLAS, I get the following:

# OpenBLAS

FFT over 2,400,000 random values____________________ (sec): 0.221333333333333

Eigenvalues of a 640x640 random matrix______________ (sec): 0.796999999999999

Determinant of a 2500x2500 random matrix____________ (sec): 0.19

Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.161666666666665

Inverse of a 1600x1600 random matrix________________ (sec): 0.188333333333335

Trimmed geom. mean (2 extremes eliminated): 0.199331471542089

Overall time: 64.95user 21.82system 0:30.80elapsed 281%CPU

# ATLAS:

FFT over 2,400,000 random values____________________ (sec): 0.231666666666667

Eigenvalues of a 640x640 random matrix______________ (sec): 0.333666666666666

Determinant of a 2500x2500 random matrix____________ (sec): 0.707333333333334

Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.610333333333332

Inverse of a 1600x1600 random matrix________________ (sec): 0.595333333333336

Trimmed geom. mean (2 extremes eliminated): 0.494933333222859

Overall time: 43.97user 0.32system 0:44.31elapsed 99%CPU

In this case, OpenBLAS is faster for everything except eigenvalue computation, but ATLAS is clearly only using one core.

However, installing ATLAS properly (downloading and compiling from http://sourceforge.net/projects/math-atlas), and rebuilding R --with-blas and --with-lapack, using static libraries, I get the following:

# ATLAS:

FFT over 2,400,000 random values____________________ (sec): 0.220333333333333

Eigenvalues of a 640x640 random matrix______________ (sec): 0.35

Determinant of a 2500x2500 random matrix____________ (sec): 0.196666666666668

Cholesky decomposition of a 3000x3000 matrix________ (sec): 0.152

Inverse of a 1600x1600 random matrix________________ (sec): 0.199999999999999

Trimmed geom. mean (2 extremes eliminated): 0.205406249285989

Overall time: 46.73user 1.51system 0:29.21elapsed 165%CPU

which is essentially the same as OpenBLAS on all accounts (except for eigenvalues, where ATLAS is still much quicker), despite using less CPU power.

The main conclusion of my testing is that OpenBLAS isn't faster than ATLAS, but it is much easier to install OpenBLAS via apt-get than it is to compile ATLAS and R manually from source. Hence, for a 'quick fix' on Ubuntu to improve R from its default, OpenBLAS may still be the best option. However, for optimal performance (requiring a bit more effort to set up), ATLAS may be better.

**nmv (post author):** Thanks for sharing. When I transition my laptop to Ubuntu 14.04 I'll make some time to build ATLAS and give it a shot.

**Hong Liu:** Thanks for sharing!

I have spent a huge amount of time on building the ATLAS from the source code on the OpenSUSE 13.1... Is it just a waste of time?

1. Does that easy-install (without tuning on your computer) "libatlas" in the repository really improve the computation performance?

2. is OpenBLAS better than ATLAS or only better than the easy-install "libatlas" in the repositories of Ubuntu and OpenSUSE?

**nmv (post author):** 1) I don't know. I have not run the comparison with a tuned version of ATLAS.

2) It depends. See this comment by Joe Herman for a partial counter-example.


**gwern:** Seems to result in a nice speedup on my system. I have a fairly ordinary Dell Studio 17 from ~2011, running Debian testing (Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux). Here are my benchmark results:

BLAS; Atlas wins:

1. /usr/lib/libblas/libblas.so.3

Total time for all 15 tests_________________________ (sec): 62.238

Overall mean (sum of I, II and III trimmed means/3)_ (sec): 2.11989509735432

2. /usr/lib/openblas-base/libblas.so.3:

Total time for all 15 tests_________________________ (sec): 27.2746666666667

Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.61748573399343

3. /usr/lib/atlas-base/atlas/libblas.so.3:

Total time for all 15 tests_________________________ (sec): 25.804

Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.4524236231769

LAPACK, OpenBLAS seems to perform best:

1. /usr/lib/atlas-base/atlas/liblapack.so.3:

Total time for all 15 tests_________________________ (sec): 33.4963333333334

Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.85297027369235

2. /usr/lib/lapack/liblapack.so.3

Total time for all 15 tests_________________________ (sec): 23.969

Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.3878476211767

3. /usr/lib/openblas-base/liblapack.so.3:

Total time for all 15 tests_________________________ (sec): 22.1063333333333

Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.24025539442091

So worst vs best is: 22.1s vs 62.2s


**jabirali:** Thanks for this interesting post! I'm currently working on a project in Fortran (not R), but since my code involves a lot of matrix algebra (nonlinear matrix differential equations), switching from the reference implementation to the ATLAS implementation with the commands you suggested yielded a nearly 35% decrease in computation time.

In my case, however, ATLAS seemed to perform better than OpenBLAS, possibly because my code involves many calls to LAPACK routines, but few direct calls to BLAS routines.