`rep( 50000000 , 6 )`, but it had a 50,000,000-element vector being passed back. That's why it was slow. If instead we do something like sum the 50,000,000-element vector on the workers, all we pass back is a single number. In that case `mclapply.hack()` wins.

```r
source("http://www.stat.cmu.edu/~nmv/setup/mclapply.hack.R")

system.time( lapply( rep( 50000000 , 6 ) , function(xx) { sum(runif(xx)) }))
##    user  system elapsed
##   10.75    0.64   11.43

system.time( mclapply( rep( 50000000 , 6 ) , function(xx) { sum(runif(xx)) }))
##    user  system elapsed
##    0.04    0.03    5.14
```

As to your question about `svyby`, go ahead and give it a shot. I don't know enough (really anything!) about that package to know which direction it will go.

So that's why `svyby`'s `multicore=TRUE` will not benefit from this on Windows: the survey design object is large and needs to be transferred to each of the child processes, eliminating any gains. http://stackoverflow.com/questions/24737166/is-it-possible-to-get-the-r-survey-packages-svyby-function-multicore-paramet/24737167#24737167
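A quick way to see this effect (a sketch of my own, not from the original thread) is to time how long it takes just to ship a large object to the workers using base R's `parallel` cluster functions. The matrix below is a stand-in for a large survey design object:

```r
library(parallel)

cl <- makeCluster(2)

## Stand-in for a large survey design object (~80 MB of doubles).
big <- matrix(rnorm(1e7), ncol = 100)

## Time spent just copying the object to each worker --
## pure overhead, incurred before any real computation starts.
print(system.time(clusterExport(cl, "big")))

## The work itself is trivial here, so the export time dominates.
res <- parLapply(cl, 1:2, function(i) nrow(big))
print(res)

stopCluster(cl)
```

If the per-worker computation is cheap relative to that copy, the parallel version can easily come out slower than plain `lapply()`.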

thank you!!!

`mclapply.hack()` will lose to `lapply()` unless the savings from doing the work in parallel overcome the overhead required to do that work in parallel. For your example, random number generation is fast, so the majority of the time is likely spent transferring the giant uniform vector back to the parent process, not generating it.
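To put a rough number on that transfer cost (a sketch of my own, not from the original post), you can time serializing the vector, since something close to that serialized copy is what moves between processes. A 5,000,000-element vector is used here to keep it quick:

```r
## Serializing approximates the cost of shipping an object between
## processes: each double is 8 bytes on the wire, plus a small header.
x <- runif(5e6)
s <- serialize(x, NULL)
print(system.time(serialize(x, NULL)))
print(length(s) / 2^20)  ## size in MiB (~38 for 5e6 doubles)
```

Scale that up to 50,000,000 elements per worker and the copying alone can swamp the time saved by generating the numbers in parallel.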
Here is an example where the computation on each of the parallel workers takes some time:

```r
source("http://www.stat.cmu.edu/~nmv/setup/mclapply.hack.R")

## This machine has four cores.
detectCores()
## [1] 4

## Run the parallel version.
system.time( par.out <- mclapply( (1:4+rep(2500, 4)), function(xx) {
    set.seed(xx)
    test.matrix <- matrix( rnorm(xx^2), nrow=xx )
    pd.matrix <- t(test.matrix) %*% test.matrix
    return( chol(pd.matrix) )
}) )
##    user  system elapsed
##    1.16    1.41   40.06

## Run the serial version.
system.time( serial.out <- lapply( (1:4+rep(2500, 4)), function(xx) {
    set.seed(xx)
    test.matrix <- matrix( rnorm(xx^2), nrow=xx )
    pd.matrix <- t(test.matrix) %*% test.matrix
    return( chol(pd.matrix) )
}) )
##    user  system elapsed
##   75.80    0.25   76.13
##
## ... the parallel version is about 30 seconds faster.

## Did we get the same answer?
all.equal( serial.out, par.out )
## [1] TRUE

## Note that if we run the parallel version again, it will take
## longer, because par.out and serial.out are copied to all of the
## clusters... even though the function doesn't use them.

## ... How big are they?
print(object.size(serial.out), units='Mb')
## 191.1 Mb
print(object.size(par.out), units='Mb')
## 191.1 Mb

## ... How much longer does it take?
system.time( par.second.out <- mclapply( (1:4+rep(2500, 4)), function(xx) {
    set.seed(xx)
    test.matrix <- matrix( rnorm(xx^2), nrow=xx )
    pd.matrix <- t(test.matrix) %*% test.matrix
    return( chol(pd.matrix) )
}) )
##    user  system elapsed
##    3.13   18.84   59.21
##
## ... Therefore about 20 seconds was lost to needlessly transferring
## the previous R objects.
```

Hope this helps.

I am using RStudio Version 0.98.501 and R:

R version 3.1.1 (2014-07-10) — “Sock it to Me”

Copyright (C) 2014 The R Foundation for Statistical Computing

Platform: i686-pc-linux-gnu (32-bit)

So when I `apt-get` octave3.2, it removes libblas.

I have removed Octave and rebuilt all my R packages, which may help.

I am also going to try to build Octave 3.8 from source.

I don't understand `update-alternatives`, which may be a good part of the problem.
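For what it's worth, on Debian/Ubuntu the shared BLAS that both R and Octave link against is typically managed through `update-alternatives`. A minimal sketch (the alternative name `libblas.so.3` varies by release; older Ubuntu used `libblas.so.3gf`):

```shell
# List the BLAS implementations registered on this system.
update-alternatives --list libblas.so.3

# Interactively choose which BLAS libblas.so.3 should point to;
# R and Octave will both pick up whatever is selected here.
sudo update-alternatives --config libblas.so.3
```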

Any suggestions greatly appreciated.

Thanks,
Jeremy
