When my great-grandfather came to Nebraska from Belgium, he changed his last name from Van Houwenhuyse to VanHoudenos. No one in the family quite knows why he made the change, but there it is. My guess is that he wanted to break with the old world, so he adopted what he thought was a more American version of the name. Perhaps he chose to eliminate the space because it was confusing to Americans. I could imagine something like this happening: "Hey Peter, why do you always say your middle name? Wouldn't it be simpler to introduce yourself as Peter Houdenos?"

At some point before my father was born, my grandfather dropped the e from VanHoudenos to become VanHoudnos. No one quite knows why grandpa did that. His brother, for example, did not, so there is a whole branch of our family tree that retains the VanHoudenos spelling. Unfortunately, we never got a chance to ask grandpa why he changed the spelling because he died when my father was young. In any case, my father kept the spelling as VanHoudnos.

In a sense, I could reasonably be referred to as

- Nathan Van Houwenhuyse, if I were to retain the original Flemish spelling. Note that the V is capitalized because that's just the way Flemish surnames are spelled;
- Nathan VanHoudenos, if I were to adopt my great-grandfather's Americanized spelling; or
- Nathan VanHoudnos, which is how I prefer to be known because it is the name my parents chose to give me.

The irregular nature of my last name can cause confusion. I have seen the following alternate spellings in the wild:

- Nathan Vanhoudnos among those who think it's weird to have a capital letter in the middle of your last name;
- Nathan van Houdnos among those who are familiar with non-Flemish Dutch last names; and
- Nathan Van Houdnos among those familiar with Flemish last names.

In any case, I prefer Nathan VanHoudnos. It may be irregular, but it is what it is.

It seems that the heir to WinBUGS is Stan. With Stan, reasonably complex Bayesian models can be expressed compactly and estimated easily. It is good software, and it is under active development to further improve it.

I have a small quibble about RStan, the R interface to Stan. RStan would be much improved if its default behavior were to run one MCMC chain per core. For software that prides itself on speed -- Stan goes to the trouble of translating the Stan model specification into a stand-alone C++ program for execution -- it seems a little odd that the extra cores present on nearly all modern machines are not put to use by default.

Currently, running chains in parallel is possible, but only with platform-dependent boilerplate code. For example, the RStan Quick Start Guide gives an `mclapply` example for Mac and Linux users and a `parLapply` example for Windows users. The boilerplate nature of the code makes it cumbersome to fit models several times, and the platform-dependent nature of the examples makes it difficult to share code between platforms.
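For reference, the Mac/Linux version of that boilerplate looks roughly like the following sketch. I am paraphrasing the Guide's approach from memory rather than quoting it; `fit.compiled` is assumed to be a stanfit object whose model has already been compiled (e.g. by an earlier serial call to `stan`), and `schools_dat` is the data list used later in this post.

```r
library(rstan)
library(parallel)

## Fork one worker per chain; each worker runs a single chain from the
## already-compiled model, and sflist2stanfit() glues the single-chain
## fits back together into one stanfit object.
sflist <- mclapply(1:4, mc.cores = 4, function(i)
    stan(fit = fit.compiled, data = schools_dat, seed = 1,
         chains = 1, chain_id = i, refresh = -1))
fit.combined <- sflist2stanfit(sflist)
```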

To address this issue, I have implemented the boilerplate code from the Quick Start Guide in a cross-platform R package: `rstanmulticore`. The syntax is easy. Simply replace a call to `stan`

```r
fit.serial <- stan( model_code = schools_code, data = schools_dat,
                    iter = 1000, chains = 4)
```

with a call to `pstan`

```r
fit.parallel <- pstan( model_code = schools_code, data = schools_dat,
                       iter = 1000, chains = 4)
```

The `pstan` version will compile the model, distribute the compiled model to separate cores for a parallel run, and then recombine the results as if the code had executed serially. Since I used `parLapply` to distribute the work to multiple cores, `pstan` will run on Windows, Linux, and Mac.

At least as far as I have pushed `pstan`, it works well for me. Your needs may differ. I would appreciate feedback and suggestions on how to improve it. You can access it via GitHub here. Installation instructions and a brief usage example are below.

**Step 0.A**: If you do not already have `rstan` installed, install it using the instructions here.

**Step 0.B**: If you do not already have `devtools` installed, install it using the instructions here.

**Step 1**: Install `rstanmulticore` directly from my GitHub repository using `install_github('nathanvan/rstanmulticore')`:

```
> library(devtools)
> install_github('nathanvan/rstanmulticore')
Downloading github repo nathanvan/rstanmulticore@master
Installing rstanmulticore
"C:/PROGRA~1/R/R-31~1.3/bin/x64/R" --vanilla CMD INSTALL \
  "C:/Users/vanhoudnos-nathan/AppData/Local/Temp/RtmpQBcRKa/devtools924351029d0/nathanvan-rstanmulticore-c7f9d4e" \
  --library="C:/Users/vanhoudnos-nathan/Documents/R/win-library/3.1" --install-tests

* installing *source* package 'rstanmulticore' ...
** R
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (rstanmulticore)
```

We begin with the default "Eight Schools" example from the Quick Start Guide using the default `stan` function:

```r
library(rstan)
## Loading required package: Rcpp
## Loading required package: inline
## 
## Attaching package: 'inline'
## 
## The following object is masked from 'package:Rcpp':
## 
##     registerPlugin
## 
## rstan (Version 2.6.0, packaged: 2015-02-06 21:02:34 UTC, GitRev: 198082f07a60)

## The data to analyze (Yes, it is very little!)
schools_dat <- list( J = 8,
                     y = c(28,  8, -3,  7, -1,  1, 18, 12),
                     sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

## The Stan model for the data, stored as a string
schools_code <- 'data {
    int J;          // number of schools
    real y[J];      // estimated treatment effects
    real sigma[J];  // s.e. of effect estimates
  }
  parameters {
    real mu;
    real tau;
    real eta[J];
  }
  transformed parameters {
    real theta[J];
    for (j in 1:J)
      theta[j] <- mu + tau * eta[j];
  }
  model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
  }'

## Estimating the model
fit.serial <- stan( model_code = schools_code, data = schools_dat,
                    iter = 1000, chains = 4, seed = 1)
## 
## TRANSLATING MODEL 'schools_code' FROM Stan CODE TO C++ CODE NOW.
## COMPILING THE C++ CODE FOR MODEL 'schools_code' NOW.
## cygwin warning:
##   MS-DOS style path detected: C:/PROGRA~1/R/R-31~1.3/etc/x64/Makeconf
##   Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-31~1.3/etc/x64/Makeconf
##   CYGWIN environment variable option "nodosfilewarning" turns off this warning.
##   Consult the user's guide for more details about POSIX paths:
##     http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
## 
## SAMPLING FOR MODEL 'schools_code' NOW (CHAIN 1).
## ... < snip > ...
## 
## SAMPLING FOR MODEL 'schools_code' NOW (CHAIN 2).
## ... < snip > ...
## 
## SAMPLING FOR MODEL 'schools_code' NOW (CHAIN 3).
## ... < snip > ...
## 
## SAMPLING FOR MODEL 'schools_code' NOW (CHAIN 4).
## ... < snip > ...
```

Note that `stan` is pretty verbose.

I chose to make `pstan` less verbose. By default, `pstan` reports sparse progress information to the R console and redirects the more detailed information to a file, `stan-debug-*`, that is created in the current working directory. (If you wish to see the detailed info in real time, use `tail -f` in your shell.)

Usage is as follows:

```r
library(rstanmulticore)
## Loading required package: parallel

fit.parallel <- pstan( model_code = schools_code, data = schools_dat,
                       iter = 1000, chains = 4, seed = 1)
## *** Parallel Stan run ***
## Working directory:
##   C:/Users/vanhoudnos-nathan/workspace/norc/spencer-5866.01.62/software/tmp
## + Compiling the Stan model.
## + Attempting 4 chains on 4 cores.
##   ... Creating the cluster.
##   ... Log file: stan-debug.2015-05-01-12.38.21.txt
##   ... Loading rstan on all workers.
##   ... Exporting the fitted model and data to all workers.
##   ... Running parallel chains.
##   ... Finished!
```

If, in the unlikely case, you want no console output and no file redirection, you can pass `pdebug = FALSE` to `pstan`. See `help(pstan)` for details.

Note that, as promised, the output -- the actual samples drawn from the posterior -- of `pstan` is identical to that of `stan`:

```r
all.equal( fit.serial@sim$samples, fit.parallel@sim$samples )
## [1] TRUE
```

As mentioned, `rstanmulticore` works well for my needs, but it may not work for you. If it does not work for you, please let me know and I'll do my best to accommodate you. Pull requests and additional test cases are most welcome!

The recent 7.8-magnitude earthquake heavily damaged both Kathmandu and the surrounding rural areas. It hurts to see the images of the devastation. Thankfully, my colleagues in Nepal are all safe and accounted for.

I have decided to raise money for World Vision, a Christian relief organization that has been working in Nepal since 1982. From World Vision's website:

How is World Vision helping?

- Over 1.1 million people reached through long-term development projects, including earthquake-preparedness trainings that educated over 65,000 people
- Currently operating 73 projects utilising 205 staff
- Emergency response teams are already mobilising relief
Through access to its regional warehouses in Nepal and Asia, World Vision has immediate access to necessary supplies, like hygiene kits, cooking kits, mosquito nets, sleeping bags and sleeping mats, buckets and water purification tablets, many of which are already on their way to remote village communities in desperate need of basic supplies.

World Vision will address the immediate needs of children, including establishing Child Friendly Spaces, which provide a safe environment for children to learn, play and emotionally recover from traumatic events.

Staff on the ground have prioritized getting potable water, food, temporary shelter, household supplies, and child protection, education and health programs to affected areas as soon as possible, with the aim of reaching 100,000 people in relief response.

Please keep the people of Nepal in your prayers and consider giving a gift of $50 or more to help them today.

Thank you,

Nathan VanHoudnos

Consider the following regression model:
\begin{equation}\begin{aligned}
\Y & = \X\boldbeta + \e &
\e & \sim N(\Zero,\R) \label{eq:regmodel}
\end{aligned}\end{equation}
where $\X$ is a **known** design matrix composed of **fixed numbers**, $\boldbeta$ is a vector of unknown regression coefficients, and $\e$ is a vector of the residual errors with covariance matrix $\R$.

There is a vast literature devoted to studying this model. For example, the optimal method of estimation for this model varies depending on the specification of $\R$. If $\R = \sigma^2 \I$, where $\I$ is the identity matrix, then Ordinary Least Squares (OLS) is optimal for estimation and hypothesis testing. If $\R = \sigma^2 \V$, where $\V$ is positive definite and known, then Generalized Least Squares (GLS) is optimal. If $\R$ is positive definite but unknown, then REstricted Maximum Likelihood (REML) has desirable properties.
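For concreteness, the familiar forms of the first two estimators (standard results, stated here for reference rather than taken from the original text) are
\begin{align*}
\hat{\boldbeta}_{OLS} & = (\X^\top \X)^{-1} \X^\top \Y \\
\hat{\boldbeta}_{GLS} & = (\X^\top \V^{-1} \X)^{-1} \X^\top \V^{-1} \Y \quad .
\end{align*}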

Note that since $\X$ is assumed to be composed of fixed numbers, it is not random. By definition, its covariance with any random variable is zero.

On the advice of Brian Kovak and Seth Richards-Shubik, I use the econometrics textbook used by CMU (and many other places):

Wooldridge, J. M. (2011). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA, 2nd edition.

as representative of the views of econometricians.

Although Wooldridge uses the same notation as the model of Equation \ref{eq:regmodel}, his model is quite different: the columns of the $\X$ matrix are assumed to be draws from some random variable. His $\X$ matrix is not fixed; it is random.

From the perspective of a statistician, Wooldridge is modeling $Y$ as a **transformation of random variables**:
\begin{equation*}
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_K X_K + U
\end{equation*}
such that $X_1, X_2, \dots, X_K$ are all observable **random variables**, $U$ is an unobservable random variable, and $\beta_0, \beta_1, \dots, \beta_K$ are constants that are fixed and unknown. We observe a random draw from this structural model
\begin{equation*}
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K + u
\end{equation*}
such that $y$ is a draw from $Y$, $x_1$ is a draw from $X_1$, etc.

Wooldridge outlines a series of assumptions that allow an econometrician to ignore the stochastic nature of the $\X$ matrix and use the regression model of Equation \ref{eq:regmodel} instead of the true model. Wooldridge gives the key assumption a technical name: **exogeneity**. A variable is exogenous if it is uncorrelated with the error term $U$. If all variables in the $\X$ matrix are exogenous, then Wooldridge argues that the regression model of Equation \ref{eq:regmodel} is optimal for point estimation and hypothesis testing -- even though the model is wrong!

I, at least, had great trouble following Wooldridge's argument, so I created the following example to better understand his claims. I hope that it is useful to others.

Let $\X$ be partitioned into two parts: an $n \times 1$ column of ones and $p-1$ columns of multivariate random normal vectors $\X_1, \dots, \X_{p-1}$:
\begin{align*}
\X & = \begin{bmatrix} \One_n & \X_1 & \X_2 & \dots & \X_{p-1} \end{bmatrix} \\
& = \begin{bmatrix} \One_n & \widetilde{\X}_{n\times (p-1)} \end{bmatrix}
\end{align*}
where the subscripts denote the dimensions of the matrices. The multivariate random normal vectors can be correlated with each other. To express these correlations compactly, the $\Vecm{\cdot}$ operator (see Henderson and Searle (1979) or Wikipedia for properties) is used to stack the columns of the $\X$ matrix on top of one another
\begin{align*}
\Vecm{\X} & = \begin{bmatrix} \One_n \\ \X_1 \\ \X_2 \\ \vdots \\ \X_{p-1} \end{bmatrix}
\end{align*}
so that the mean and variance can be expressed as
\begin{align*}
\E{\Vecm{\X}} & = \begin{bmatrix} \One_n \\ \M_1 \\ \M_2 \\ \vdots \\ \M_{p-1} \end{bmatrix} \\
\Var{\Vecm{\X}} & = \begin{bmatrix} \Zero_{n\times n} & \Zero \\ \Zero & \boldSigma_{n(p-1) \times n(p-1) } \end{bmatrix}
\end{align*}
where the matrix of means $\M$ and the covariance matrix $\boldSigma$ are assumed to be fixed but unknown. Note that although the intercept is uncorrelated with the random portions of $\X$, the random portions of $\X$ are allowed to covary arbitrarily.

Let the unobservable error $\e$ be a mean zero multivariate normal, which can be correlated with the random portions of $\X$. We parametrize the joint distribution as
\begin{align}
\begin{pmatrix} \Vecm{\X} \\ \e \end{pmatrix} & = \begin{pmatrix} \One_n \\ \Vecm{\widetilde{\X}} \\ \e \end{pmatrix} \sim N \left( \A, \B \right) \label{eq:jointdist} \\
\A & = \begin{bmatrix} \One_n \\ \Vecm{\M} \\ \Zero_{n \times 1} \end{bmatrix} \nonumber \\
\B & = \begin{bmatrix} \Zero_{n\times n} & \Zero & \Zero \\ \Zero & \boldSigma_{n(p-1) \times n(p-1) } & \Q \\ \Zero & \Q^\top & \R_{n \times n} \end{bmatrix} \nonumber
\end{align}
Finally, let $\Y$ be a linear transformation of the random variables $\X$ and $\e$ such that
\begin{align}
\Y & = \X \boldbeta + \e \label{eq:randommodel}
\end{align}
where $\boldbeta$ is a vector of regression coefficients.

With a little tedium and algebra, it can be shown that the distribution of $\Y$ from Equation \ref{eq:randommodel} is
\begin{align}
\Y & \sim N \left( \begin{bmatrix} \One_{n} & \M \end{bmatrix} \begin{bmatrix} \beta_0 \\ \widetilde{\boldbeta} \end{bmatrix} \quad , \quad \B_\boldSigma + \B_\Q + \B_\R \right) \label{eq:disty} \\
\B_\boldSigma & = \left( \widetilde{\boldbeta}^\top \otimes \I_n \right) \boldSigma \left( \widetilde{\boldbeta} \otimes \I_n \right) \nonumber \\
\B_\Q & = \left( \widetilde{\boldbeta}^\top \otimes \I_n \right) \Q + \left\{ \left( \widetilde{\boldbeta}^\top \otimes \I_n \right) \Q \right\}^\top \nonumber \\
\B_\R & = \R \nonumber
\end{align}
where $\widetilde{\boldbeta}$ corresponds to the regression coefficients for the random part of $\X$ and $\otimes$ is the direct product (see Henderson and Searle (1979) or Wikipedia for properties).

The marginal distribution of $\Y$ given in Equation \ref{eq:disty} is complex, but it has the components that one might expect. The mean, for example, is a function of the means of the columns of $\X$ and the vector of regression coefficients $\boldbeta$. The terms of the variance-covariance matrix correspond to the random parts of the model: $\B_\boldSigma$ represents the contribution of the nonzero covariance of the columns of $\X$, $\B_\Q$ represents the contribution of the non-zero covariance between $\X$ and $\e$, and $\B_\R$ represents the contribution of the non-zero covariance of the random error $\e$.

The marginal distribution of $\Y$, however, is not useful for inference on $\boldbeta$ because the marginal distribution of $\Y$ depends only on the unknown quantities $\boldbeta$, $\M$, $\boldSigma$, $\Q$, and $\R$. The data, $\X$, do not appear in the distribution at all!

One strategy to perform inference on $\boldbeta$ is to condition on the observed values of $\X$ and then restrict attention to cases where the resulting conditional distribution is tractable. We proceed as follows. Note that the distribution of $\Y$ conditional on an observed draw $\X = \X_0$ has only one random component, the conditional distribution of the unobservable error. The rules for conditional normal distributions (via Wikipedia) and the joint distribution in Equation \ref{eq:jointdist} imply that the conditional error is normally distributed with mean and variance
\begin{align*}
\Egiven{\e}{ \Vecm{ \widetilde{\X} } = \Vecm{\widetilde{\X}_0} } & = \Zero_{n \times 1} + \Q^\top \boldSigma^{-1} \left( \Vecm{ \widetilde{\X}_0} - \Vecm{\M} \right) \\
\Vargiven{\e}{ \Vecm{ \widetilde{\X} } = \Vecm{\widetilde{\X}_0} } & = \R + \Q^\top \boldSigma^{-1} \Q \quad .
\end{align*}
The mean and variance of $\given{\Y}{\X = \X_0}$ are then
\begin{align}
\Egiven{\Y}{\X = \X_0} & = \X_0 \boldbeta + \Q^\top \boldSigma^{-1} \left( \Vecm{ \widetilde{\X}_0} - \Vecm{\M} \right) \label{eq:condymean} \\
\Vargiven{\Y}{\X = \X_0} & = \R + \Q^\top \boldSigma^{-1} \Q \nonumber
\end{align}
where the unknown quantities are $\boldbeta$, $\M$, $\boldSigma$, $\Q$, and $\R$. In the special case where $\Q = \Zero$, i.e. all of the columns of $\X$ are uncorrelated with the error $\e$, the conditional distribution depends only on $\boldbeta$ and $\R$:
\begin{align}
\given{\Y}{\X=\X_0, \Q=\Zero} & \sim N\left(\X_0 \boldbeta \quad , \quad \R \right) \label{eq:keyresult} \quad ,
\end{align}
which is the regression model of Equation \ref{eq:regmodel}.

That Equation \ref{eq:keyresult} is equivalent to Equation \ref{eq:regmodel} is somewhat shocking. We began with $\Y$ as a linear transformation of the random variables contained in $\X$ and ended up with a conditional model that is equivalent to a regression where $\X$ is composed of fixed numbers. The key condition, that $\Q = \Zero$, i.e. in econometric parlance that the columns of $\X$ are exogenous, allows us to use all of our regression theory. It is quite wonderful.

Equations \ref{eq:condymean} and \ref{eq:keyresult} contain a lesson for statisticians: it can be shown that a regression estimate of a treatment effect from a designed experiment will be inconsistent if even one endogenous variable is included in the design matrix. This is not something that I have considered in the past when including covariates in the model to increase my power to detect a treatment effect. Clearly, this is something that I should consider in the future!
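To make the lesson concrete, here is a minimal simulation sketch in R. It is my own illustration, not an example from the post or from Wooldridge: the covariate `x.endo`, its coefficients, and the effect sizes are all made up for the demonstration. The endogenous covariate is correlated with both the randomized treatment and the unobserved error (think of a variable measured after treatment), and including it biases the estimated treatment effect.

```r
set.seed(1)
n <- 100000
treat <- rbinom(n, 1, 0.5)              # randomized treatment, hence exogenous
u     <- rnorm(n)                       # unobservable error in the outcome
## An endogenous covariate: correlated with both the treatment and
## the error term (e.g. a variable measured after treatment).
x.endo <- 0.5 * treat + 0.8 * u + rnorm(n)
y <- 1 + 2 * treat + u                  # true treatment effect is 2

coef(lm(y ~ treat))['treat']            # consistent: close to 2
coef(lm(y ~ treat + x.endo))['treat']   # inconsistent: noticeably below 2
```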

On Linux and Mac, parallelizing R code across multiple cores is as simple as calling the parallel package's `mclapply()` function. Unfortunately, `mclapply()` does not work on Windows machines because the `mclapply()` implementation relies on forking, and Windows does not support forking. For me, this is somewhat of a headache because I am used to using `mclapply()`, and yet I need to support Windows users for one of my projects.

My hackish solution is to implement a fake `mclapply()` for Windows users with one of the Windows-compatible parallel `R` strategies. For the impatient, it works like this:

```r
require(parallel)

## On Windows, the following line will take about 40 seconds to run
## because by default, mclapply is implemented as a serial function
## on Windows systems.
system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed 
##    0.00    0.00   40.06 

## If we try to force mclapply() to use multiple cores on Windows,
## it doesn't work:
system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }, mc.cores=4) )
## Error in mclapply(1:4, function(xx) { Sys.sleep(10) }, mc.cores = 4) : 
##   'mc.cores' > 1 is not supported on Windows

## Using the ideas developed in this post, we can implement
## a parallel (as it should be!) mclapply() on Windows.
source("http://www.stat.cmu.edu/~nmv/setup/mclapply.hack.R")
## 
##     *** Microsoft Windows detected ***
## 
##     For technical reasons, the MS Windows version of mclapply()
##     is implemented as a serial function instead of a parallel
##     function.
## 
##     As a quick hack, we replace this serial version of mclapply()
##     with a wrapper to parLapply() for this R session. Please see
## 
##       http://www.stat.cmu.edu/~nmv/2014/07/14/implementing-mclapply-on-windows 
## 
##     for details.

## And now the code from above will take about 10 seconds (plus overhead).
system.time( mclapply(1:4, function(xx){ Sys.sleep(10) }) )
##    user  system elapsed 
##    0.01    0.06   11.25 
```

As we will see, however, there are a few reasons why no one has done this in the past.

On Linux or Mac, it is very simple to parallelize R code across multiple cores. Consider the following function:

```r
wait.then.square <- function(xx){
    # Wait for one second
    Sys.sleep(1);
    # Square the argument
    xx^2 
}
```

If we want to run it on the integers from 1 to 4 in serial, it will take about 4 seconds:

```r
## Run in serial
system.time( serial.output <- lapply( 1:4, wait.then.square ) )
##    user  system elapsed 
##   0.000   0.000   4.004 
```

If we run it in parallel, it will take about 1 second:

```r
## Run in parallel
require(parallel)
## Note two changes:
##   (1) lapply to mclapply
##   (2) mc.cores (the number of processors to use in parallel)
system.time( par.output <- mclapply( 1:4, wait.then.square, mc.cores=4 ) )
##    user  system elapsed 
##   0.008   0.000   1.008 
```

And we can verify that the output is, in fact, the same:

```r
## Check if the output is the same
all.equal( serial.output, par.output )
## [1] TRUE
```

This toy example is a little unrealistic. It is often the case, at least for the work that I do, that the parallelized function either (i) uses an R library that isn't loaded at startup by default, e.g. the Matrix library for sparse matrices, or (ii) needs to access an object in the global environment, e.g. a variable.

The magic of `mclapply()` is that it uses fork to replicate the R process into several child processes, tells the children to do the work, and then aggregates the children's results for you. Since it uses forking, the entire R session -- all of its variables, functions, and packages -- is replicated among the children. Therefore, you can do things like this:

```r
## Setup a global variable that uses a non-base package
require(Matrix)
( a.global.variable <- Diagonal(3) )
## 3 x 3 diagonal matrix of class "ddiMatrix"
##      [,1] [,2] [,3]
## [1,]    1    .    .
## [2,]    .    1    .
## [3,]    .    .    1

## Write a proof-of-concept lapply
serial.output <- lapply( 1:4, function(xx) {
    return( wait.then.square(xx) + a.global.variable )
})

## Parallelize it
par.output <- mclapply( 1:4, function(xx) {
    return( wait.then.square(xx) + a.global.variable )
}, mc.cores=4)

## Check that they are equal
all.equal(serial.output, par.output)
## [1] TRUE
```

It is, at least to me, a little magical! I don't have to think much.

Windows doesn't fork. It is a limitation of the operating system that there is no easy way to replicate the parent R session to create new child R sessions that can do the work.

R gets around this by pretending that each core on the machine is an entirely separate machine. This makes the setup a little more involved because the user must:

- create a "cluster" of child processes,
- load the necessary R packages on the cluster,
- copy the necessary R objects to the cluster,
- distribute work to the cluster, and finally
- stop the cluster.

Recall that the setup of the example is as follows:

```r
## Load packages
require(parallel)
require(Matrix)

## Define the example function and the global variable
wait.then.square <- function(xx){
    # Wait for one second
    Sys.sleep(1);
    # Square the argument
    xx^2 
}
a.global.variable <- Diagonal(3)
```

and the serial version of the code is:

```r
serial.output <- lapply( 1:4, function(xx) {
    return( wait.then.square(xx) + a.global.variable )
})
```

Parallelizing this code requires more setup with the "cluster" approach.

```r
## Step 1: Create a cluster of child processes
cl <- makeCluster( 4 )

## Step 2: Load the necessary R package(s)
## N.B. length(cl) is the number of child processes
##      in the cluster
par.setup <- parLapply( cl, 1:length(cl),
    function(xx) {
        require(Matrix)
    })

## Step 3: Distribute the necessary R objects
clusterExport( cl, c('wait.then.square', 'a.global.variable') )

## Step 4: Do the computation
par.output <- parLapply(cl, 1:4,
    function(xx) {
        return( wait.then.square(xx) + a.global.variable )
    })

## Step 5: Remember to stop the cluster!
stopCluster(cl)

## Check that the parallel and serial output are the same
all.equal(serial.output, par.output)
## [1] TRUE
```

This approach works on Windows, Linux, and Mac, but it requires a bit more bookkeeping.

Even though Windows doesn't fork, I'd like to pretend that it does so that I can use the simpler syntax of `mclapply()`. My approach is to wrap the bookkeeping code for `parLapply()` into a single function: `mclapply.hack()`.

This is likely a bad idea for general use. Creating and destroying clusters for every `mclapply.hack()` call defeats the advantages of having a persistent cluster to farm out work to. Copying every R object from the parent session to all of the cluster sessions takes up much more memory (and time!) than simply forking processes. Use this approach with caution!

The final code is as follows.

```r
mclapply.hack <- function(...) {
    ## Create a cluster
    ## ... How many workers do you need?
    ## ... N.B. list(...)[[1]] returns the first
    ##     argument passed to the function. In
    ##     this case it is the list to iterate over
    size.of.list <- length(list(...)[[1]])
    cl <- makeCluster( min(size.of.list, detectCores()) )

    ## Find out the names of the loaded packages
    loaded.package.names <- c(
        ## Base packages
        sessionInfo()$basePkgs,
        ## Additional packages
        names( sessionInfo()$otherPkgs ))

    ## N.B. tryCatch() allows us to properly shut down the
    ##      cluster if an error in our code halts execution
    ##      of the function. For details see: help(tryCatch)
    tryCatch( {

       ## Copy over all of the objects within scope to
       ## all clusters.
       ##
       ## The approach is as follows: Beginning with the
       ## current environment, copy over all objects within
       ## the environment to all clusters, and then repeat
       ## the process with the parent environment.
       ##
       this.env <- environment()
       while( identical( this.env, globalenv() ) == FALSE ) {
           clusterExport(cl,
                         ls(all.names=TRUE, env=this.env),
                         envir=this.env)
           this.env <- parent.env(environment())
       }
       ## repeat for the global environment
       clusterExport(cl,
                     ls(all.names=TRUE, env=globalenv()),
                     envir=globalenv())

       ## Load the libraries on all the clusters
       ## N.B. length(cl) returns the number of clusters
       parLapply( cl, 1:length(cl), function(xx){
           lapply(loaded.package.names, function(yy) {
               ## N.B. the character.only option of
               ##      require() allows you to give the
               ##      name of a package as a string.
               require(yy , character.only=TRUE)})
       })

       ## Run the lapply in parallel
       return( parLapply( cl, ...) )
    }, finally = {
       ## Stop the cluster
       stopCluster(cl)
    })
}
```

We can test it as follows:

```r
system.time( serial.output <- lapply( 1:4, function(xx) {
                  return( wait.then.square(xx) + a.global.variable )
             }))
##    user  system elapsed 
##   0.020   0.000   4.022 

system.time( par.output <- mclapply.hack( 1:4, function(xx) {
                  return( wait.then.square(xx) + a.global.variable )
             }))
##    user  system elapsed 
##   0.024   0.012   3.683 

all.equal( serial.output, par.output )
## [1] TRUE
```

In this case, it works, but we don't save much time because of the bookkeeping required to set up the cluster for `parLapply()`. If we run a more intense function, say one that takes 10 seconds per iteration to run, then we can begin to see gains:

```r
wait.longer.then.square <- function(xx){
    ## Wait for ten seconds
    Sys.sleep(10);
    ## Square the argument
    xx^2 
}

system.time( serial.output <- lapply( 1:4, function(xx) {
                  return( wait.longer.then.square(xx) + a.global.variable )
             }))
##    user  system elapsed 
##   0.020   0.000  40.059 

system.time( par.output <- mclapply.hack( 1:4, function(xx) {
                  return( wait.longer.then.square(xx) + a.global.variable )
             }))
##    user  system elapsed 
##   0.024   0.008  12.794 

all.equal( serial.output, par.output )
## [1] TRUE
```

My motivation for implementing `mclapply()` on Windows is so that code I wrote on Linux will "just work" on Windows. I wrote a quick script to implement `mclapply.hack()` as `mclapply()` on Windows machines and leave `mclapply()` alone on Linux and Mac machines. The code is as follows:

```r
##
## mclapply.hack.R
##
## Nathan VanHoudnos
## nathanvan AT northwestern FULL STOP edu
## July 14, 2014
##
## A script to implement a hackish version of
## parallel::mclapply() on Windows machines.
## On Linux or Mac, the script has no effect
## beyond loading the parallel library.

require(parallel)

## Define the hack
mclapply.hack <- function(...) {
    ## Create a cluster
    size.of.list <- length(list(...)[[1]])
    cl <- makeCluster( min(size.of.list, detectCores()) )

    ## Find out the names of the loaded packages
    loaded.package.names <- c(
        ## Base packages
        sessionInfo()$basePkgs,
        ## Additional packages
        names( sessionInfo()$otherPkgs ))

    tryCatch( {

       ## Copy over all of the objects within scope to
       ## all clusters.
       this.env <- environment()
       while( identical( this.env, globalenv() ) == FALSE ) {
           clusterExport(cl,
                         ls(all.names=TRUE, env=this.env),
                         envir=this.env)
           this.env <- parent.env(environment())
       }
       clusterExport(cl,
                     ls(all.names=TRUE, env=globalenv()),
                     envir=globalenv())

       ## Load the libraries on all the clusters
       ## N.B. length(cl) returns the number of clusters
       parLapply( cl, 1:length(cl), function(xx){
           lapply(loaded.package.names, function(yy) {
               require(yy , character.only=TRUE)})
       })

       ## Run the lapply in parallel
       return( parLapply( cl, ...) )
    }, finally = {
       ## Stop the cluster
       stopCluster(cl)
    })
}

## Warn the user if they are using Windows
if( Sys.info()[['sysname']] == 'Windows' ){
    message(paste(
      "\n",
      "   *** Microsoft Windows detected ***\n",
      "   \n",
      "   For technical reasons, the MS Windows version of mclapply()\n",
      "   is implemented as a serial function instead of a parallel\n",
      "   function.",
      "   \n\n",
      "   As a quick hack, we replace this serial version of mclapply()\n",
      "   with a wrapper to parLapply() for this R session. Please see\n\n",
      "     http://www.stat.cmu.edu/~nmv/2014/07/14/implementing-mclapply-on-windows \n\n",
      "   for details.\n\n"))
}

## If the OS is Windows, set mclapply to the
## hackish version. Otherwise, leave the
## definition alone.
mclapply <- switch( Sys.info()[['sysname']],
   Windows = {mclapply.hack},
   Linux   = {mclapply},
   Darwin  = {mclapply})

## end mclapply.hack.R
```

I posted the script at http://www.stat.cmu.edu/~nmv/setup/mclapply.hack.R. You can use it with

```r
source('http://www.stat.cmu.edu/~nmv/setup/mclapply.hack.R')
```

as shown in the beginning of the post.

I would be grateful for any comments or suggestions for improving it. If there is sufficient interest, I can wrap it into a simple R package.

Now that I have had a chance to get my feet under me at Northwestern, I decided to do some web updating. Hopefully my loyal readers like the new look.

To support the chapter, I have put together an online supplement which gives a detailed walk-through of how to write a Metropolis-Hastings sampler for a simple psychometric model (in R, of course!). The table of contents is below; a minimal sketch of a single Metropolis-Hastings step follows it:

- Post 1: A Bayesian 2PL model
- Post 2: Generating fake data
- Post 3: Setting up the sampler and visualizing its output
- Post 4: Sampling the person ability parameters
- Post 5: Refactoring Part I: a generic Metropolis-Hastings sampler
- Post 6: Refactoring Part II: a generic proposal function
- Post 7: Sampling the item parameters with generic functions
- Post 8: Sampling the variance of person ability with a Gibbs step
- Post 9: Tuning the complete sampler
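As a small taste of what the supplement builds toward, here is a minimal, generic Metropolis-Hastings step in R. This is an illustrative sketch of my own, not code from the supplement; the function names and the standard normal target are made up for the example.

```r
## One Metropolis step with a symmetric normal proposal:
## accept the proposed value with probability min(1, posterior ratio).
mh.step <- function(theta.current, log.post, proposal.sd) {
  theta.proposed <- rnorm(1, mean = theta.current, sd = proposal.sd)
  log.alpha <- log.post(theta.proposed) - log.post(theta.current)
  if (log(runif(1)) < log.alpha) theta.proposed else theta.current
}

## Example: sample from a standard normal "posterior"
log.post <- function(theta) dnorm(theta, log = TRUE)
draws <- numeric(5000)
for (ii in 2:5000) {
  draws[ii] <- mh.step(draws[ii - 1], log.post, proposal.sd = 2.4)
}
mean(draws); sd(draws)   # should be near 0 and 1
```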

I will continue to add to the online supplement over time. The next few posts will be:

- Post 10: Overdispersion and multi-core parallelism
- Post 11: Replacing R with C
- Post 12: Adaptive tuning of the Metropolis-Hastings proposals

I would be grateful for any feedback. Feel free to either leave it here or at the online supplement itself.

In this post, I'll show you how to install ATLAS and OpenBLAS, demonstrate how you can switch between them, and let you pick which you would like to use based on benchmark results. Before we get started, one quick shout out to Felix Riedel: thanks for encouraging me to look at OpenBLAS instead of ATLAS in your comment on my previous post.

**Update for Mac OS X users:** Zachary Meyer's comment gives bare bones details for how to accomplish a similar BLAS switch. He has a few more details on his blog. Thanks Zachary!

**Update for R multicore users:** According to this comment and this comment, OpenBLAS does not play well with one of R's other multicore schemes. It appears to be a bug, so perhaps it will get fixed in the future. See the comment stream for further details.

**Update for the adventurous:** According to Joe Herman: "OpenBLAS isn't faster than ATLAS, but it is much easier to install OpenBLAS via apt-get than it is to compile ATLAS and R manually from source." See Joe's comment for details on the benefits of compiling ATLAS and R from scratch.

For Ubuntu, there are currently three different BLAS options that can be easily chosen: "libblas" the reference BLAS, "libatlas" the ATLAS BLAS, and "libopenblas" the OpenBLAS. Their package names are

```
$ apt-cache search libblas
libblas-dev - Basic Linear Algebra Subroutines 3, static library
libblas-doc - Basic Linear Algebra Subroutines 3, documentation
libblas3gf - Basic Linear Algebra Reference implementations, shared library
libatlas-base-dev - Automatically Tuned Linear Algebra Software, generic static
libatlas3gf-base - Automatically Tuned Linear Algebra Software, generic shared
libblas-test - Basic Linear Algebra Subroutines 3, testing programs
libopenblas-base - Optimized BLAS (linear algebra) library based on GotoBLAS2
libopenblas-dev - Optimized BLAS (linear algebra) library based on GotoBLAS2
```

Since libblas already comes with Ubuntu, we only need to install the other two for our tests. (Note: in the following command, delete 'libatlas3gf-base' if you don't want to experiment with ATLAS.)

```
$ sudo apt-get install libopenblas-base libatlas3gf-base
```

Now we can switch between the different BLAS options that are installed:

```
$ sudo update-alternatives --config libblas.so.3gf
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libopenblas.so.0     40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3gf    35        manual mode
  2            /usr/lib/libblas/libblas.so.3gf             10        manual mode
  3            /usr/lib/openblas-base/libopenblas.so.0     40        manual mode

Press enter to keep the current choice[*], or type selection number:
```

**Update**: If you instead see the error

```
update-alternatives: error: no alternatives for libblas.so.3gf
```

try

```
$ sudo update-alternatives --config libblas.so.3
```

instead. See the comments at the end of the post for further details.

From the selection menu, I picked 3, so it now shows that choice 3 (OpenBLAS) is selected:

```
$ sudo update-alternatives --config libblas.so.3gf
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/openblas-base/libopenblas.so.0     40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3gf    35        manual mode
  2            /usr/lib/libblas/libblas.so.3gf             10        manual mode
* 3            /usr/lib/openblas-base/libopenblas.so.0     40        manual mode
```

And we can pull the same trick to choose between LAPACK implementations. From the output we can see that OpenBLAS does not provide a new LAPACK implementation, but ATLAS does:

```
$ sudo update-alternatives --config liblapack.so.3gf
There are 2 choices for the alternative liblapack.so.3gf (providing /usr/lib/liblapack.so.3gf).

  Selection    Path                                         Priority   Status
------------------------------------------------------------
* 0            /usr/lib/atlas-base/atlas/liblapack.so.3gf    35        auto mode
  1            /usr/lib/atlas-base/atlas/liblapack.so.3gf    35        manual mode
  2            /usr/lib/lapack/liblapack.so.3gf              10        manual mode
```

So we will do nothing in this case, since OpenBLAS is supposed to use the reference implementation (which is already selected).

**Update**: If you instead see the error

```
update-alternatives: error: no alternatives for liblapack.so.3gf
```

try

```
$ sudo update-alternatives --config liblapack.so.3
```

instead. See the comments at the end of the post for further details.

Now we can check that everything is working by starting R in a new terminal:

```
$ R

R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...snip...
Type 'q()' to quit R.

>
```

Great. Let's see if R is using the BLAS and LAPACK libraries we selected. To do so, we open another terminal so that we can run a few more shell commands. First, we find the PID of the R process we just started. Your output will look something like this:

```
$ ps aux | grep exec/R
1000     18065  0.4  1.0 200204 87568 pts/1   Sl+  09:00   0:00 /usr/lib/R/bin/exec/R
root     19250  0.0  0.0   9396   916 pts/0   S+   09:03   0:00 grep --color=auto exec/R
```

The PID is the second number on the '/usr/lib/R/bin/exec/R' line. To see which BLAS and LAPACK libraries are loaded with that R session, we use the "list open files" command:

```
$ lsof -p 18065 | grep 'blas\|lapack'
R   18065 nathanvan  mem    REG    8,1  9342808 12857980 /usr/lib/lapack/liblapack.so.3gf.0
R   18065 nathanvan  mem    REG    8,1 19493200 13640678 /usr/lib/openblas-base/libopenblas.so.0
```

As expected, the R session is using the reference LAPACK (/usr/lib/lapack/liblapack.so.3gf.0) and OpenBLAS (/usr/lib/openblas-base/libopenblas.so.0).
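Before running a full benchmark, a quick sanity check (my own addition to the original walk-through) is to time a BLAS-heavy operation directly in R; the timing should change noticeably when you switch implementations with update-alternatives and restart R:

```r
## A large cross-product is dominated by BLAS time (dsyrk/dgemm), so it
## should run much faster under OpenBLAS or ATLAS than under the
## reference BLAS.
set.seed(1)
a <- matrix(rnorm(2000 * 2000), nrow = 2000)
system.time( b <- crossprod(a) )
```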

I used Simon Urbanek's most recent benchmark script. To follow along, first download it to your current working directory:

```
$ curl http://r.research.att.com/benchmarks/R-benchmark-25.R -O
```

And then run it:

```
$ cat R-benchmark-25.R | time R --slave
Loading required package: Matrix
Loading required package: lattice
Loading required package: SuppDists
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called 'SuppDists'
...snip...
```

Oops. I don't have the SuppDists package installed. I can easily install it via Michael Rutter's Ubuntu PPA:

```
$ sudo apt-get install r-cran-suppdists
```

Now Simon's script works wonderfully. Full output:

```
$ cat R-benchmark-25.R | time R --slave
Loading required package: Matrix
Loading required package: lattice
Loading required package: SuppDists
Warning messages:
1: In remove("a", "b") : object 'a' not found
2: In remove("a", "b") : object 'b' not found

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  1.36566666666667
2400x2400 normal distributed random matrix ^1000____ (sec):  0.959
Sorting of 7,000,000 random values__________________ (sec):  1.061
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  1.777
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  1.00866666666667
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.13484335940626

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.566999999999998
Eigenvalues of a 640x640 random matrix______________ (sec):  1.379
Determinant of a 2500x2500 random matrix____________ (sec):  1.69
Cholesky decomposition of a 3000x3000 matrix________ (sec):  1.51366666666667
Inverse of a 1600x1600 random matrix________________ (sec):  1.40766666666667
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.43229160585452

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.10533333333333
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  1.169
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.267
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  1.213
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  1.32600000000001
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.23425893178325

Total time for all 15 tests_________________________ (sec):  19.809
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.26122106386747
                      --- End of test ---

134.75user 16.06system 1:50.08elapsed 137%CPU (0avgtext+0avgdata 1949744maxresident)k
448inputs+0outputs (3major+1265968minor)pagefaults 0swaps
```

The elapsed time at the very bottom is the part that we care about. With OpenBLAS and the reference LAPACK, the script took 1 minute and 50 seconds to run. By changing the selections with update-alternatives, we can test out R with ATLAS (3:21) or R with the reference BLAS (9:13). For my machine, OpenBLAS is a clear winner.
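For example, to re-run the benchmark under ATLAS, switch the BLAS selection and invoke the script again (the selection number comes from the menu shown earlier):

```
$ sudo update-alternatives --config libblas.so.3gf   # choose selection 1 (ATLAS)
$ cat R-benchmark-25.R | time R --slave
```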

Give it a shot yourself. If you find something different, let me know.

**Quick tricks for faster R code: Profiling to Parallelism**

*Abstract:*

I will present a grab bag of tricks to speed up your R code. Topics will include: installing an optimized BLAS, how to profile your R code to find which parts are slow, replacing slow code with inline C/C++, and running code in parallel on multiple cores. My running example will be fitting a 2PL IRT model with a hand coded MCMC sampler. The idea is to start with naive, pedagogically clear code and end up with fast, production quality code.

The slides are here. Code is here.
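If you want a taste of the profiling piece before digging into the slides, `Rprof()` is the place to start. A minimal sketch (not taken from the talk's code; the file name and toy workload are made up for illustration):

```r
## Profile a toy workload and see where the time goes
Rprof("profile.out")                  # start the sampling profiler
x <- replicate(100, {
  m <- matrix(rnorm(1e4), nrow = 100)
  solve(m)                            # matrix inversion dominates
})
Rprof(NULL)                           # stop profiling
summaryRprof("profile.out")$by.self   # time spent, by function
```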

This was an informal talk. If you would like to dig into these topics more, here are some additional references:

- Any of Dirk Eddelbuettel's talks, especially:
- Introduction to High-Performance Computing with R (Essentially a three-hour version of the Stat Bytes talk, but done much better! FWIW I found it after I gave the talk...)
- Rcpp by Examples (More info about Rcpp)
- RcppArmadillo: Accelerating R with C++ Linear Algebra (More info about C++ matrix stuff)

- CRAN Task View: High-Performance and Parallel Computing with R
- This gives an up-to-date overview of all things HPC and R. Give it a read to figure out what is happening.
- Note that this CRAN view is curated by Dr. Eddelbuettel.

Update: 6/25/2013 For the Windows users out there, Felix Riedel has some notes about upgrading your BLAS. It is easier than I thought!

Update: 7/9/2013 Felix pointed out that OpenBLAS is faster than ATLAS. He is right. See my new blog post for details and proof.

Monday, June 10, 2013

Noon - 1:30 PM

Room 237, Hamburg Hall

Title: On Correcting a Significance Test for Model Misspecification**

* The Heinz Second Paper (HSP) is a PhD qualifier for public policy students. Since I am in the joint Statistics and Public Policy program, mine is a mix of math and policy.

** Contact me for a copy of the paper or slides.

Abstract:

Learning about whether interventions improve student learning is sometimes more complicated than it needs to be because of errors in the specification of statistical models for the analysis of educational intervention data. Recently, a series of papers in the education research literature (Hedges, 2007a, 2009; Hedges and Rhoads, 2011) have derived post-hoc corrections to misspecified test statistics so that the corrected versions can be used in a meta-analysis. However, these corrections are currently limited to special cases of simple models.

The purpose of this paper is to extend these corrections to models that include covariates and more general random effect structures. We develop a sufficient condition such that the distribution of the corrected test statistic asymptotically converges to the distribution of the standard statistical test that accounts for random effects, and we examine the finite sample performance of these approximations using simulation and real data from the Tennessee STAR experiment (Word et al., 1990). The What Works Clearinghouse, a division of the US Department of Education that rates the quality of educational interventions, has a policy that applies a simplified version of the Hedges (2007a) correction to any study which randomized by group but does not account for the group membership in the original analysis. We discuss the implications of this policy in practice.
