Scompile Tips

This is a very cool program written by Matt Calder at Colorado State. From within Splus, you "compile" Splus functions stored in a file. You see the C compiler run and then, VOILA: any function from the compiled file that you call in Splus is run as C-within-S, loaded using dyn.load2().

The reason I think this is such a useful program is that I am a great fan of C-within-S. Obviously Splus is a very powerful prototyping language. For production, we need to speed up critical sections of code, e.g. for loops. It is highly inefficient to code input/output, graphics, regression, functions like cov.mve(), etc. in C, when Splus does such a nice job. So Scompile comes to the rescue by compiling slow functions, and leaving the other stuff as S code. As a bonus, the prototype code can be used with little or no modification.

My sample program, a Metropolis-within-Gibbs approach to a cure-fraction survival model, ran about 12 times faster once compiled with Scompile. Not counting the learning curve (see hints and quirks below), I guess the program modifications would take only 15 minutes. By comparison, my from-scratch C program took weeks to write, and resulted in a 50-fold speed improvement.

Getting started at CMU
All of the downloading and setup is already done for you; you only need to load the Splus function "Scompile" (once) by typing source("/usr/statlocal/notes/Scompile.q"). To compile your function(s), which live in, e.g., Test.q, type Scompile("Test.q"). When the program finishes, it tells you the name of an Splus file it created (Test.q.s in my example). This name is needed only for using your functions in future Splus sessions. In the current session you run the compiled functions exactly as you did before you compiled them. In future Splus sessions, you must source the Scompile ".s" file before running your functions (source("Test.q.s") in my example).
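The steps above can be sketched as a short session. This is only an outline that reuses the file names from the example (Test.q and Test.q.s); the comments are my own reading of the steps, not output from the program.

```s
# Session in which you compile: load the Scompile function, then compile.
source("/usr/statlocal/notes/Scompile.q")
Scompile("Test.q")    # reports the file it created, Test.q.s in this example

# Any later session: reload the compiled versions before calling them.
source("Test.q.s")
```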

Note that you can keep several compiled functions in one file or in several files, and call them from as many places as you like in your uncompiled calling program. A compiled function, however, cannot call one of your uncompiled functions, nor can it call a function that lives in a different file and was compiled in a separate Scompile call.

Tips and Quirks
Scompile handles all matrix operations and standard Splus operations and control structures (e.g. for loops). The Scompile web page lists the functions that it can handle; many are missing. It is OK to re-compile after changing your Splus source code; the new version overwrites the old. It is also OK to re-source your Splus source code to return to using uncompiled code (e.g. to allow use of the inspect() debugger).

Here are the quirks that affected me or that I simply noticed:

How to pass information to your function
Assume we want to find the MLE for some iid Gaussian data with known variance equal to one and unknown mean. Let data_rnorm(100,4), and use the normal negative log-likelihood function (dropping the additive constant), defined in the file "ll.q" as follows:
ll_function(mu,x) {
  rtn_-0.5*(x-mu)^2
  return(-sum(rtn))
}
We can compile this with Scompile("ll.q") and it works fine as long as we call it with, e.g. ll(3.5, data). But to minimize this function with nlmin(), we must modify ll() to have a single argument (the parameter). If we rewrite as:
ll_function(mu) {
  rtn_-0.5*(data-mu)^2
  return(-sum(rtn))
}
we can use nlmin(ll,3.5) as long as the function is not compiled. But, an attempt to compile gives:
> Scompile("ll.q")
[1] "Scompile ll.q"
/usr/local/bin/gcc -I /usr/include -o ll.q.o  -c ll.q.c
ll.q.c: In function `ll':
ll.q.c:66: `data' undeclared (first use this function)
ll.q.c:66: (Each undeclared identifier is reported only once
ll.q.c:66: for each function it appears in.)
[1] "ll.q.s"
ld: Can't open ll.q.o
ld: No such file or directory
ld: Error 0
Error in dyn.load2("ll.q.o", userlibs = "-L/usr/st..: 'ld -A ...' failed
Dumped
The problem is that the variable "data" is unknown to the compiled function. The solution is to "globalize" the data so the compiled function can see it.

Change the file "ll.q" to:

globalize_function(x) {
  data <<- x
  # follow the same pattern to globalize other variables
  return(0)
}

ll_function(mu) {
  rtn_-0.5*(data-mu)^2
  return(-sum(rtn))
}
Then Scompile("ll.q") works, and we can find the MLE by first typing globalize(data), and then doing nlmin(ll,3.5).
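As a sanity check on whatever nlmin() reports, the minimizer of ll() can be worked out by hand. In the short derivation below, n (the sample size) and x_i (the elements of data) are notation introduced here for illustration; they do not appear in the code.

```latex
\begin{align*}
\mathrm{ll}(\mu) &= -\sum_{i=1}^{n}\left[-\tfrac{1}{2}(x_i-\mu)^2\right]
                 = \tfrac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2, \\
\frac{d\,\mathrm{ll}}{d\mu} &= -\sum_{i=1}^{n}(x_i-\mu) = 0
  \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}, \\
\frac{d^{2}\,\mathrm{ll}}{d\mu^{2}} &= n > 0 .
\end{align*}
```

So the value found by nlmin(ll,3.5) should agree closely with mean(data), which for data generated by rnorm(100,4) will be near 4.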

In summary, the files that are compiled by Scompile must consist only of Splus functions; they can't define data. If you want a compiled function to access a variable that was initialized outside of the function, that variable must either be on the parameter list, or it must be "globalized" by passing it to a globalizing function that uses the "<<-" operator.

Note: When you leave Splus and return, in addition to sourcing "ll.q.s", you will need to type globalize(data) before you can proceed to use ll(mu) again.
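For the ll.q example, a later session might therefore begin as sketched below. It assumes the object data is still available in that session; if it is not, it has to be recreated first.

```s
# Later session: restore the compiled functions, then the data they read.
source("ll.q.s")    # file created by Scompile("ll.q")
globalize(data)     # assumes `data` still exists in this session
nlmin(ll, 3.5)      # same call as in the original session
```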

