The reason I think this is such a useful program is that I am a great fan of C-within-S. Obviously Splus is a very powerful prototyping language. For production, we need to speed up critical sections of code, e.g. for loops. It is highly inefficient to code input/output, graphics, regression, functions like cov.mve(), etc. in C, when Splus does such a nice job. So Scompile comes to the rescue by compiling slow functions, and leaving the other stuff as S code. As a bonus, the prototype code can be used with little or no modification.
My sample program, a Metropolis-within-Gibbs approach to a cure-fraction survival model, ran about 12 times faster once compiled with Scompile. Not counting the learning curve (see Tips and Quirks below), I guess the program modifications would take only 15 minutes. By comparison, my from-scratch C program took weeks to write, and resulted in a 50-fold speed improvement.
Getting started at CMU
All of the downloading and setup is already done for you; you only need
to (once) load the Splus function "Scompile" by typing
source("/usr/statlocal/notes/Scompile.q").
To compile your function(s), which live in, e.g. Test.q, type
Scompile("Test.q"). When the program finishes, it tells you
the name of an Splus file it created (Test.q.s in my example). This
name is needed only for using your functions in future Splus sessions.
In the current session you run the compiled functions exactly as you
did before you compiled them. In future Splus sessions, you must
source the Scompile ".s" file before running your functions
(source("Test.q.s") in my example).
Note that you can have several compiled functions, in one file or several, and call them from one or several places in your uncompiled calling program. However, a compiled function cannot call your uncompiled functions, nor can it call a function that lives in a different file and was compiled in a separate Scompile call.
Tips and Quirks
Scompile handles all matrix operations and standard Splus operations
and control structures (e.g. for loops). The Scompile web page lists the
functions that it can handle; many are missing. It is OK to re-compile
after changing your Splus source code; the new version overwrites the old.
It is also OK to re-source your Splus source code to return to using
uncompiled code (e.g. to allow use of the inspect() debugger).
Here are the things that I was affected by or just noticed:
mydnorm_function(x,mn,sd) {
  # 6.28318530718 is 2*pi; the extra division by sd normalizes the density
  return(exp(-(x-mn)^2/2/sd/sd)/sd/sqrt(6.28318530718))
}
Specific examples of missing functions are:
mymax_function(x) {
  m_x[1]
  len_length(x)
  if (len>1) {
    for (i in 2:len) {
      if (x[i]>m)
        m_x[i]
    }
  }
  return(m)
}
mymin_function(x) {
  m_x[1]
  len_length(x)
  if (len>1) {
    for (i in 2:len) {
      if (x[i]<m)
        m_x[i]
    }
  }
  return(m)
}
(You can remove the extra code to handle length-one vectors if you won't
pass any to these functions.)
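The guard is needed because the colon operator counts downward when its right end is smaller than its left: with len equal to 1, 2:len is the two-element vector 2, 1 rather than an empty range, so an unguarded loop would try to read x[2]. A minimal illustration, meant for the ordinary (uncompiled) session and written with the <- form of assignment:

```r
len <- 1
idx <- 2:len          # colon counts downward when len < 2
length(idx)           # two elements (2 and 1), not zero
```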
mysort_function(dat) {
  n_length(dat)
  # Skip the loop for length-one vectors, since 2:n would count down to 1
  if (n>1) {
    swp_T
    while (swp) {
      swp_F
      for (i in 2:n) {
        if (dat[i]<dat[i-1]) {
          swp_T
          tmp_dat[i]
          dat[i]_dat[i-1]
          dat[i-1]_tmp
        }
      }
    }
  }
  return(dat)
}
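Before relying on a replacement like these, it may be worth comparing it with the built-in function it stands in for, from the ordinary (uncompiled) session. The following is only a sketch: check.same is a hypothetical helper name, not part of Scompile or Splus, and the tolerance is an arbitrary choice.

```r
# check.same is a hypothetical helper, not part of Scompile or Splus.
# Run it in the ordinary (uncompiled) session.
check.same <- function(mine, builtin, x, ...) {
    tol <- 1e-8                      # arbitrary tolerance for this sketch
    a <- mine(x, ...)
    b <- builtin(x, ...)
    # TRUE only if both results have the same length and agree within tol
    (length(a) == length(b)) && all(abs(a - b) < tol)
}

x <- c(3.5, -1.25, 2, 0.5)
# Self-check on two built-in ways of computing a maximum
check.same(function(v) -min(-v), max, x)
```

With the replacements loaded, the same call pattern would be check.same(mysort, sort, x), check.same(mymax, max, x), or check.same(mydnorm, dnorm, x, 1, 2), where the trailing 1 and 2 are passed on as the mean and standard deviation.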
f.oddity_function(data) {
  tmp_sum(data)
  return(tmp)
}
> Scompile("oddity.q")
> zig_c(1,2,3) # c() works
> f.oddity(zig)
[1] 6
> zig_1:3 # colon operator makes "bad" vector
> f.oddity(zig)
[1] 0
> zig_zig+0 # Just adding zero fixes the vector
> f.oddity(zig)
[1] 6
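Since an uncompiled function can call a compiled one, the add-zero fix can be applied in one place by a small uncompiled wrapper. The sketch below assumes a hypothetical helper named call.fixed, which is not part of Scompile or Splus. One plausible explanation for the quirk, which the session above does not confirm, is that the colon operator produces integer storage while the compiled code expects doubles.

```r
# call.fixed is a hypothetical helper, not part of Scompile or Splus.
# Keep it in an ordinary source file, not in a file passed to Scompile.
call.fixed <- function(f, x, ...) {
    # x + 0 is the fix from the session above
    f(x + 0, ...)
}

zig <- 1:3
call.fixed(sum, zig)   # sum stands in for a compiled function such as f.oddity
```

In use, the call would be call.fixed(f.oddity, zig), so the calling program never has to remember which vectors were built with the colon operator.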
ll_function(mu,x) {
  rtn_-0.5*(x-mu)^2
  return(-sum(rtn))
}
We can compile this with Scompile("ll.q") and it works fine as long
as we call it with, e.g. ll(3.5, data).
But to minimize this function with nlmin(), we must modify ll()
to have a single argument (the parameter). If we rewrite as:
ll_function(mu) {
  rtn_-0.5*(data-mu)^2
  return(-sum(rtn))
}
we can use nlmin(ll,3.5) as long as the function is not compiled.
But, an attempt to compile gives:
> Scompile("ll.q")
[1] "Scompile ll.q"
/usr/local/bin/gcc -I /usr/include -o ll.q.o -c ll.q.c
ll.q.c: In function `ll':
ll.q.c:66: `data' undeclared (first use this function)
ll.q.c:66: (Each undeclared identifier is reported only once
ll.q.c:66: for each function it appears in.)
[1] "ll.q.s"
ld: Can't open ll.q.o
ld: No such file or directory
ld: Error 0
Error in dyn.load2("ll.q.o", userlibs = "-L/usr/st..: 'ld -A ...' failed
Dumped
The problem is that the variable "data" is unknown to the compiled function.
The solution is to "globalize" the data so the compiled function can see it.
Change the file "ll.q" to:
globalize_function(x) {
  data <<- x
  # follow the same pattern to globalize other variables
  return(0)
}
ll_function(mu) {
  rtn_-0.5*(data-mu)^2
  return(-sum(rtn))
}
Then Scompile("ll.q") works, and we can find the MLE by first typing
globalize(data), and then doing nlmin(ll,3.5).
In summary, the files that are compiled by Scompile must consist only of Splus functions; they can't define data. If you want a compiled function to access a variable that was initialized outside of the function, that variable must either be on the parameter list, or it must be "globalized" by passing it to a globalizing function that uses the "<<-" operator.
Note: When you leave Splus and return, in addition to sourcing "ll.q.s", you will need to type globalize(data) before you can proceed to use ll(mu) again.