helpstart() to bring up help in a browser; the link to "Search Engine and Keywords" is most useful.
plot(density(apply(matrix(rnorm(40),20), 1, prod)))
tmp = rnorm(40) # a vector of 40 standard normal variates
tmp
matrix(tmp,20) # put into a matrix of 20 rows and 40/20=2 columns
apply(matrix(tmp,20), 1, prod) # the 20 products
density(apply(matrix(tmp,20), 1, prod)) # the density estimate
plot(density(apply(matrix(tmp,20), 1, prod))) # the plot
.First = function() {library(nlme); options(locatorBell=FALSE)}
will load the nlme library every time R starts up in the current directory.
It also turns off the annoying sound associated with the locator() function.
sizes = function() {
ob = objects(name=parent.frame())
rslt = sapply(ob,function(x){object.size(get(x))})
return(sort(rslt))
}
x = factor(rep(LETTERS[1:3], each=20)); y = rnorm(60)
m1 = aov(y~x)
library(gmodels)
cont = rbind(AvsBC = c(1, -1/2, -1/2), BvsC = c(0, 1, -1))
fit.contrast(m1, "x", cont, conf.int=0.95)
install.packages("mice", "c:\\myPackages")
Then each R session use
library(mice, lib.loc="c:\\myPackages")
Here is an idea for making R code that stores comments and results in a separate, readable file. This is especially nice when you might need to re-source() your code due to changes in the data or analysis (i.e., essentially always).
The code and more documentation are at report.R.
Put these two lines near the top of your code:
if (!exists("report")) source("http://www.stat.cmu.edu/~hseltman/files/report.R")
report("Start of my report on project X", new=TRUE, prefix="myProjectX")
This creates a file named "myProjectXYYYY-MM-DD.txt" with the quoted
string in the first argument as the text at the top of the file. You can
include "\n" in the first argument to write multiple lines in one call.
You can optionally add the argument useTime=TRUE to include the time
of creation along with the date in the file name if you want to keep
multiple versions from the same day.Note that the variable "reportFileName" is created in your global environment and you should not delete this variable, at least while you are working on any one report.
Now anytime in your code, you can include code of the form
report(x)to cause the value of x to go to both the screen and the report file. With a little planning, you will be in the situation such that if you re-source() your whole file, you will end up with a nice, readable report of the entire analysis.
Note that the screen width affects the output by controlling the usual R text wrapping, e.g., with table().
Note that whenever you run report(x, new=TRUE, ...), if the report file name matches an existing file, the old file is deleted.
Here are some examples that demonstrate what you can do:
report("\nDemographics")
report(table(age, gender))
report(paste("Number of visits =", nrow(dat)))
report("\nSuccess by treatment")
report(with(dat, table(success, treatment, exclude=NULL)))
report("\nYears of education:")
report(summary(demog$educ))
report(paste("\nDroppping", sum(noVisits|oneVisit),
"subjects with no CERAD's or only 1 visit"))
report(expression(str(my.data.frame)))
The last example uses "expression()" because the "str()" function breaks
the usual R rules and uses "cat()" rather than returning its result
as an object. The "stem()" function is another example.There three helper functions in report.R.
myfun = function(dtf, name, p=0.5) {
if (is.matrix(dtf)) dtf = data.frame(dtf)
if (!is.data.frame(dtf)) stop("dtf must be a data.frame or matrix")
if (!is.character(name) || length(name)!=1) stop("name must be a single character string")
if (p<=0 || p>=1) stop("p must be in the interval (0,1)"
...
return(rslt)
}
myfun = function() {
if (file.exists("myresults.dat")) {
...load and use old results...
}
...
for (i in 1:10000) {
if (file.exists("stop")) {
write.table(myresults, file="myresults.dat")
stop("Early stop due to detection of stop file")
}
...
}
...
return(...)
}
Then, you can create a file called "stop" at any time (e.g., in Linux using "echo stop>stop" at
the Linux prompt) and the function will gracefully stop at the start of the next loop iteration.
Without too much work, you can probably set up your function to automatically continue wherever
you left off. Just remember to delete or rename the "stop" file before running the function
again.
with(mydtf, plot(x, y, col=gender))
where the columns of "mydtf" are x, y, and gender.
options(locatorBell=FALSE)
myfun = function(x) {
# Argument x should contain one row, columns a,b,e,f.
# The result is the mean of a and f unless b is missing or negative,
# in which case the min of e and f is returned.
if (is.na(x[2]) || x[2]<=0) {
return(min(c(x[3],x[4])))
} else {
return((x[1]+x[4])/2)
}
}
dtf$new = apply(dtf[,c("a","b","e","f")], 1, myfun)
An alternative is as follows. The optional first line may prevent some wonky errors, and is good practice.
dtf$new = NA # in general, NA is safer than 0, protecting against bad logic
Sel = is.na(dtf$b) | dtf$b<=0
dtf[Sel, "new"] = pmin(dtf$e[Sel], dtf$f[Sel])
Sel = !is.na(dtf$b) & dtf$b>0
dtf[Sel, "new"] = (dtf$a[Sel]+dtf$f[Sel])/2
median(incdata[incdata$sex=="female", "income"]) calculates
the median income of just the female subjects in data.frame "incdata".
But expressions for each of several categories is awkward and inefficient. So the methods below present efficient alternatives. If no subsetting variable exists, consider using the R function cut() or Problem/Solution #2 to create it.
Here is code you can paste into R to generate a sample data.frame to use as an example:
n = 20
incdata = data.frame(sex=c("male","female")[1+rbinom(n,1,0.5)],
race=c("black","white","hispanic","Asian")[1+rbinom(n,3,0.5)],
income=round(rnorm(n,50000,15000)),
networth=pmax(0,round(rnorm(n,50000,30000))))
aggregate(incdata[,c("income","networth")], by=list(gender=incdata$sex,race=incdata$race), FUN=median)
To get the results of a two-way aggregate into a table form, see aggregate.table() in the gdata package.
with(incdata, split(income, race))
split(incdata[,3:4], list(gender=incdata$sex, race=incdata$race))
Two things to note that differ from using aggregate are that list() is not needed
when a single split variable is used, and that the result includes empty elements
for categories with no data.
The second step is to use lapply() or sapply() on the split() list to carry out some
function of the data, either a built-in function, a function that you have created,
on an on-the-fly "anonymous" function. The difference between lapply() and sapply()
is the latter will return a simple, non-list object, if possible.
This example calculates the mean of the ratio of income to net worth for each sex/race subgroup
using an anonymous function on the two-column elements of the list:
tmp = split(incdata[,3:4], list(gender=incdata$sex, race=incdata$race))
lapply(tmp, function(x) mean(x[,1]/x[,2]))
sapply(tmp, function(x) mean(x[,1]/x[,2]))
This example calculates range of income for each race:
with(incdata, sapply(split(income, race), function(x) diff(range(x))))
The simpler version uses tapply():
with(incdata, tapply(income, race, function(x) diff(range(x))))
The basic idea is that you can flexibly specify a grid of buttons and sliders (possibly in groups)
on some or all of a graphic window, then run a loop that receives information on what the user
clicked so that you can execute any code you like, then loop back for more user input (repeatedly
until the user uses the graph window 'Stop' button or your own stop button.
The two basic modes of use are a menu in the margin of a plot for interactive plot control and
a full screen menu with actions occurring in the console window (remember to use flush.console()
in your code) or in one or more additional graphics windows.
rube, is a Really Useful WinBUGS Enhancer. It makes working with
WinBUGS much easier. It is (currently) built on top of the R2WinBUGS package.
R2WinBUGS takes a good first step to allow you to
The rube package is a wrapper for R2WinBUGS, but also much more. If you use it,
you don't need to learn the details of R2WinBUGS.
Note that when you install rube, it will install R2WinBUGS, if that is not already on
your system. But you must install WinBUGS itself before using rube. Also note that you
must upgrade R to at least version 2.10.0 before installing rube. The rube
package only runs on a Windows operating system, because WinBUGS only runs on Windows.
(Future support for JAGS is planned.)
To install rube put rube_0.1-0.zip
anywhere on your computer, then use the R menu option Packages/InstallFromZipFile. Like any
package, you install it once, and then you need to use library(rube) each time you
start R (or put the library() command into your .First function).
Here are the main features of rube:
categorize=TRUE, a detailed summary of the above-mentioned relationships is produced.
VAR=value to set default
values. This is most useful for hyperparameters. The two advantages are that
it make the code easier to read (it becomes more obvious where you chose arbitrary
hyperparameter values), and it makes hyperparameter sensitivity testing easy
because you can add simple run-time options to override any or all defaults without
modifying your code. The utility function showDefaults() can be
used to list all of the defaults in a model. As an example,
Y[i] ~ dnorm(0, Y.PREC=100)will be interpreted as
Y[i] ~ dnorm(0, 100)if nothing special is done, but as
Y[i] ~ dnorm(0, 1000)if
subs=list(Y.PREC=1000) is
used as a run-time function option.
LC(prefix, FORMULA, suffix, index) anywhere in
your code. Here prefix, suffix, and index are "hard-coded" in your model,
but FORMULA is specified at run-time (with no default), and FORMULA is
just like the right-hand-side of an R model formula (with some minor restrictions).
The formula will include the covariates of interest for a particular WinBUGS
model run, and the prefix and suffix are used to automatically create
corresponding parameter names. An intercept is created unless -1 is included in the formula.
Usually you will want to append the
index to the covariates values. Here are a couple of examples:
mu[j] <- dnorm(LC(b,DEMOGS,,j), Y.PREC=100)will be interpreted as
mu[j] <- dnorm(beta0 + bOld*Old[j] + bMale*Male[j] + bOldMale*Old[j]*Male[j], 100)in various
rube functions when you define varList=list(DEMOGS="Old*Male").
Then when you re-run your analysis with, say, varList=list(DEMOGS="Old*Male+White")
the appropriate covariate model is automatically used.
Your model can have any number of formulas, and the same name can be used in
multiple places, or any number of different names can be used, depending
on your model complexity. You can distinguish your parameters from the covariate
data values using the prefix, suffix, or both.
An example of a more complex covariate formula is "A+B+C+B:C+D*E*F+(G+H+I)^2"
which is shorthand for the model formula
A + B + C + B:C + D + E + F + D:E + D:F + E:F + G + H + I + G:H + G:I + H:I
Currently the model formulas do not allow "-" (except -1 is allowed) or "/", and inside the
(...)^n syntax, "A:B" and "A*B" are treated as "A+B".
Warning: rube does not recode your variables, so
the formula system will be inappropriate without warning for k-level categorical
variables that have not been manually recoded into k-1 indicator variables (or
other suitable coding). But the system does work for 0/1 or 1/-1 indicator
variables and quantitative covariates.
FOR(prefix, FORMULA, suffix, codeText). This syntax generates
one model code line for each term/parameter in the FORMULA by replacing a question mark (?)
in the "codeText" with the parameters one by one. Continuing with the above example,
we might have a model code line FOR(b,DEMOGS,, ? ~ dnorm(0, COV.PREC=100))to automatically generate
b0 ~ dnorm(0, 100)
bOld ~ dnorm(0, 100)
bMale ~ dnorm(0, 100)
bOldMale ~ dnorm(0, 100)
bWhite ~ dnorm(0, 100)
when varList=list(DEMOGS="Old*Male+White") is defined.
If intercept b0 has a different prior, code it separately and
use varList=list(DEMOGS="Old*Male+White-1")
BASE(FORMULA,index) which
can be useful for incoding a baseline covariate status. E.g.
isBase[i] <- BASE(DEMOGS,i)will be coded as
isBase[i] <- (1-Old[i])*(1-Male[i])*(1-White[i])when
varList=list(DEMOGS="Old*Male+White") is defined.
In-line conditional coding takes the form
code1 IFCASE(caseFormula) code2 ELSECASE code3 ENDCASE code4
which results in
code1 code2 code4 if the caseFormula evaluates to TRUE and
code1 code3 code4 if the caseFormula evaluates to FALSE.
The code1, ELSECASE code3, and code4
elements are all optional. The "codeFormula" can be simple such as GAMMA
or complex such as GAMMA || (ALPHA && !INTERCEPT) which will evaluate
to FALSE and TRUE respectively when the argument subs=c("ALPHA","AGE2") is
included in the various rube functions. This uses standard R evaluation including
OR (||), AND (&&), NOT (!) and parentheses. Each element in the "subs" vector
is defined TRUE and all missing elements are defined to be FALSE.
The multi-line form is:
IFCASE(GAMMA || (ALPHA && !INTERCEPT))
a <- b + c
d <- e + f
ELSEIFCASE(ALPHA)
a <- b + c
d <- e - f
ELSECASE
a <- b
d <- f
ENDCASE
which evaluates to
a <- b + c
d <- e + f
when the argument subs=c("ALPHA","AGE2") is used, and
a <- b + c
d <- e - f
when the argument subs=c("ALPHA","INTERCEPT") is used.
Rube provides a wrapper for the bugs() function of R2WinBUGS,
called doBugs(), which includes the ability to generate data on the fly and
to more flexible generate parameter starting values on the fly (see below), as well as
extensive abilities to
modify WinBUGS code on the fly (see above).
Here is an example: Rasch.R.
Rube is very flexible in the form of the WinBUGS model specification.
It allows either a file with the specification or a string vector or a single vector
separated by newline (\n) characters. The latter is convenient for keeping your
model definition in the same file as your code, where you just define your model in this form:
myModel = "model {
for (i in 1:N) {
prec.y[i] <- pow(sig.y[i], -2)
Y[i] ~ dnorm(x[i], prec[i].y)
}
}"
To examine this very long string in R, use showModel(myModel). To see
a particular version of the model based on evaluating any conditional code and/or
pseudo-functions, use a form like
showModel(myModel, subs=c("singleX","gamPrior"), varList=list(RHS1="A+B", RHS0="A+B-1"))
Also note that showModel() with no arguments shows the most recently
used form run through doBugs().
rube allows for passing arguments to this function.
This allows alternate methods of starting value generation in different situations,
possibly fully or partially data driven. In complex Bayesian problems, this can
be invaluable.
Rube provides a function called p3()
which plots the trace, AR plot, and histogram for each monitored parameter, allowing
backing up as well as quitting before seeing all parameters
Rube provides a function called priPost() which is a prior/posterior
density plotter to compare the posterior density of various distributions to
their prior densities. This includes some "compound" prior distributions, e.g.,
a beta or gamma that has two gammas as its parameters.
rube).
All links active 8/2/2005. Please report missing links, errors, and suggestions to
To my Home Page