helpstart()
to bring up help in a browser; the link to "Search Engine and Keywords" is most useful.
plot(density(apply(matrix(rnorm(40),20), 1, prod))) tmp = rnorm(40) # a vector of 40 standard normal variates tmp matrix(tmp,20) # put into a matrix of 20 rows and 40/20=2 columns apply(matrix(tmp,20), 1, prod) # the 20 products density(apply(matrix(tmp,20), 1, prod)) # the density estimate plot(density(apply(matrix(tmp,20), 1, prod))) # the plot
.First = function() {library(nlme); options(locatorBell=FALSE)}will load the nlme library every time R starts up in the current directory. It also turns off the annoying sound associated with the locator() function.
sizes = function() { ob = objects(name=parent.frame()) rslt = sapply(ob,function(x){object.size(get(x))}) return(sort(rslt)) }
x = factor(rep(LETTERS[1:3], each=20)); y = rnorm(60) m1 = aov(y~x) library(gmodels) cont = rbind(AvsBC = c(1, -1/2, -1/2), BvsC = c(0, 1, -1)) fit.contrast(m1, "x", cont, conf.int=0.95)
install.packages("mice", "c:\\myPackages")Then each R session use
library(mice, lib.loc="c:\\myPackages")
Here is an idea for making R code that stores comments and results in a separate, readable file. This is especially nice when you might need to re-source() your code due to changes in the data or analysis (i.e., essentially always). Optionally, you can run reportLatex() after all of your report() commands to create a .tex file that is formatted better and incorporates graphical output (see below).
(An alternative is sweave. Unlike sweave, report does not require you to understand latex, and it has only a single command to learn.)
The code and more documentation are at report.R.
Put these two lines near the top of your code:
if (!exists("report")) source("http://www.stat.cmu.edu/~hseltman/files/report.R") report("Start of my report on project X", new=TRUE, prefix="myProjectX")This creates a file named "myProjectXYYYY-MM-DD.txt" with the quoted string in the first argument as the text at the top of the file. You can include "\n" in the first argument to write multiple lines in one call. You can optionally add the argument useTime=TRUE to include the time of creation along with the date in the file name if you want to keep multiple versions from the same day.
Note that the variable "reportFileName" is created in your global environment and you should not delete this variable, at least while you are working on any one report.
Now anytime in your code, you can include code of the form
report(x)or
report(x, ..., z)to cause the value of x (or all of the variables x through z) to go to both the screen and the report file. This constructs the report on-the-fly as you work through your analysis. (If multiple arguments are used with report() and they are all strings or single numbers, then they are pasted together without any spaces between them (i.e., using sep="")).
Note that you can manually erase errors from the report file using a text editor.
Note that with a little planning, you will be in the situation such that if you re-source() your whole .R file, e.g., after correcting an error in the data or analysis, you will end up with a brand new, complete, readable report of the entire analysis with no effort.
Note that the screen width affects the output by controlling the usual R text wrapping, e.g., with table(). Normally, you will want to keep the screen width around 60-70 characters to make it easier to read the report.
Note that whenever you run report(x, new=TRUE, ...), if the report file name matches an existing file, the old file is deleted.
Here are some examples that demonstrate what you can do:
report("\nDemographics") report(table(age, gender)) report(paste("Number of visits =", nrow(dat))) report("\nSuccess by treatment") report(with(dat, table(success, treatment, exclude=NULL))) report("\nYears of education:") report(summary(demog$educ)) report(paste("\nDroppping", sum(noVisits|oneVisit), "subjects with no CERAD's or only 1 visit")) report(expression(str(my.data.frame)))The last example uses "expression()" because the "str()" function breaks the usual R rules and uses "cat()" rather than returning its result as an object. The "stem()" function is another example.
There three helper functions in report.R.
Note that pct() and total() can both be used on the same table, in either order. Each respects the results of the other to avoid the incorrect and/or confusing output that could result from, e.g., including data and their total when computing percents.
Optionally, you can use reportLatex() (code and description in reportLatex.R) to convert your .txt file into a .tex (Latex) file. This can incorporate plots as follows: when you are going through your analysis use the report text "See ... in myPlotFile.pdf", e.g.,
plot(rnorm(20, type="b", main="Random normals", xlab="time", ylab="x") fname = "rnorm.pdf" dev.copy(pdf, fname); dev.off() # Important: Be sure to put a blank between "in" and the end quote # since sep="" will be in effect. report("\nSee 20 Gaussians in ", fname)With or without these special graphics commands, when you run reportLatex() a .tex file is created with the same base name as your .txt report file. Note that you can manually edit the .tex file at this point if desired.
You then process this .tex file with pdflatex myReportFile.tex in Linux (or however else you know to process Latex files on any operating system) to produce the .pdf report file.
If you used the special graphics indicator text "See ... in someFileName.pdf", then the plots will be included in the report, and the caption of the figure will be the text between "See" and "in". Also the caption will include figure numbers starting at "Figure 1".
If you prefer to use a different graphics file type than "pdf" (as long as it is compatible with whatever version on pdflatex or latex that you are using) just run the optional form, e.g., reportLatex(extension=".pdf") substituting your graphics extension for "pdf".
myfun = function(dtf, name, p=0.5) { if (is.matrix(dtf)) dtf = data.frame(dtf) if (!is.data.frame(dtf)) stop("dtf must be a data.frame or matrix") if (!is.character(name) || length(name)!=1) stop("name must be a single character string") if (p<=0 || p>=1) stop("p must be in the interval (0,1)" ... return(rslt) }
myfun = function() { if (file.exists("myresults.dat")) { ...load and use old results... } ... for (i in 1:10000) { if (file.exists("stop")) { write.table(myresults, file="myresults.dat") stop("Early stop due to detection of stop file") } ... } ... return(...) }Then, you can create a file called "stop" at any time (e.g., in Linux using "echo stop>stop" at the Linux prompt) and the function will gracefully stop at the start of the next loop iteration. Without too much work, you can probably set up your function to automatically continue wherever you left off. Just remember to delete or rename the "stop" file before running the function again.
with(mydtf, plot(x, y, col=gender))where the columns of "mydtf" are x, y, and gender.
options(locatorBell=FALSE)
myfun = function(x) { # Argument x should contain one row, columns a,b,e,f. # The result is the mean of a and f unless b is missing or negative, # in which case the min of e and f is returned. if (is.na(x[2]) || x[2]<=0) { return(min(c(x[3],x[4]))) } else { return((x[1]+x[4])/2) } } dtf$new = apply(dtf[,c("a","b","e","f")], 1, myfun)An alternative is as follows. The optional first line may prevent some wonky errors, and is good practice.
dtf$new = NA # in general, NA is safer than 0, protecting against bad logic Sel = is.na(dtf$b) | dtf$b<=0 dtf[Sel, "new"] = pmin(dtf$e[Sel], dtf$f[Sel]) Sel = !is.na(dtf$b) & dtf$b>0 dtf[Sel, "new"] = (dtf$a[Sel]+dtf$f[Sel])/2
median(incdata[incdata$sex=="female", "income"])
calculates
the median income of just the female subjects in data.frame "incdata".
But expressions for each of several categories is awkward and inefficient. So the methods below present efficient alternatives. If no subsetting variable exists, consider using the R function cut() or Problem/Solution #2 to create it.
Here is code you can paste into R to generate a sample data.frame to use as an example:
n = 20 incdata = data.frame(sex=c("male","female")[1+rbinom(n,1,0.5)], race=c("black","white","hispanic","Asian")[1+rbinom(n,3,0.5)], income=round(rnorm(n,50000,15000)), networth=pmax(0,round(rnorm(n,50000,30000))))
aggregate(incdata[,c("income","networth")], by=list(gender=incdata$sex,race=incdata$race), FUN=median)To get the results of a two-way aggregate into a table form, see aggregate.table() in the gdata package.
with(incdata, split(income, race)) split(incdata[,3:4], list(gender=incdata$sex, race=incdata$race))Two things to note that differ from using aggregate are that list() is not needed when a single split variable is used, and that the result includes empty elements for categories with no data. The second step is to use lapply() or sapply() on the split() list to carry out some function of the data, either a built-in function, a function that you have created, on an on-the-fly "anonymous" function. The difference between lapply() and sapply() is the latter will return a simple, non-list object, if possible. This example calculates the mean of the ratio of income to net worth for each sex/race subgroup using an anonymous function on the two-column elements of the list:
tmp = split(incdata[,3:4], list(gender=incdata$sex, race=incdata$race)) lapply(tmp, function(x) mean(x[,1]/x[,2])) sapply(tmp, function(x) mean(x[,1]/x[,2]))This example calculates range of income for each race:
with(incdata, sapply(split(income, race), function(x) diff(range(x))))The simpler version uses tapply():
with(incdata, tapply(income, race, function(x) diff(range(x))))
dat = c("category size type", "abc 23.4 g17a72", "aaa 3.2 h19h33", "bar 17 z12z12") dtf = read.table(textConnection(dat), header=TRUE)
hash(table(sample(LETTERS[1:5],10,rep=TRUE)))
gives a result like:
# A B C D E # 3 2 2 2 1
\input{filename}
. The function has extensive support for
custom control of bolding, cell justification, double lining, decimal places, and misc. adornments (e.g., color, italics, etc.).
This function handles arbitrarily complex models, and plots all (or some selected) combinations of any one, two, or three variables in the model, fixing all other variables at default or user selected values. The plotted means and 95% confidence intervals (actualy, +/-2SE) are the same as you would get from writing (many) SAS ESTIMATE statements, but especially for models with higher order interaction and/or missing level combinations, this is extremely tedious in SAS. Note that many different useful plots can be produced from one SAS model run, by changing the R function's 'xVar', 'lineVar', and 'thirdVar' parameters, as well as the 'xLevels', 'lineLevels' and/or 'thirdLevels'
This function produces useful plots just by specifying the two files containing the SAS output and the names(s) of the variables to plot. By setting additional function parameters, you have close control of the final plot details, including reordering, subsetting, and/or relabeling factor levels; relabeling the variable names; changing the title, and y-axis label; supressing or keeping a subtitle which automatically shows all of the fixed values; changing font sizes, line colors, point characters, and line types, changing the location of the legends for the second (by color) and third (by line type) variables; keeping or suppressing the mean point markers; setting the width of the caps on the ends of the confidence bands, setting the y-axis value limits; and controlling the spread of the means of the levels of the second (line) and/or third variable around the nominal x-value for the first variable.
For safety sake, the function writes fixed values that it chooses to the screen, as well as table(s) of levels vs. labels if the function thinks they may have been incorrectly set (i.e., labels do not appear similar to levels). The latter output can be suppressed with 'silent=TRUE'. Finally the whole plot can be suppressed with 'plot=FALSE' if you only want the return value of the means, standard errors, and upper and lower 95% confidence values.
If the csv versions of the SAS ODS tables of coefficients and the variance-covariance matrix are named "coef.csv" and "covb.csv" you don't even need to supply those values to the function. And if no 'xVar', 'lineVar', and 'thirdVar' values are supplied, the first highest order plot available is produced by default.
Categorical variables are fixed at their first values or user-specified values. Each quantitative covariate requires a fixed value in the 'fix' list or a set of values in the appropriate 'xLevels', 'lineLevels' and/or 'thirdLevels' parameters. As a convenience, 'xLabels', 'lineLabels' and/or 'thirdLabels' may be specified without the corresponding 'Levels' parameter if you just want to subset or reorder the levels, or even make simple (unambigous) clearer labels.
A full example, including the SAS code to generate the csv files is found in SASIAPlotTest.R. Although the original data needed to run the SAS code is not provided, the SAS output files needed to test the R function are supplied.
Note that, because of a missing feature from PROC GLM, an extra step (creation of the ODS FIT table) is required. Therefore it may be easier to just fit the model with PROC MIXED without any random effects.
The basic idea is that you can flexibly specify a grid of buttons and sliders (possibly in groups)
on some or all of a graphic window, then run a loop that receives information on what the user
clicked so that you can execute any code you like, then loop back for more user input (repeatedly
until the user uses the graph window 'Stop' button or your own stop button).
The two basic modes of use are a menu in the margin of a plot for interactive plot control and
a full screen menu with actions occurring in the console window (remember to use flush.console()
in your code) or in one or more additional graphics windows.
rube
, is a Really Useful WinBUGS Enhancer. It makes working with
WinBUGS much easier. It is (currently) built on top of the R2WinBUGS package.
R2WinBUGS takes a good first step to allow you to
The rube
package is a wrapper for R2WinBUGS, but also much more. If you use it,
you don't need to learn the details of R2WinBUGS.
Note that when you install rube
, it will install R2WinBUGS, if that is not already on
your system. But you must install WinBUGS itself before using rube
. Also note that you
must upgrade R to at least version 2.10.0 before installing rube
. The rube
package only runs on a Windows operating system, because WinBUGS only runs on Windows.
(Future support for JAGS is planned.)
To install rube
put rube_0.3-6.zip
anywhere on your computer, then use the R menu option Packages/InstallFromZipFile. Like any
package, you install it once, and then you need to use library(rube)
each time you
start R (or put the library() command into your .First function).
Here are the main features of rube
:
categorize=TRUE
, a detailed summary of the above-mentioned relationships is produced.
VAR=value
to set default
values. This is most useful for hyperparameters. The two advantages are that
it make the code easier to read (it becomes more obvious where you chose arbitrary
hyperparameter values), and it makes hyperparameter sensitivity testing easy
because you can add simple run-time options to override any or all defaults without
modifying your code. The utility function showDefaults()
can be
used to list all of the defaults in a model. As an example,
Y[i] ~ dnorm(0, Y.PREC=100)will be interpreted as
Y[i] ~ dnorm(0, 100)if nothing special is done, but as
Y[i] ~ dnorm(0, 1000)if
subs=list(Y.PREC=1000)
is
used as a run-time function option.
LC(prefix, FORMULA, suffix, index)
anywhere in
your code. Here prefix, suffix, and index are "hard-coded" in your model,
but FORMULA is specified at run-time (with no default), and FORMULA is
just like the right-hand-side of an R model formula (with some minor restrictions).
The formula will include the covariates of interest for a particular WinBUGS
model run, and the prefix and suffix are used to automatically create
corresponding parameter names. An intercept is created unless -1 is included in the formula.
Usually you will want to append the
index to the covariates values. Here are a couple of examples:
mu[j] <- dnorm(LC(b,DEMOGS,,j), Y.PREC=100)will be interpreted as
mu[j] <- dnorm(beta0 + bOld*Old[j] + bMale*Male[j] + bOldMale*Old[j]*Male[j], 100)in various
rube
functions when you define varList=list(DEMOGS="Old*Male")
.
Then when you re-run your analysis with, say, varList=list(DEMOGS="Old*Male+White")
the appropriate covariate model is automatically used.
Your model can have any number of formulas, and the same name can be used in
multiple places, or any number of different names can be used, depending
on your model complexity. You can distinguish your parameters from the covariate
data values using the prefix, suffix, or both.
An example of a more complex covariate formula is "A+B+C+B:C+D*E*F+(G+H+I)^2"
which is shorthand for the model formula
A + B + C + B:C + D + E + F + D:E + D:F + E:F + G + H + I + G:H + G:I + H:I
Currently the model formulas do not allow "-" (except -1 is allowed) or "/", and inside the
(...)^n
syntax, "A:B" and "A*B" are treated as "A+B".
Warning: rube
does not recode your variables, so
the formula system will be inappropriate without warning for k-level categorical
variables that have not been manually recoded into k-1 indicator variables (or
other suitable coding). But the system does work for 0/1 or 1/-1 indicator
variables and quantitative covariates.
FOR(prefix, FORMULA, suffix, codeText)
. This syntax generates
one model code line for each term/parameter in the FORMULA by replacing a question mark (?)
in the "codeText" with the parameters one by one. Continuing with the above example,
we might have a model code line FOR(b,DEMOGS,, ? ~ dnorm(0, COV.PREC=100))to automatically generate
b0 ~ dnorm(0, 100) bOld ~ dnorm(0, 100) bMale ~ dnorm(0, 100) bOldMale ~ dnorm(0, 100) bWhite ~ dnorm(0, 100)when
varList=list(DEMOGS="Old*Male+White")
is defined.
If intercept b0
has a different prior, code it separately and
use varList=list(DEMOGS="Old*Male+White-1")
BASE(FORMULA,index)
which
can be useful for incoding a baseline covariate status. E.g.
isBase[i] <- BASE(DEMOGS,i)will be coded as
isBase[i] <- (1-Old[i])*(1-Male[i])*(1-White[i])when
varList=list(DEMOGS="Old*Male+White")
is defined.
In-line conditional coding takes the form
code1 IFCASE(caseFormula) code2 ELSECASE code3 ENDCASE code4
which results in
code1 code2 code4
if the caseFormula evaluates to TRUE and
code1 code3 code4
if the caseFormula evaluates to FALSE.
The code1
, ELSECASE code3
, and code4
elements are all optional. The "codeFormula" can be simple such as GAMMA
or complex such as GAMMA || (ALPHA && !INTERCEPT)
which will evaluate
to FALSE and TRUE respectively when the argument subs=c("ALPHA","AGE2")
is
included in the various rube
functions. This uses standard R evaluation including
OR (||), AND (&&), NOT (!) and parentheses. Each element in the "subs" vector
is defined TRUE and all missing elements are defined to be FALSE.
The multi-line form is:
IFCASE(GAMMA || (ALPHA && !INTERCEPT)) a <- b + c d <- e + f ELSEIFCASE(ALPHA) a <- b + c d <- e - f ELSECASE a <- b d <- f ENDCASEwhich evaluates to
a <- b + c d <- e + fwhen the argument
subs=c("ALPHA","AGE2")
is used, and
a <- b + c d <- e - fwhen the argument
subs=c("ALPHA","INTERCEPT")
is used.
Rube
provides a wrapper for the bugs()
function of R2WinBUGS,
called doBugs()
, which includes the ability to generate data on the fly and
to more flexible generate parameter starting values on the fly (see below), as well as
extensive abilities to
modify WinBUGS code on the fly (see above).
Here is an example: Rasch.R.
Rube
is very flexible in the form of the WinBUGS model specification.
It allows either a file with the specification or a string vector or a single vector
separated by newline (\n) characters. The latter is convenient for keeping your
model definition in the same file as your code, where you just define your model in this form:
myModel = "model { for (i in 1:N) { prec.y[i] <- pow(sig.y[i], -2) Y[i] ~ dnorm(x[i], prec[i].y) } }"To examine this very long string in R, use
showModel(myModel)
. To see
a particular version of the model based on evaluating any conditional code and/or
pseudo-functions, use a form like
showModel(myModel, subs=c("singleX","gamPrior"), varList=list(RHS1="A+B", RHS0="A+B-1"))Also note that
showModel()
with no arguments shows the most recently
used form run through doBugs()
.
rube
allows for passing arguments to this function.
This allows alternate methods of starting value generation in different situations,
possibly fully or partially data driven. In complex Bayesian problems, this can
be invaluable.
Rube
provides a function called p3()
which plots the trace, AR plot, and histogram for each monitored parameter, allowing
backing up as well as quitting before seeing all parameters
Rube
provides a function called priPost()
which is a prior/posterior
density plotter to compare the posterior density of various distributions to
their prior densities. This includes some "compound" prior distributions, e.g.,
a beta or gamma that has two gammas as its parameters.
rube
).
All links active 5/15/2014. Please report missing links, errors, and suggestions to