barsP {BARS}R Documentation

Bayesian Adaptive Regression Splines

Description

Solves the Poisson nonparametric regression (curve-fitting) problem, where the data are Poisson counts with log mean f(x). It is assumed that the function f(x) may be approximated by a cubic spline. Utilizes reversible-jump MCMC on the sets of knots, allowing the spline to adapt to the data across the domain of x.

Usage

barsP(x,y,initial="logspline",iknots=3,prior="uniform",
priorparam=c(1,60),burnin=200,nsims=2000,tau=50.0,c=0.4,fits=TRUE,
peak=FALSE,conf=0.95,gridfit=FALSE,ngridpoints=150)

Arguments

Required: x,y
Optional:
Initial knots settings: initial, iknots
Settings on prior for knots: prior, priorparam
MCMC settings: burnin, nsims, tau, c
Output settings: fits, peak, conf
Other: gridfit, ngridpoints

x a vector of the independent variable, in increasing order
y a vector of the dependent variable, in the form of count data
initial use logspline ("logspline") or evenly spaced ("even" or "equal") for initial knots [default is "logspline"]
iknots the initial number of knots for the spline [default = 3]
prior the type of prior being used for the knots; the only acceptable answers are "poisson", "uniform", and "user" [default is "uniform"]
priorparam the parameter for the prior (See Details)
burnin the desired length of the burn-in for the MCMC chain [default = 200]
nsims the number of simulations (number of draws from the posterior) desired for the MCMC chain [default = 2000]
tau a parameter that controls the spread for the knot proposal distribution [default = 50.0]
c a paramater that controls the probability of birth and death candidates [default = 0.4]
fits a logical. The default 'TRUE' will cause the program to return the fitted values for each data point for each run of the simulation. Please note that if the number of data points and/or simulations is large, there may be a lengthy delay as the necessary data are read.
peak a logical. 'TRUE' will cause the program to return the location and height of the highest point on the fitted curve [default = FALSE]
conf for use with peak. Sets the probability for the credible intervals for the location and height of the peak [default = 0.95 for 95% credible intervals]
gridfit a logical. 'TRUE' will cause the program to return the posterior mean and mode fit at equally spaced grid points, the number of points being determined by the value of ngridpoints. [default = FALSE]
ngridpoints the number of equally spaced grid points at which the posterior mean and mode are evaluated. Used in conjuction with gridfit. [default = 150]

Details

Only x and y are required arguments. Unspecified arguments will use default values:

barsP(x, y)
barsP(x, y, initial="even", iknots=5)

For the parameter for the prior:

a. if using Poisson, the choice for lambda = mean

b. if using Uniform, a vector of length 2 which includes the minimum number of knots followed by the maximum number of knots [default = c(1,60)]

c. if using user-defined prior, a matrix with 2 columns. The first column should be the number of knots and the second column should be the probability of obtaining this number of knots. Note the following example:

2 0.05
3 0.15
4 0.30
5 0.30
6 0.10
7 0.10

Note that due to the large amount of output generated by the program, it is desirable to save the results of the program into a variable, such as:

out <- barsP(x,y,.....)

Value

a list containing the following: Always: postmeans, postmodes, simnum, nknots, sampknots, sampBICs, sampllikes, Optional, if fits = TRUE: sampfits Optional, if gridfit = TRUE: gridmeans, gridmodes Optional, if peak = TRUE: samplpeaks, samphpeaks, peaklocationquantile, peaklocationmean, peaklocationmode, peakheightquantile, peakheightmean, peakheightmode

postmeans vector of the posterior means of f(x) evaluated at the x values
postmodes vector of the posterior modes evaluated at the x values
simnum vector of each simulation number (the number of that draw from the posterior), beginning at burnin + 1 and ending at burnin + nsims
nknots vector of the number of knots used at each draw from the posterior (does not include burnin iterations)
sampknots matrix containing the position of the knots at each iteration. Length of the matrix is equal to the number of iterations, not including burnin iterations, with the width of the matrix equal to the maximum number of knots at any iteration. NAs are used to fill in the matrix at iteration numbers that have less than the maximum number of knots.
sampBICs vector of the calculated BIC at each draw from the posterior (does not include burnin iterations)
sampllikes vector of the calculated loglikelihood at each draw from the posterior (does not include burnin iterations)
sampfits optional, if fits='TRUE': matrix of fits for each draw from the posterior (does not include burnin iterations), with the rows of the matrix corresponding to the individual draw from the posterior. The columns represent the fits at each value of x.
gridmeans optional, if gridfit='TRUE': vector of the posterior means evaluated at equally spaced intervals with number of intervals determined by value of "ngridpoints"
gridmodes optional, if gridfit='TRUE': vector of the posterior modes evaluated at equally spaced intervals with number of intervals determined by value of "ngridpoints"
samplpeaks optional, if peaks='TRUE': vector of the x location of the highest point in the fitted curve for each draw from the posterior (does not include burnin iterations)
samphpeaks optional, if peaks='TRUE': vector of the y value (height) of the highest point in the fitted curve for each draw from the posterior (does not include burnin iterations)
peaklocationquantile optional, if peaks='TRUE': a credible interval for the x location of the highest peak; width of the interval is dependent upon the setting chosen for conf
peaklocationmean optional, if peaks='TRUE': the mean x location for the highest peak
peaklocationmode optional, if peaks='TRUE': the mode x location for the highest peak
peakheightquantile optional, if peaks='TRUE': a credible interval for the y value (height) of the heighest peak; width of the interval is dependent upon the setting chosen for conf
peakheightmean optional, if peaks='TRUE': the mean y value (height) of the heighest peak
peakheightmode optional, if peaks='TRUE': the mode y value (height) of the highest peak

Author(s)

Garrick Wallstrom

References

This function is designed to work with Wallstrom and Kass's Barslib, (C) 2003.

Barslib uses Hansen and Kooperberg's Logspline, (C) 1997, as the default method for selecting initial knots.

Barslib uses Bates and Venables Routines for manipulating B-splines, (C) 1998. The routines were released under the GNU General Public License version

see http://www.stat.cmu.edu/~kass/bars/bars.html

Examples

x<-c(1:110)
mu<-20*abs(exp(-x/100)*cos(x/5))
y<-rpois(110,mu)
out = barsP(x,y,prior="uniform",priorparam=c(1,15),fits=TRUE)

upper=apply(out$sampfits,2,quantile,0.975)
lower=apply(out$sampfits,2,quantile,0.025)

matplot(x,cbind(upper,out$postmeans,lower),lty=c(2,1,2),type='l')
points(x,y)

[Package BARS version 1.0-1 Index]