--------------------------------------------------------------------------------

A NOTE ON EVALUATION GRIDS:

The code performs different aspects of the analysis on "grids" of different
resolutions. Here, a grid refers to a collection of evenly spaced (X,Y) pairs
laid out over the region on which the bivariate density estimate is sought.

The finest grid is needed when calculating the least-squares cross-validation
score. One of the arguments to the R function is a matrix whose entries are
TRUE or FALSE depending on whether the corresponding pair is in the observable
region; the details are given below. It is recommended that this grid be at
least 500 by 500.

A subset of the points of this fine grid is used by the bulk of the algorithm,
including the fitting of local models. The argument reductfactor is the
scaling used: For example, the default value is 5, which means that with a
500 by 500 fine grid the coarser grid will be 100 by 100. This is reflected in
the output of the function: The $bivest component is the estimate on this
coarser grid.

--------------------------------------------------------------------------------

Compile the Fortran subroutine using a Fortran 90 compiler.

$ f90 -O5 -fpic -shared -o BivTrunc.so BivTrunc.f -llapack

--------------------------------------------------------------------------------

Within R, execute the commands in the file BivTruncRcomm. This will load the
necessary functions.

$ R
...
> source("BivTruncRcomm")

--------------------------------------------------------------------------------

Run the command BivTrunc() to perform the analysis

> results = BivTrunc(X,Y,xlim,ylim,mask,lambdax,lambday,tau,deg,datwght,verbose,
                     grdsize,reductfactor)

but note that all that is truly required is

> results = BivTrunc(X,Y,xlim,ylim,mask,lambdax,lambday)

The arguments are as follows:

1) "X": A vector containing the X values.
2) "Y": A vector containing the Y values.
3) "xlim": A vector of two values giving the lower and upper bound on the X
    values for the region over which the bivariate density is to be estimated.
4) "ylim": A vector of two values giving the lower and upper bound on the Y
    values for the region over which the bivariate density is to be estimated.
    For example, xlim=c(0,1) and ylim=c(0,1) if estimating over the unit
    square.
5) "mask": A matrix whose entries are either TRUE or FALSE depending on whether
    the grid point in this location is within the observable region (then
    mask[i,j]=TRUE), or not. The grid is assumed to be formed as follows: The
    X and Y sequences are

    > xseq = xlim[1] + (0:(res-1))/res*(xlim[2]-xlim[1]) + (xlim[2]-xlim[1])/(2*res)
    > yseq = ylim[1] + (0:(res-1))/res*(ylim[2]-ylim[1]) + (ylim[2]-ylim[1])/(2*res)

    Then mask[i,j] is TRUE if (xseq[i],yseq[j]) is in the observable region,
    and FALSE if not. The value of res should be large; it is recommended
    that it be at least 500.
6) "lambdax": The bandwidth(s) used in the X direction. If this is a single
    number, then that bandwidth is used for all local models, subject to
    adjustment by the nearest neighbor fraction "tau." IT IS ASSUMED THAT
    BANDWIDTHS ARE STATED ON THE SCALE OF THE DATA AFTER IT HAS BEEN
    STANDARDIZED TO LIE IN [0,1]. If this is a vector, it should be of
    dimension res/reductfactor by 1, since that is the number of local models
    in one dimension. This specifies the bandwidth for each local model.
    The first column gives the bandwidths for the X local models, the second
    column for the Y local models. If lambdax=NULL, then the bandwidths are
    chosen solely using the nearest neighbor fraction.
7) "lambday": Same as lambdax, but for the Y direction.
8) "tau": Forces the neighborhoods to be large enough to include at least the
    proportion tau of the data values. DEFAULT VALUE = 0.
9) "deg": The degree to use for the local models. DEFAULT VALUE = 1.
10) "datwght": The weight for each observation. Should be positive, and of the
    same length as X (and Y). If set to NULL, then the weights are all one.
    DEFAULT VALUE = NULL.
11) "verbose": If TRUE, then the procedure outputs more diagnostic information
    as it runs. DEFAULT VALUE = FALSE.
12) "grdsize": The size of the grid on which the covariance matrix is estimated
    and returned. DEFAULT VALUE = 20.
13) "reductfactor": The ratio of the size of each dimension of the fine grid
    to that of the coarse grid. THIS MUST BE ODD, AND EVENLY DIVIDE res.
    DEFAULT VALUE = 5.

--------------------------------------------------------------------------------

When the analysis completes, the returned object includes the following
components:

1) $bivest - The estimate of the bivariate density.
2) $margx - The estimate of the X marginal.
3) $margy - The estimate of the Y marginal.
4) $xseqbivest - The vector of X values at which $bivest and $margx are
    provided.
5) $yseqbivest - The vector of Y values at which $bivest and $margy are
    provided.
6) $fits - The fitted values, i.e., the estimates of the bivariate density at
    each of the data values.
7) $lscv - The least-squares cross-validation score.
8) $grdcov - The estimate of the covariance matrix for the bivariate density
    estimate. The rows/columns are ordered ...
9) $xseqcov - The vector of X values at which $grdcov is evaluated.
10) $yseqcov - The vector of Y values at which $grdcov is evaluated.
11) $lambdaxused - The actual vector of smoothing parameters utilized in the X
    direction.
12) $lambdayused - The actual vector of smoothing parameters utilized in the Y
    direction.
13) $lvout - The "leave-one-out estimates," i.e., the estimates at each of the
    data pairs if that observation were removed from the analysis.
14) $theta - The estimate of theta.
15) $setheta - The standard error of the estimate of theta.
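
--------------------------------------------------------------------------------

EXAMPLE: CONSTRUCTING THE MASK (ILLUSTRATIVE SKETCH)

To illustrate the grid and mask conventions described above, the following is
a minimal sketch of how the mask argument could be built in R. It is not part
of the distributed code. The observable region used here (pairs with Y >= X)
and the helper in_region are hypothetical, chosen only for the example; res,
reductfactor, xlim, and ylim take the values mentioned in this file.

```r
# Fine-grid resolution and reduction factor (recommended/default values above).
res <- 500
reductfactor <- 5

# reductfactor must be odd and must evenly divide res.
stopifnot(reductfactor %% 2 == 1, res %% reductfactor == 0)

# Region over which the density is estimated (unit square, as in the example).
xlim <- c(0, 1)
ylim <- c(0, 1)

# Grid sequences, exactly as given in the description of "mask".
xseq <- xlim[1] + (0:(res-1))/res*(xlim[2]-xlim[1]) + (xlim[2]-xlim[1])/(2*res)
yseq <- ylim[1] + (0:(res-1))/res*(ylim[2]-ylim[1]) + (ylim[2]-ylim[1])/(2*res)

# HYPOTHETICAL observable region, for illustration only: pairs with y >= x.
in_region <- function(x, y) y >= x

# mask[i,j] is TRUE when (xseq[i], yseq[j]) lies in the observable region.
mask <- outer(xseq, yseq, in_region)

# Number of points per dimension in the coarser grid.
coarse <- res / reductfactor
```

With data vectors X and Y and bandwidths lambdax and lambday in hand, this mask
would then be passed as the fifth argument, as in the minimal call shown
earlier. With these values of res and reductfactor, the coarser grid is
100 by 100.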