My papers on spike train analysis can be put in 5 categories, listed immediately below. The context and content of these papers are explained in subsequent sections:
An overview of computational neuroscience may be found in this review article.
One of the most important techniques in learning about the functioning of healthy and diseased brains has involved examining neural activity in laboratory animals under varying experimental conditions. Neural information is represented and communicated through series of action potentials, or spike trains (Kass, 2014). Numerous investigations have documented the way neural firing rate increases in response to a sensory stimulus, or preceding an action.

The top two panels of Figure 1 display spike trains recorded simultaneously from two neurons in primary visual cortex during 64 replications, or trials, of a single experimental condition. In each of the lower panels of Figure 1 the data have been aggregated across trials into bins of width delta t = 5, and normalized to units of spikes per second, yielding a PeriStimulus Time Histogram (PSTH). This pair of PSTH plots shows a clear change in firing rate across the several hundred milliseconds following time zero, which refers to the onset of a visual stimulus.
Based on a large sample of data, the firing rate (FR) across a substantial interval of time could be defined as FR=(number of spikes)/(interval of time). However, it is desirable to avoid dependence on a specific time interval; furthermore, as may be seen in Figure 1, spikes occur irregularly both within and across repeated trials. It is therefore reasonable to think of a spike train as a stochastic sequence of isolated points in time, i.e., as a point process, and to introduce a theoretical, instantaneous firing rate in the form of the point process intensity function. Analyses of trialaveraged firing rate involve trialaggregrated data, where the trial identity of each spike time is ignored. In point process terminology, the trialaveraged firing rate is the marginal intensity function. The PSTH may be considered an estimate of the marginal intensity function. Analyses of withintrial firing rate, on the other hand, involve the original spike trains themselves, without aggregration, and are based on the conditional intensity function. Letting H_t denote the history of spike times prior to time t, the point process probability density function, and thus the likelihood function, may be written in terms of the conditional intensity function lambda(t  H_t), which is also the hazard function for the waiting time distribution (waiting for the next spike). On a given experimental trial the conditional intensity determines the conditional probability that the next spike will occur at time: P(spike in (t, t + dt) H_t)= lambda(t  H_t)dt. In the Poisson case the process is memoryless and the conditional intensity reduces to the historyindependent form lambda(t  H_t) = lambda(t). In general, for nonPoisson processes, we may consider the replicationaverage of the conditional intensity, which is the marginal intensity defined by the expectation lambda(t) = E(lambda (t  H_t)) so that the trialaveraged probability of a spike at time becomes

Spike times are recorded to fixed accuracy delta t. An observed spike train is thus a binary time series, as pictured in Figure 2. It is not hard to show that the point process likelihood function approximates the corresponding binary time series likelihood function, for small delta t. Typically delta t = 1 millisecond and the approximation is very good. Importantly, this means that if a point process model sets log lambda(t  H_t) equal to some function of covariates, the spike train data may then be analyzed using generalized linear models and familiar variants of them.
From 1998 to 2010 All of the foregoing was discussed by Brillinger, notably in a 1988 paper in Biological Cybernetics, but no new insights into neural behavior were offered and the main ideas did not begin to penetrate the neurophysiology literature until at least 10 years later. In 1998 Emery Brown and colleagues published a paper that introduced statespace modeling, in the context of point processes, to neuroscience. That year he and I decided to learn about existing approaches to spike train data analysis with the goal of writing a review article. As Emery and I dug into the literature we discovered gaping holes, holes so large that we could not write a review until they were filled. With various colleagues Emery and I, mostly separately, worked on several problems, as did some other statisticallyoriented researchers in neurophysiology. Finally, by 2005, together with Valerie Ventura, we were able to finish and publish the review article we had set out to write 7 years earlier (Kass, Ventura, Brown, 2005; see also Brown, Kass, and Mitra, 2004). Along the way, in 2002, Emery and I began a series of international workshops Statistical Analysis of Neural Data (SAND), bringing together experimenters, computational neuroscientists, and statisticians. It is impossible to say what impact these research and outreach efforts had on the field, but the quality of statistical analysis of spike train data improved dramatically over this period. The first SAND meeting devoted considerable time to problem definition. At the fifth SAND meeting, in 2010, investigators reported development and application of stateoftheart statistical methodology in attacking wellformulated and important scientific questions. (See Brown and Kass, 2007; Kass, 2010a.) Our experiences during this time prompted reflections on the nature of statistics and statistical training (Brown and Kass, 2009), and on the connections between statistical practice and the foundations of statistical inference (Kass, 2010b, 2011).
The large majority of published findings have been based on trialaveraged firing rates, often using elementary comparisons (e.g., ttests) of spike counts obtained over windows of time where a neuron appeared to be active. A comparatively sophisticated yet still simple example is in Suway et al. (2017). Early on, my Pittsburgh colleagues made me aware of two shortcomings in spike count comparisons: (1) they can not produce fine structure in trialaveraged rates, such as the time at which the maximal firing rate occurs, and (2) the window used in spike count analysis is arbitrary. We responded by adapting existing smoothing methods (Olson et al., 2000; Ventura et al., 2002) and developing powerful new smoothing methods (DiMatteo, Genovese, and Kass, 2001). The new methods were based on Bayesian Adaptive Regression Splines (BARS), which solves generalized nonparametric regression problems with regression splines by using reversiblejump MCMC to find good knot sets.
Our initial research showed that (i) after aggregating across trials the spike times may often be assumed to follow an inhomogeneous Poisson process, (ii) Poisson nonparametric regression splines could provide a useful methodology for answering scientific questions of the sort frequently found in neurophysiological investigations, and (iii) when fixedknot regression splines (or more generally, fixedbandwidth smoothers) were inadequate, BARS did an excellent job of estimating the marginal intensity function. Figure 1 illustrates such a situation, where a sudden increase in firing rate occurred. We documented the very good frequentist properties of BARS, both small mean squared error and accurate coverage probability of posterior intervals. We have yet to find any example where another nonparametric method can produce a smaller mean squared error. We provided a good implementation of BARS, especially in the Poisson case we cared most aboutbecause it applies to marginal intensity functionswhere we incorporated earlier stepwise knot selection code produced by Charles Kooperberg. The software was put into the CRAN library and was described in Wallstrom, Kass, and Liebner (2008). A Matlab version was also created by Ryan Kelly. Kass (2008) gives an overview for neuroscientists.
We then used BARS to (i) assess variability of marginal intensity functions across multiple neurons, (ii) test equality of two or more marginal intensity functions, and (iii) estimate directional tuning functions. Results from (i), reported in Behseta et al. (2005), showed that the proportion of variation explained by the first principal component based on näive functional data analysis (FDA) can be severely biased upwards because it ignores the variability in estimation of the functions. In a simulation study based on our earlier analysis of 347 neurons from primary motor cortex, the method was 8 times more accurate than näive FDA. Our results from (ii), in Behseta and Kass (2005) and Behseta et al. (2007), provided what we considered to be an elegant answer to the question, ``Which time window should I use to compare firing rates?'' In our methods we used the entire experimental time interval to compare the marginal intensity functions: we chose a grid of time points at which to evaluate the two BARS estimates, retained the BARS covariance matrices, and applied a modified MANOVA. The modifications dealt with covariance matrices that were known but unequal and singular, and in Behseta et al. (2007) we derived the appropriate likelihood ratio tests (building on results that had appeared in the statistics literature about 10 years earlier). The methods are general and, in fact, we have also applied them to a problem in MEG imaging (Wang et al., 2010, Journal of Neurophysiology). For (iii) Kaufman, Ventura, and Kass (2005) developed an extension of BARS for circular applications, called cBARS, using a periodic spline basis. We showed that cBARS can provide the same kind of improvements compared to circular smoothing splines that BARS provides in the usual settings.
Numerous papers in the literature have relied on the Poisson assumption, even though it has been known for a long time that spike trains are often discernably nonPoisson both in theory, according to stochastic differential equation models, and in practice. In parallel with our work on marginal intensities, for trialaveraged firing rates, we have developed methodology based on conditional intensity functions for nonPoisson withintrial analysis. This has several components. First, it is important to know how sensitive methods are to realistic departures from Poisson processes. Second, it is very helpful to have models that can accommodate nonPoisson point processes, where spiking history comes into play. Third, it is sometimes possible and desirable to characterize trialspecific effects, including sources of trialtotrial variability. This requires withintrial methodology.
In general lambda(t  H_t) could be a complicated function of the entire spiking history. Some assumption must be introduced to reduce the effective dimensionality of the problem. One approach, applied by Brown and colleagues in a 2001 paper, is to modulate the firing rate of a renewal process by rescaling time. The resulting models require specialpurpose nonlinear fitting. Kass and Ventura (2001) observed that models could be fitted with standard generalized linear models software when the dependence of firing rate on the time intervals between past spikes (spikes before time t) is described by splines. The GLMbased approach of Kass and Ventura (2001), including its many generalizations, has become standard. Because models of this type are based on a generalized regression framework, a descriptive term for them is point process regression models. Koyama and Kass (2008) provided a detailed comparison of the Brown et al. and Kass and Ventura approaches using simulated spike trains based on integrateandfire models, which are fundamental to theoretical neuroscience. The main results were that each of the two statistical models could fit slightly better than the other in particular situationsdepending on the source of variability in the neural input currentbut both generally fit this theoretical model far better than the inhomogeneous Poisson process. In Wang et al. (2015) we noted that in some slice experiments (ex vivo) standard logadditive nonPoisson models tend to underfit the peak firing rate as a function of stimulus, and we showed that an ad hoc fix can perform better (an elegant solution to this difficulty remains an outstanding problem as of 2016).
In addition to investigating nonPoisson models of spike train irregularity we have developed complementary methods of handling two other aspects of variability that are important for withintrial analyses. After Ventura (2004, Neural Computation ) had provided a simple approach to characterizing onset latency (roughly speaking, the time at which increased firing occurs), Ventura, Cai, and Kass (2005b) showed how trialtotrial gain modulation (amplitude variation in firing rate) could be accommodated in the generalized linear modeling framework by using lowdimensional splines to represent deviations from the trialaveraged rate. At the urging of several collaborators we also attacked the problem of burst detection. In many contexts a neuron will suddenly, and erratically, ``turn on'' and fire a rapid succession of spikes after which it will return to its resting firing rate. Bursting behavior has been studied theoretically using differential equations, but empirically bursts do not always occur as easilyidentified clusters of successive spikes and a small literature has been devoted to automatic burst detection. One initial thought might be to use a hidden Markov model to represent the unknown binary burst/nonburst state, but bursting tends to be more irregular than exponential waiting times allow. In Tokdar et al. (2009) we developed a hidden semiMarkov model (generalizing the waitingtime distribution) and a fast Gibbs sampling algorithm for fitting, and we showed that this new method could outperform existing approaches. A different, but related problem is identify oscillatory behavior in spike trains. In many contexts oscillations may be expected, but because of the erratic nature of neural spiking they can be hard to assess statistically. In Arai and Kass (2017) we develop methodology based on constrained autoregressive latent processes to pull out the oscillation, if it exists.
During an extended visit to the University of California, Berkeley, I began a collaboration with Vincent Vu, who was then a student working with Bin Yu. A tool favored by many physicsoriented theoretical neuroscientists is mutual information, but most databased calculations have used naive methods for estimation of information. In Vu et al. (2007) we showed how a method developed in ecology to estimate diversity in the presence of unseen species could be used for neural applications; we proved consistency and optimality of the approach; and we showed that it could behave better than existing alternatives in finite samples. A conceptual difficulty in using mutual information is its reliance on stationarity: in many cases the stimulus itself is already nonstationary, making the definition of mutual information void. In Vu et al (2009a) we discussed this problem in detail, and showed what happens, in large samples, to information estimators in the nonstationary case; we argued that such estimators may be useful even when mutual information is no longer well defined.
Many laboratories are now recording groups of neurons simultaneously, across multiple electrodes, which raises the tantalizing possibility of understanding the coordinated activity of neural networks. One of the issues in multineuron analysis involves the extent to which correlated activity contributes stimulusrelated information beyond that available by assuming the neurons act independently. This has generated considerable interest and debate. Correlations may occur at fine time scales of only a few milliseconds, or at much broader time scales of tens to hundreds of milliseconds. Fine time scale synchrony could produce in a downstream neuron a sharp increase in probability of spiking, which many have argued ought to play a fundamental role in perceptual information transfer. Statistically, assessing fine time scale synchrony is a delicate matter because synchronous events are relatively rare so that uncertainty may be substantial and small biases could have a dramatic effect. Broad time scale correlation, reflecting more erratic network bombardment of multiple neurons, would also be of interest, especially to the extent that it may contain stimulusrelated information. The challenge then is to describe the stimulusdependence of dynamic interactions among multiple nonstationary point processes.
A simple way to assess dependence between two neurons, at relatively coarse time scales, is to compute the correlation between their spike counts over some time window. Recent neurophysiological literature documented a tendency for this correlation to increase as the length of the time window increases. Kass and Ventura (2006) showed that this phenomenon is a necessary consequence of multiplicative trialtotrial variability. Not only did this result provide a new interpretation, and highlight the importance of specifying the time window in any discussion of correlation, but, because multiplicative models fit an important feature of spike train data, it also provides support for the use of loglinear models of intensity (having effects that are additive in the log probability scale). In Behseta et al. (2009) we developed a new correction for attenuation of correlation in the presence of measurement error, using Bayesian estimation in a simple hierarchical (randomeffects) model, and showed it improved dramatically on existing alternatives in the psychology literature (to which standard regression errorsinvariables methods reduce when applied to correlation). For the neural spike count data we analyzed, correction for attenuation changed the estimated correlation substantially, making it much more consistent with experimental prediction. Building on the ideas in Kass and Ventura (2006), and building on the hierarhical modeling approach of Behseta et al. (2009), Vinci et al. (2016) provided a thorough treatment of spike count correlation, separating correlation due to trialtotrial variation from that due to withintrial variation.
Spike count correlation applies to large time windows (tens to hundreds of milliseconds). As mentioned above, there has also been intense interest in the physiological significance of nearsynchronous firing within trials, which may be assessed using histogram methods and the crosscorrelogram. These techniques are useful but do not allow for measured sources of variation (which would appear as covariates in the regression framework), including trialtotrial variation. Ventura, Cai, and Kass (2005a) developed a trialaveraged point process regression method using BARS and bootstrap methods, showing that it is substantially more powerful than histogram methods, and Ventura, Cai, and Kass (2005b) adapted this approach to accommodate trialtotrial variation.
An important source of trialtotrial variation may be captured from the overall activity of the many neurons recorded across an electrode array. Using more than 100 simultaneously recorded spike trains from primary visual cortex (V1), Kelly, Smith, Kass, and Lee (2010a) showed how local field potentials (LFP), which are slowlyvarying voltage responses thought to represent synaptic activity in the surrounding network, can be highly correlated with network activity. Accounting for such activity via LFP can improve estimation of stimulusrelated neural response. Kelly, Smith, Kass, and Lee (2010b) extended these ideas using L1 regularized point process models to summarize the network effects arising from dozens of simultaneously recorded neurons.
The point process regression approach of Ventura, Cai, and Kass (2005a,b) was aimed at assessing synchrony (i.e., the extent to which synchronous spiking deviates from that produced by point process noise), and relating it to covariates. This idea was generalized and put on a more solid theoretical foundation by Kass, Kelly, and Loh (2011) and Kelly and Kass (2012). Just as individual spike trains form binary time series, a pair of simultaneouslyrecorded spike trains forms a time series of 2 by 2 tables (with four entries at each time point for the four combinations of spiking or not spiking across both neurons). In general, the spike combinations from n neurons will form a time series of 2^n tables. A statistically natural approach to analyzing these spike combinations, and to look for excess synchronous spiking, is to apply loglinear modeling technology, and many previous authors did so. The standard approach, however, has two shortcomings. First, it assumes stationarity of firing rates across suitable time intervals. Second, it does not incorporate spiking history, or other covariates, and therefore effectively assumes Poisson spiking. In addition, the standard approach ignores the inherent relative sparsity of twoway and multiway synchronous spiking. Kelly and Kass (2012) provide a modification to the usual loglinear modeling methodology to accommodate timevarying firing rates and/or nonPoisson history effects and/or timevarying covariates (representing phenomena such as network activity). This results in a blend of loglinear modeling with pointprocess regression. Scott et al. (2015) showed how large numbers of pairs can be analyzed using Bayesian control of false discoveries (100 neurons gives over 8,000 pairs which would be subject to examination for excess synchrony). Zhou et al. (2016) used the framework of Kass, Kelly, and Loh (2011) and Kelly and Kass (2012) to give a rigorous statistical method for assessing the association between oscillations (in a local field potential) and neural synchrony.
The use of point processes to describe and analyze spike train data, together with the observation that point processes may be considered, approximately, to be binary time series, has been one of the major contributions of statistics to neuroscience. Neuroscience and statistical theory reside in continuous time, while measurements are made and data are analyzed in discrete time. The formal connection between continous time and discrete time is made by passage to the limit as the discrete bin size converges to zero. In trying to make a similar formal connection for multiple binary time series, however, a puzzle emerges: the standard regularity conditions in point processes forbid simultaneous events. How then can we construct point process approximations to multivariate binary time series where simultaneous events are not only allowed but, in fact, a main focus of interest? Kass, Kelly, and Loh (2011) provided a solution to this problem through the introduction of a sequence of point processes (marked processes) that satisfy hierarchical sparsity, according to which 2way spiking is rare relative to 1way spiking, 3way spiking is rare relative to 2way spiking, etc. This contributes a framework for thinking about synchrony, and it produces simple, intuitive estimators and significance tests for effects of excess multiway synchronous spiking. The problem of synchrony detection, and its statistical subtleties, was reviewed by Harrison, Amarasingham, and Kass (2013).
Multielectrode neuronal recording not only helps advance a variety of basic scientific efforts, it also has clinical applications of potentially great importance through neural prostheses using braincomputer interfaces (BCIs). These methods have advanced furthest in motor control. The idea here is that an array of electrodes implanted in a suitable brain region can record enough neurons that by simply thinking about an arm movement, a severelydisabled subject will produce neuronal activity sufficient to drive a robotic device in place of a nonfunctioning arm or hand. Beneficiaries of such devices could include people paralyzed by head or spinal cord trauma, and those with deficits caused by stroke, ALS (``Lou Gehrig's Disease''), cerebral palsy, or multiple sclerosis. In fact, a paper in Nature from the lab of our colleague Andrew Schwartz reported that monkeys can learn to use neuralprosthetic robotic devices to feed themselves, and a followup in Lancet reported that a tetraplegic human could manipulate objects with a BCIdriven robotic arm. For several years prior to the human studies, my statistical colleagues and I collaborated extensively with the Schwartz lab, partly to understand and optimize the neural prosthetic algorithms, and partly to use BCIs for scientific investigation.
Braincomputer interfaces for motor control use decoding algorithms to predict movements from patterns of neural activity. Neurons in primary motor cortex, for example, display directional tuning, meaning that their firing rate increases when arm or hand movement is in a particular direction; once such neural encoding is learned, statistically, it may be decoded to predict movement. We and others have developed and studied efficient decoding algorithms based on statespace models.
Brockwell, Rojas, and Kass (2004) showed how Bayesian sequential decoding, based on statespace models (dynamic models that generalize the Kalman filter) could be far more efficient than the usual method known as the population vector algorithm. However, in that work we used particle filtering, which can be extremely slowpotentially too slow for realtime prosthetic decoding. Koyama, Castellanos, Shalizi, and Kass (2009), developed and studied what we called the LaplaceGauss filter, which is based on a normal approximation to the posterior together with an application of Laplace's method, to provide a fast computational method for nonGaussian and/or nonlinear statespace models. Emery Brown and coworkers had previously used a similar approach in related applications. We streamlined the implementation, proved that the method was asymptotically valid in not accumulating errors across time, and showed that it could be two orders of magnitude faster than particle filtering. We also investigated its use in prosthetics, and Koyama, Chase, Whitford, Velliste, Schwartz, and Kass (2009) showed that the subject can correct for the excess bias in the population vector algorithm when it becomes apparent visually, but not the excess variance. Chase, Schwartz, and Kass (2009) provided additional detail and interpretation of bias correction.
Part of the purpose of these efforts has been to better understand adaptation and learning within motor cortex. In BCI experiments the neurons that drive movement (of a cursor on a computer screen, or of a robotic hand) are under experimental control. Jarosciewicz, Chase, Fraser, Velliste, Kass, and Schwartz (2008) showed that rotational perturbations of the mapping between neurons and cursor movement created a small but statistically discernable change, preferentially, across the population of neurons whose mapping had been perturbed when compared with the remaining population of neurons where the mapping had been left intact. The change in motor cortical neural firing could have involved both changes in inputs, as the subject sought to counter the perturbation, and changes in individual neural tuning curves. Chase, Schwartz, and Kass (2010) introduced latent ``reaiming targets'' to help separate these effects and showed this to improve estimation of neural tuning curves. Reanalysis of the Jarosciewicz et al. data supported the original findings.
Much modeling in the brain sciences pursues the idea that neural systems may behave optimally in the sense of using Bayes' Theorem for inference. An important theoretical contribution (by Ma and colleagues) was to show that neural systems could, in principle, combine information from two sources just as would be required according to Bayesian inference. In Orellana, Rodu, and Kass (2017) we noted that population vectors, which are based on linear models and are clearly computationally feasible by neural systems, could be used in much the same way. In limited circumstances these form a special case of that considered by Ma and colleagues, and they are actually optimal. Even though combining information with population vectors is, in general, suboptimal, we raised the possibility that this simpler neural methodology might be more robust to errors in synaptic weights.