I confess I chose this picture because it makes me seem older and more important than I really am; but it gives a fair impression of my desk (and hair), and people recognize me from it at meetings.
Associate Professor
Statistics Department, Baker Hall 229C, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA
Center for the Neural Basis of Cognition
Related Faculty
Heinz College of Public Policy
and
External Professor
Personal website: bactra.org
Observe the doubly-stochastic transition matrix on the wall, the implements for machine reconstruction, the collaborators with the patience of angels...

E-mail: cshalizi [at] cmu [dot] edu

I can't help you get into CMU, or hire you.
Also, I won't do peer review for Elsevier.

*Interests*: Nonparametric prediction of time series; learning theory and nonlinear dynamics; information theory; stochastic automata, state space and hidden Markov models; causation and prediction; large deviations and ergodic theory; neuroscience; statistical mechanics and self-organization; social and complex networks; heavy-tailed distributions.

(For my complete papers, dissertation, CV, selected presentations, etc., please see my main research page. Potential dissertation students should look at my list of on-going and possible projects. If you are not at CMU, by all means apply to our graduate program. I have no influence over admissions, and don't want any, so writing me about that is a waste of your time. I have no openings for post-docs or other employees. Nor will I make an exception for you.)

My work revolves around prediction and inference for dependent, and often high-dimensional, data, drawing on tools from machine learning, nonlinear dynamics and information theory.

My original training is in the statistical physics of complex systems: high-dimensional systems where the variables are strongly interdependent, but cannot be effectively resolved into a single low-dimensional subspace. I worked particularly with symbolic dynamics and with cellular automata, which are spatial stochastic processes modeling pattern formation, fluid flow, magnetism and distributed computation, among other things. I remain interested in the role of information theory and statistical inference in the foundations of statistical mechanics, where I think some of the conventional views have things completely backwards.

Much of my earlier work involves complexity measures, like thermodynamic depth, and especially Grassberger-Crutchfield-Young "statistical complexity", the amount of information about the past of a system needed to optimally predict its future. This is related to the idea of a minimal predictively-sufficient statistic, and in turn to the existence and uniqueness of a predictively optimal Markovian representation for every stochastic process, whether the original process is Markovian or not. (Details.) The same ideas also work on spatially extended systems, including those where space is an irregular graph or network, only then the predictive representation is a Markov random field.
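For readers who prefer symbols, the definition of statistical complexity can be written out as follows. The notation is an assumption made for this sketch, since the text above gives none: $\overleftarrow{X}$ is the past of the process, $\overrightarrow{X}$ its future, $\epsilon$ the map from a past to its predictive state $S$, and the set of states is taken to be discrete.

```latex
% Two pasts belong to the same predictive state when they give
% the same conditional distribution over futures.
\[
  \epsilon(\overleftarrow{x}) = \epsilon(\overleftarrow{x}')
  \iff
  \Pr\bigl(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x}\bigr)
  = \Pr\bigl(\overrightarrow{X} \mid \overleftarrow{X} = \overleftarrow{x}'\bigr)
\]

% Statistical complexity is the entropy of the predictive state,
% S = \epsilon(\overleftarrow{X}), measured here in bits.
\[
  C_\mu = H\bigl[\epsilon(\overleftarrow{X})\bigr]
        = -\sum_{s} \Pr(S = s) \log_2 \Pr(S = s)
\]
```

Under this reading, the state $S$ is the minimal predictively-sufficient statistic mentioned above, and $C_\mu$ counts how many bits of the past it retains.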

As a post-doc, I moved from the mathematics of optimal prediction to devising algorithms to estimate such predictors from finite data, and applying those algorithms to concrete problems. On the algorithmic side, Kristina Klinkner and I devised an algorithm, CSSR, which exploits the formal properties of the optimal predictive states to efficiently reconstruct them from discrete sequence data. (This is related to, but strictly more powerful than, variable-length Markov chains or context trees.) Working with Rob Haslinger, we also developed a reconstruction algorithm for spatio-temporal random fields. We've used that to give a quantitative test for self-organization, and to automatically filter stochastic fields to identify their coherent structures (with Jean-Baptiste Rouquier and Cristopher Moore). My student Georg Goerg wrote his thesis in this area, extending the technique to continuous-valued fields and a nonparametric EM algorithm.

My more recent work falls into the areas of heavy tails, learning theory for time series, Bayesian consistency, neuroscience, network analysis and causal inference, with some overlap between these.

*Heavy-tailed distributions* are produced by many complex systems,
and have attracted a lot of interest over recent decades.
My most-cited paper,
with Aaron Clauset
and Mark Newman, concerns
proper statistical inference for power law (Pareto, Zipf) distributions. I
have also worked on estimation
and testing for a modified class of Pareto distributions,
called *q*-exponential or Tsallis distributions, sometimes used in
statistical mechanics.
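As a rough illustration of what inference for a power law involves, here is a minimal sketch of the textbook maximum-likelihood estimate of the exponent for a continuous power law, p(x) proportional to x^(-alpha) for x >= x_min. It assumes the lower cutoff `x_min` is already known; choosing the cutoff and testing goodness of fit are not shown. The function names `sample_power_law` and `fit_power_law_alpha` are hypothetical, chosen for this example only.

```python
import math
import random


def sample_power_law(alpha, x_min, rng):
    """Draw one value from a continuous power law by inverse-transform sampling."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


def fit_power_law_alpha(samples, x_min):
    """Maximum-likelihood estimate of alpha, using only the points >= x_min.

    Returns (alpha_hat, approximate standard error, number of tail points).
    x_min is treated as known, which is an assumption of this sketch.
    """
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n < 2 or log_sum <= 0.0:
        raise ValueError("need at least two tail points, not all equal to x_min")
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)  # large-sample approximation
    return alpha_hat, std_err, n


if __name__ == "__main__":
    rng = random.Random(0)
    data = [sample_power_law(2.5, 1.0, rng) for _ in range(20000)]
    alpha_hat, std_err, n = fit_power_law_alpha(data, 1.0)
    print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f} (n = {n})")
```

The estimate is computed from the logarithms of the data directly, with no histogram or binning step.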

*Learning theory*: I collaborate
with Daniel McDonald
and Mark Schervish on extending
statistical learning theory to time series prediction, aiming
at reforming the evaluation of
macroeconomic forecasting. Steps along this way include the non-parametric
estimation of dependence coefficients
[i, ii], and
risk bounds for state-space models.
(I'm also interested in risk bounds without strong
mixing.) Separately, Aryeh
Kontorovich and I have worked on establishing the right notion of
predictive
probably-approximately-correct learning. I am increasingly interested in
forecasting non-stationary processes, where I think the right goal is
to achieve low regret through a
growing ensemble of models (with Abigail Jacobs, Klinkner and Clauset).

*Bayes*: Bayesian inference is a smoothing or regularization device,
trading variance for bias, rather than a fundamental principle. This viewpoint
led to work with Andrew
Gelman on how the practice of
Bayesian data analysis relates to the philosophy of science. More
technically, I am interested in the frequentist properties of Bayesian methods,
especially the
convergence of non-parametric Bayesian
updating with mis-specified models and dependent data. Those results come
from an identity between Bayesian updating and the "replicator dynamic" of
evolutionary biology, of independent interest.

*Neuroscience*: One major application of CSSR has been to analyze the
computational structure of spike
trains (with Haslinger and Klinkner). An ongoing set of projects, with
Klinkner
and Marcelo
Camperi, uses the reconstructed states to build a noise-tolerant measure of
coordinated activity and
information sharing called "informational coherence". Informational
coherence, in turn, defines
functional modules of neurons with coordinated behavior, cutting across the
usual anatomical modules. In addition, I'm involved in more conventional
statistical modeling of neural signals, such as using multi-channel EEG data to
identify sleep anomalies (with
Matthew
Berryman), and analytic
approximations to traditional nonlinear state-estimation
(with Shinsuke
Koyama, Lucia Castellanos
and Rob Kass), applied to neural
decoding.

*Networks and causal inference*: My work on functional connectivity
and modularity is about extracting networks from coordinated behavior.
In social systems, I am more interested in the reverse problem, of how
network structure shapes collective
behavior. This has led me to explore,
with Alessandro Rinaldo, the
limits of exponential-family random
graphs (with implications for dependent exponential families generally);
and, in a very different direction, to work
with Henry Farrell on the role of
networks in institutional change. More
notoriously, Andrew Thomas and I have shown
that
causal inferences on networks are
generically confounded, though I hold out hope for loopholes. Currently,
much of my time goes into non-parametric network modeling, aiming at
statistical comparison of
networks, with my students Dena
Asta and Lawrence Wang.

I am (slowly) writing a book on the statistical analysis of complex systems models.

Student: Whenever there is any question, one's mind is confused. What is the matter?
Master Ts'ao-shan: Kill, kill!

This fall, I will be teaching 36-401, modern regression for undergraduates, and in spring 2016 the follow-up course 36-402, advanced data analysis; it will be my first time with 401, and my fifth time with 402.

In the past, I've taught 36-350, introduction to statistical computing (including co-teaching with Andrew Thomas and, originally, Vince Vu); 36-757 (the data analysis project class for Ph.D. students); co-taught 36-835 ("statistical modeling journal club") with Rob Kass; the old 36-350, data mining; 36-490, undergraduate research, on my own and with Brian Junker; 36-220, engineering statistics and quality control; 36-462, "Chaos, complexity, and inference"; 36-754, advanced stochastic processes; and 46-929, financial time series analysis, with Anthony Brockwell.

Following those links, you'll get a draft textbook for undergraduate ADA, lecture notes for data mining, and slides (with a few notes) for 462. The notes for stochastic processes turned into a 270-page book manuscript, under the working title of Almost None of the Theory of Stochastic Processes. (I am not so happy, in retrospect, with how I taught 220.) My old teaching page has my other lecture notes, and teaching evaluations from graduate school.

If you're a current CMU student and you want to talk to me, please send e-mail to make an appointment (and check my calendar first for when I'm free).

Why are students today not successful? What is the trouble? The trouble lies in their lack of self-confidence. If you do not have enough self-confidence, you will busily submit yourself to all kinds of external conditions and transformations, and be enslaved and turned around by them and lose your freedom. But if you can stop the mind that seeks [those external conditions] in every instant of thought, you will then be no different from the old masters. — Master I-hsüan

The scholar Zhong Kui, supported by his faithful assistants, sets out to quell the demons of ignorance and banish the ghosts of superstition.
Office hours for summer 2015: by appointment