I confess I chose this picture because it makes me seem older and more important than I really am; but it gives a fair impression of my desk (and hair), and people recognize me from it at meetings.
Associate Professor
Statistics Department Baker Hall 229C Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213-3890 USA
Associate Professor, Machine Learning Department
Center for the Neural Basis of Cognition
Heinz College of Public Policy
and
External Professor
Personal website: bactra.org
Observe the doubly-stochastic transition matrix on the wall, the implements for machine reconstruction, the collaborators with the patience of angels...

E-mail: cshalizi [at] cmu [dot] edu

I can't help you get into CMU, or hire you.
Also, I won't do peer review for Elsevier.

*Interests*: Nonparametric prediction of time series; learning theory and nonlinear dynamics; information theory; stochastic automata, state space and hidden Markov models; causation and prediction; large deviations and ergodic theory; neuroscience; statistical mechanics and self-organization; social and complex networks; heavy-tailed distributions; collective cognition and distributed problem-solving.

(For my complete papers, dissertation, CV, selected presentations, etc., please see my main research page. Potential dissertation students should look at my list of on-going and possible projects, but I will not be taking any new students until fall 2020 at the earliest. If you are not at CMU, by all means apply to our graduate program. I have no influence over admissions, and don't want any, so writing me about that is a waste of your time. I have no openings for post-docs or other employees. This includes self- or government-funded visitors.)

My work revolves around prediction and inference for dependent, and often high-dimensional, data, drawing on tools from machine learning, nonlinear dynamics and information theory.

My original training is in the statistical physics of complex systems: high-dimensional systems where the variables are strongly interdependent, but cannot be effectively resolved into a single low-dimensional subspace. I particularly worked with symbolic dynamics, and with cellular automata, which are spatial stochastic processes modeling pattern formation, fluid flow, magnetism and distributed computation, among other things. I remain interested in the role of information theory and statistical inference in the foundations of statistical mechanics, where I think some of the conventional views have things completely backwards.

Much of my earlier work involves complexity measures, like thermodynamic depth, and especially Grassberger-Crutchfield-Young "statistical complexity", the amount of information about the past of a system needed to optimally predict its future. This is related to the idea of a minimal predictively-sufficient statistic, and in turn to the existence and uniqueness of a predictively optimal Markovian representation for every stochastic process, whether the original process is Markovian or not. (Details.) The same ideas also work on spatially extended systems, including those where space is an irregular graph or network, only then the predictive representation is a Markov random field.
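To make the statistical complexity above concrete, here is a minimal sketch, assuming the predictive states and the transition probabilities between them are already known. It takes the Shannon entropy of the stationary distribution over those states. The example is the two-state "golden mean" process, a standard textbook case that is not discussed on this page, and the function names are hypothetical.

```python
import math

def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of a row-stochastic
    matrix P (list of lists) by power iteration from the uniform vector."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_bits(p):
    """Shannon entropy of a probability vector, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Golden mean process (illustrative assumption):
# state A emits 1 and stays in A, or emits 0 and moves to B, each with
# probability 1/2; state B always emits 1 and returns to A.
P = [[0.5, 0.5],
     [1.0, 0.0]]

pi = stationary_distribution(P)   # close to [2/3, 1/3]
c_mu = entropy_bits(pi)           # about 0.918 bits
```

In this toy case the states are given; reconstructing them from data is the hard part, and this sketch does not attempt it.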

As a post-doc, I moved from the mathematics of optimal prediction to devising algorithms to estimate such predictors from finite data, and applying those algorithms to concrete problems. On the algorithmic side, Kristina Klinkner and I devised an algorithm, CSSR, which exploits the formal properties of the optimal predictive states to efficiently reconstruct them from discrete sequence data. (This is related to, but strictly more powerful than, variable-length Markov chains or context trees.) Working with Rob Haslinger, we also developed a reconstruction algorithm for spatio-temporal random fields. We've used that to give a quantitative test for self-organization, and to automatically filter stochastic fields to identify their coherent structures (with Jean-Baptiste Rouquier and Cristopher Moore). My student Georg Goerg wrote his thesis in this area, extending the technique to continuous-valued fields and a nonparametric EM algorithm. My student George Montañez worked on fast, approximate algorithms for such prediction, before writing his thesis on an information-theoretic explanation for why machine learning works.

My more recent work falls into the areas of heavy tails, learning theory for time series, Bayesian consistency, neuroscience, network analysis and causal inference, with some overlap between these.

*Heavy-tailed distributions* are produced by many complex systems, and have attracted a lot of interest over recent decades. My most-cited paper, with Aaron Clauset and Mark Newman, concerns proper statistical inference for power law (Pareto, Zipf) distributions. I have also worked on estimation and testing for a modified class of Pareto distributions, called *q*-exponential or Tsallis distributions, sometimes used in statistical mechanics.
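As an illustration of what inference for a power law involves at its simplest, here is a minimal sketch of the closed-form maximum-likelihood estimate of the exponent for a continuous power law, assuming the lower cutoff `xmin` is already known. A full analysis would also need to choose `xmin` and check goodness of fit, which this sketch omits. The function names and the simulated data are hypothetical.

```python
import math
import random

def sample_power_law(alpha, xmin, n, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= xmin,
    by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_alpha(xs, xmin):
    """Maximum-likelihood exponent for the tail x >= xmin, together with
    its asymptotic standard error (alpha_hat - 1) / sqrt(n)."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err

rng = random.Random(0)  # fixed seed so the run is reproducible
xs = sample_power_law(alpha=2.5, xmin=1.0, n=50_000, rng=rng)
alpha_hat, std_err = fit_alpha(xs, xmin=1.0)
```

With simulated data the estimate should land close to the true exponent of 2.5. Getting a good fit on real data is not by itself evidence that a power law is the right model.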

*Learning theory*: I collaborated with Daniel McDonald and Mark Schervish on extending statistical learning theory to time series prediction, aiming at reforming the evaluation of macroeconomic forecasting. Steps along this way include the non-parametric estimation of dependence coefficients [i, ii], and risk bounds for state-space models. (I'm also interested in risk bounds without strong mixing.) Separately, Aryeh Kontorovich and I have worked on establishing the right notion of predictive probably-approximately-correct learning. This is amenable to bootstrap bounds on the generalization error (with Robert Lunde).

I am increasingly interested in forecasting non-stationary processes, where I think the right goal is to achieve low regret through a growing ensemble of models (with Abigail Jacobs, Klinkner and Clauset).

*Bayes*: Bayesian inference is a smoothing or regularization device, trading variance for bias, rather than a fundamental principle. This viewpoint led to work with Andrew Gelman on how the practice of Bayesian data analysis relates to the philosophy of science. More technically, I am interested in the frequentist properties of Bayesian methods, especially the convergence of non-parametric Bayesian updating with mis-specified models and dependent data. Those results come from an identity between Bayesian updating and the "replicator dynamic" of evolutionary biology, of independent interest.
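The identity is easiest to see in the simplest case. As a sketch, assume a countable set of hypotheses θ, with weights w_t(θ) after t observations and likelihood L_t(θ) for the next observation; the notation is chosen here for illustration and is not taken from the papers.

```latex
% Bayes's rule (left) and the discrete-time replicator equation (right)
w_{t+1}(\theta) = w_t(\theta)\,
  \frac{L_t(\theta)}{\sum_{\theta'} w_t(\theta')\,L_t(\theta')},
\qquad
p_i(t+1) = p_i(t)\,
  \frac{f_i(t)}{\sum_j p_j(t)\,f_j(t)}
```

On the right, p_i is the population frequency of type i and f_i its fitness. The two updates coincide when hypotheses play the role of types, posterior weights the role of frequencies, and likelihoods the role of fitnesses.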

*Neuroscience*: One major application of CSSR has been to analyze the computational structure of spike trains (with Haslinger and Klinkner). One set of projects, with Klinkner and Marcelo Camperi, uses the reconstructed states to build a noise-tolerant measure of coordinated activity and information sharing called "informational coherence". Informational coherence, in turn, defines functional modules of neurons with coordinated behavior, cutting across the usual anatomical modules. In addition, I'm involved in more conventional statistical modeling of neural signals, such as using multi-channel EEG data to identify sleep anomalies (with Matthew Berryman), and analytic approximations to traditional nonlinear state-estimation (with Shinsuke Koyama, Lucia Castellanos and Rob Kass), applied to neural decoding.

*Networks and causal inference*: My work on functional connectivity and modularity is about extracting networks from coordinated behavior. (This, in a way, was also what the Six Degrees of Francis Bacon project was about.) In social systems, I am more interested in the reverse problem, of how network structure shapes collective behavior. This has led me to explore, with Alessandro Rinaldo, the limits of exponential-family random graphs (with implications for dependent exponential families generally); and, in a very different direction, to work with Henry Farrell on the role of networks in institutional change. More notoriously, Andrew Thomas and I have shown that causal inferences on networks are generically confounded, though recently Edward McFowland and I have found some loop-holes for networks where community discovery works well.

Currently, much of my time goes into non-parametric network modeling, looking at issues like how to map nodes from a graph into a continuous latent space (with Dena Asta), how to bootstrap random graphs (with Alden Green), and how to model large, sparse networks (with Neil Spencer). Ultimately, this line of work aims at statistical comparison of networks (with my former students Dena Asta and Lawrence Wang).

I am (slowly) writing a book on the statistical analysis of complex systems models.

Student: Whenever there is any question, one's mind is confused. What is the matter?
Master Ts'ao-shan: Kill, kill!

In fall 2018, I will teach 36-467 / 36-667, data over space and time, an introduction to time series, spatial, and spatio-temporal statistics.

In 2017--2018, I was on much-needed sabbatical leave.

In spring 2017, I taught 36-402, undergraduate advanced data analysis, for the sixth time. In fall 2016, I taught two new half-semester courses on networks, one an introduction to statistical network models (36-720), and the other an advanced course on non-parametric network modeling (36-781).

In the past, I've taught 36-350, introduction to statistical computing (including co-teaching with Andrew Thomas and, originally, Vince Vu); 36-401, modern regression for undergraduates; 36-757, the data analysis project class for Ph.D. students; 36-835, "statistical modeling journal club", co-taught with Rob Kass; the old 36-350, data mining; 36-490, undergraduate research, on my own and with Brian Junker; 36-220, engineering statistics and quality control; 36-462, "Chaos, complexity, and inference"; 36-754, advanced stochastic processes; and 46-929, financial time series analysis, with Anthony Brockwell.

Following those links, you'll get a draft textbook for undergraduate ADA, lecture notes for networks, linear regression and data mining, and slides (with a few notes) for statistical computing and for complexity and inference. The notes for stochastic processes turned into a 270-page book manuscript, under the working title of Almost None of the Theory of Stochastic Processes. (I am not so happy, in retrospect, with how I taught 220.) My old teaching page has my other lecture notes, and teaching evaluations from graduate school.

Why are students today not successful? What is the trouble? The trouble lies in their lack of self-confidence. If you do not have enough self-confidence, you will busily submit yourself to all kinds of external conditions and transformations, and be enslaved and turned around by them and lose your freedom. But if you can stop the mind that seeks [those external conditions] in every instant of thought, you will then be no different from the old masters. — Master I-hsüan

The scholar Zhong Kui, supported by his faithful assistants, sets out to quell the demons of ignorance and banish the ghosts of superstition.
Office hours in fall 2018 are 1:00--3:00 for students in 36-467, or by appointment; please look at my calendar first.