In our previous episodes

Markov processes in time: \(\Prob{X(t+1)|X(0), X(1), \ldots, X(t)} = \Prob{X(t+1)|X(t)}\)
- Markov chains: \(X(t)\) is discrete-valued
Asymptotic behavior of Markov chains
Likelihood-based inference for Markov models
What about space? What about space and time?

Markov Random Fields

\(X(r) =\) value of the random field at site \(r\)
\(X\) or \(X(\cdot) =\) value of the field at all sites, the configuration
\(X(-r) =\) value of the random field everywhere other than \(r\)
\(X(\Neighbors(r)) =\) value of the random field at the neighbors of \(r\)
- the neighborhood configuration
\(X\) is a Markov random field when \[ \Prob{X(r)|X(-r)} = \Prob{X(r)|X(\Neighbors(r))} \]
\(p_r(x, y) \equiv \Prob{X(r)=x|X(\Neighbors(r))=y}\) is the local characteristic for site \(r\)
- People also write \(q_{r}(x,y)\)
- Can vary with parameters, \(q_{r}(x,y;\theta)\)
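As a concrete illustration of a local characteristic \(q_r(x, y; \theta)\), here is a minimal sketch for a binary field with values in \(\{-1, +1\}\) on a rectangular lattice with periodic boundaries. The Ising-style logistic form, the four-nearest-neighbor structure, and the parameter names `h` and `J` are assumptions chosen for the example, not something fixed by the definitions above.

```python
import math

def neighbors(r, n_rows, n_cols):
    """Four nearest neighbors of site r = (i, j) on a torus (assumed geometry)."""
    i, j = r
    return [((i - 1) % n_rows, j), ((i + 1) % n_rows, j),
            (i, (j - 1) % n_cols), (i, (j + 1) % n_cols)]

def local_characteristic(x, y, h=0.0, J=0.5):
    """P(X(r) = x | X(N(r)) = y) for x in {-1, +1}.

    y is the list of neighbor values; h and J are hypothetical parameters
    (external field and coupling) of an Ising-style model.
    """
    field = h + J * sum(y)
    return 1.0 / (1.0 + math.exp(-2.0 * x * field))
```

The same function is used at every site, so this particular sketch is homogeneous: \(p_r\) does not depend on \(r\).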

What Does a Markov Random Field Look Like?

Working out joint \(\Prob{X}\) from \(\Prob{X(r)|X(\Neighbors(r))}\) is not easy!
- No natural order to use in factorizing
Approach I: Heroic mathematics
- Truly heroic in some cases…
Approach II: Gibbs-Markov equivalence
- Upshot: For the right functions \(V_r\), \(V_{rq}\), \[ \log{\Prob{X=x}} = \text{(constant)} + \sum_{r}{V_r(x(r))} + \sum_{r}{\sum_{q\in\Neighbors(r)}{V_{rq}(x(r), x(q))}} \]
- Getting the constant is hard
- See backup
Approach III: Gibbs sampler
- Same Gibbs, different idea

The Gibbs Sampler (reprise)

Assumes space \(r\) is discrete, but state \(X(r)\) can be continuous
Start with some initial value of \(X(r)\) for all \(r\)
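Fix an order on the sites \(r\), and then, for each site in turn:
- \(X(r) \sim \Prob{X(r)|X(\Neighbors(r)) = x(\Neighbors(r))}\), where \(x(\Neighbors(r))\) is the current neighborhood configuration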
- \(=\) draw a new value for \(X(r)\) from the conditional distribution / local characteristic and replace the old value of \(X(r)\)
- N.B., later sites get conditioned on the updated value for \(X(r)\)
Sweep through all sites at least once, and generally multiple times
Each sweep is one step of a big Markov chain \(\Rightarrow\) converges on an invariant joint distribution
See Kaplan et al. (2018) for a clever idea for speeding this up by simultaneous updating
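The sweep described above can be sketched as follows. This is a minimal sketch, assuming a binary field with values in \(\{-1, +1\}\), four nearest neighbors on a torus, raster-scan order, and an Ising-style local characteristic with hypothetical parameters `h` and `J`; none of those choices come from the slides.

```python
import math
import random

def gibbs_sweep(x, h, J, rng):
    """One in-place sweep over all sites in raster order.

    Later sites are conditioned on the already-updated values of earlier sites.
    """
    n_rows, n_cols = len(x), len(x[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Sum of the four nearest neighbors, with periodic boundaries
            s = (x[(i - 1) % n_rows][j] + x[(i + 1) % n_rows][j]
                 + x[i][(j - 1) % n_cols] + x[i][(j + 1) % n_cols])
            # Assumed local characteristic: P(X(r) = +1 | neighbors)
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * (h + J * s)))
            x[i][j] = 1 if rng.random() < p_plus else -1
    return x

def gibbs_sample(n_rows, n_cols, h=0.0, J=0.3, n_sweeps=200, seed=0):
    """Start from a random configuration and run n_sweeps full sweeps."""
    rng = random.Random(seed)
    x = [[rng.choice((-1, 1)) for _ in range(n_cols)] for _ in range(n_rows)]
    for _ in range(n_sweeps):
        gibbs_sweep(x, h, J, rng)
    return x
```

For example, `gibbs_sample(32, 32, J=0.4)` returns one approximate draw from the joint distribution; how many sweeps are enough depends on the model and is not settled by this sketch.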

Inference: Basics

If we have multiple, independent copies of the field
- Can find \(p_r(x,y)\) for each \(r\)
- Use conditional density estimation or even regression
If \(p_r(x,y)\) is the same for all \(r\)
- Could do conditional density estimation or even regression
- Usual error statistics aren’t valid because of dependence
Can also do simulation-based inference
- Conditional auto-regressions make good auxiliaries for indirect inference

Inference: Likelihood

Full likelihood inference is hard
- Need the joint distribution \(\Prob{X=x}\)
- Often need to do Monte Carlo just to get the likelihood
Pseudolikelihood: Product of all the local characteristics \(p_r\) \[ \prod_{r}{\Prob{X(r)=x(r)|X(\Neighbors(r))=x(\Neighbors(r))}} \]
- \(\neq\) the joint probability
- Also called a composite likelihood
- Generally consistent, but not as statistically efficient as likelihood
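The pseudolikelihood product above becomes a sum of log local characteristics, which is easy to evaluate and maximize. The following is a minimal sketch, assuming the same hypothetical Ising-style local characteristic (values in \(\{-1, +1\}\), four neighbors on a torus, parameters `h` and `J`); the grid search is only a stand-in for a proper optimizer.

```python
import math

def log_pseudolikelihood(x, h, J):
    """Sum over sites r of log P(X(r) = x(r) | X(N(r)) = x(N(r)))."""
    n_rows, n_cols = len(x), len(x[0])
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            s = (x[(i - 1) % n_rows][j] + x[(i + 1) % n_rows][j]
                 + x[i][(j - 1) % n_cols] + x[i][(j + 1) % n_cols])
            # log of 1 / (1 + exp(-2 x(r) (h + J s)))
            total -= math.log1p(math.exp(-2.0 * x[i][j] * (h + J * s)))
    return total

def fit_J_by_grid(x, h=0.0, grid=None):
    """Maximum pseudolikelihood estimate of J over a coarse grid (h held fixed)."""
    if grid is None:
        grid = [k / 100 for k in range(-100, 101)]
    return max(grid, key=lambda J: log_pseudolikelihood(x, h, J))
```

No normalizing constant appears anywhere, which is the practical attraction compared with the full likelihood.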

Inference: Uncertainty

If you can get the likelihood, use the Hessian of the log-likelihood
Simulation-based methods work as usual
Parametric bootstrap works as usual
Nonparametric bootstrap for stationary fields
- Use rectangular blocks
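One way to resample rectangular blocks is sketched below: tile the output lattice with blocks copied from uniformly chosen positions in the observed field. This is a minimal sketch of a moving-block scheme; the function name, the uniform choice of block corners, and the truncation of blocks at the lattice edge are assumptions for illustration, and the block size is left to the user.

```python
import random

def block_bootstrap(x, block_rows, block_cols, rng):
    """Build one bootstrap field by pasting together randomly chosen rectangular blocks.

    x is a list of rows; blocks must be no larger than the field itself.
    """
    n_rows, n_cols = len(x), len(x[0])
    out = [[None] * n_cols for _ in range(n_rows)]
    for top in range(0, n_rows, block_rows):
        for left in range(0, n_cols, block_cols):
            # Upper-left corner of a source block that fits inside the field
            si = rng.randrange(n_rows - block_rows + 1)
            sj = rng.randrange(n_cols - block_cols + 1)
            # Copy the block, truncating where it would run off the lattice
            for di in range(min(block_rows, n_rows - top)):
                for dj in range(min(block_cols, n_cols - left)):
                    out[top + di][left + dj] = x[si + di][sj + dj]
    return out
```

Recomputing an estimator on many such bootstrap fields gives an approximate sampling distribution; dependence is preserved within blocks but broken at block boundaries.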

Adding Time Back In

Now have \(X(r,t)\)
- Write \(X(\cdot, t)\) for the whole configuration at time \(t\)
Two choices for Markov property
Option I: \(X(r,t)\) should be conditioned on \(X(\Neighbors(r), t-1)\) and \(X(r,t-1)\)
- So \(X(r,t)\) and \(X(q,t)\), \(q \neq r\), are conditionally independent given \(X(\cdot, t-1)\)
Option II: Condition on \(X(\Neighbors(r), t-1)\), \(X(r,t-1)\) and \(X(\Neighbors(r), t)\)
- Neighbors at time \(t\) are still dependent given \(X(\cdot, t-1)\)
Sometimes called recursive vs. simultaneous

Spatio-temporal Markov Random Fields

Option I (recursive) fields are much easier
- \(\Prob{X(\cdot, t)|X(\cdot, t-1)} = \prod_{r}{\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}}\)
- Can actually calculate the likelihood
- Can estimate \(\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}\) the same way as in a Markov chain
Option II (simultaneous) fields are not as nice
- Basically, back to spatial models with an extra coordinate
- Composite likelihood often the best bet
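For recursive (Option I) fields, the product over sites turns the log-likelihood of a whole sequence of configurations into a double sum over times and sites. Here is a minimal sketch, assuming a binary field with values in \(\{0, 1\}\) on a one-dimensional ring, and a hypothetical logistic transition model with parameters `a`, `b`, `c`; the model is an illustration only.

```python
import math

def transition_prob_one(own, nbr_sum, a, b, c):
    """Assumed model: P(X(r,t) = 1 | X(r,t-1) = own, neighbors at t-1 sum to nbr_sum)."""
    return 1.0 / (1.0 + math.exp(-(a + b * own + c * nbr_sum)))

def recursive_log_likelihood(frames, a, b, c):
    """Log-likelihood of frames[1:] given frames[0].

    frames is a list of configurations (lists of 0/1), one per time step;
    the neighbors of site r are r-1 and r+1 on a ring.
    """
    total = 0.0
    for prev, curr in zip(frames, frames[1:]):
        n = len(prev)
        for r in range(n):
            nbr_sum = prev[(r - 1) % n] + prev[(r + 1) % n]
            p1 = transition_prob_one(prev[r], nbr_sum, a, b, c)
            total += math.log(p1 if curr[r] == 1 else 1.0 - p1)
    return total
```

Because each term depends only on the previous configuration, this is an ordinary (conditional) likelihood, with no normalizing constant to compute.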

Cellular Automata

\(X(r,t)\) discrete (usually finite), \(r\) on a regular lattice, \(t\) discrete
Recursive (option I) spatio-temporal Markov model
- Sometimes conditionally deterministic
These are very expressive models
- Some examples: I, II
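A conditionally deterministic case can be sketched with a one-dimensional elementary cellular automaton, where each site's next state is a fixed function of its own state and its two neighbors' states at the previous time. The choice of elementary rules (numbered in Wolfram's convention), the ring geometry, and the function names are assumptions for this example, not the examples the slide refers to.

```python
def ca_step(state, rule=110):
    """Apply one synchronous update of an elementary CA on a ring.

    state is a list of 0/1 values; bit k of `rule` gives the new state for the
    neighborhood pattern (left, self, right) read as a 3-bit number k.
    """
    n = len(state)
    out = []
    for r in range(n):
        pattern = (state[(r - 1) % n] << 2) | (state[r] << 1) | state[(r + 1) % n]
        out.append((rule >> pattern) & 1)
    return out

def ca_run(state, n_steps, rule=110):
    """Return the list of configurations X(., 0), ..., X(., n_steps)."""
    history = [list(state)]
    for _ in range(n_steps):
        history.append(ca_step(history[-1], rule))
    return history
```

Every site at time \(t\) is updated from the configuration at time \(t-1\) only, which is exactly the recursive structure.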

Summary

In a (spatial) Markov random field, \(X(r)\) is “screened off” from the rest of the field by its neighbors
- Conditional distribution of \(X(r)\) given neighbors = local characteristic of site \(r\)
Local characteristics determine the joint distribution, but actually solving for the joint distribution is hard
- Gibbs-Markov equivalence gives a general result, but not always an easy thing to calculate with
Estimation via
- Local conditional model (under homogeneity or multiple realizations)
- Likelihood (if you can calculate it)
- Likelihood-ish objective functions
- Simulation-based inference

Backup: Gibbs-Markov Theorem

\(X=\) the whole random field, the vector of \(X(r)\) for all \(r\)
\(X\) has a Gibbs distribution when \[ \Prob{X=x} = \frac{1}{Z}\exp{-\left(\sum_{A}{V_A(x)}\right)} \]
- \(A =\) subsets of sites
- \(V_A=\) potential function, depending only on the state of sites in A
- \(Z \equiv \sum_{x}{\exp{-\left(\sum_{A}{V_A(x)}\right)}} =\) partition function
A set of sites \(A\) is a clique if every site in \(A\) is a neighbor of every other site in \(A\)
A potential is a nearest-neighbor potential if \(V_A = 0\) whenever \(A\) isn’t a clique
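To see why the partition function is expensive, it can be computed by brute force on a tiny lattice. This is a minimal sketch, assuming binary values in \(\{-1, +1\}\), free boundaries, and a nearest-neighbor potential with singleton terms \(-h\,x(r)\) and pair terms \(-J\,x(r)x(q)\); the parameters `h` and `J` are hypothetical.

```python
import itertools
import math

def energy(x, n_rows, n_cols, h, J):
    """Sum of V_A(x) over singleton and nearest-neighbor-pair cliques.

    x is a flat tuple of +/-1 values in row-major order; each pair is counted once.
    """
    e = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            s = x[i * n_cols + j]
            e -= h * s
            if i + 1 < n_rows:
                e -= J * s * x[(i + 1) * n_cols + j]
            if j + 1 < n_cols:
                e -= J * s * x[i * n_cols + j + 1]
    return e

def partition_function(n_rows, n_cols, h, J):
    """Z = sum over all 2^(n_rows * n_cols) configurations of exp(-energy)."""
    n_sites = n_rows * n_cols
    return sum(math.exp(-energy(x, n_rows, n_cols, h, J))
               for x in itertools.product((-1, 1), repeat=n_sites))
```

The sum has \(2^{n}\) terms for \(n\) sites, so this approach is only feasible for a handful of sites.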

Backup: Gibbs-Markov Theorem

On a finite set of sites, if \(\Prob{X=x}> 0\) for all configurations \(x\), then \(X\) is a Markov random field if and only if \(X\) has a Gibbs distribution with a nearest-neighbor potential
Gibbs \(\Rightarrow\) Markov: direct calculation
Markov \(\Rightarrow\) Gibbs: complicated!
- Our readings from Guttorp sketch a proof due to Griffeath (1976)
\(Z\) is hard to calculate either way
Sampling from Gibbs distributions:
- Use Gibbs sampler (since it’s Markov)
- Use MCMC more generally, since that doesn’t need \(Z\)

Backup: Gibbs-Markov Theorem

Statisticians usually call this the “Hammersley-Clifford” theorem
It was proved simultaneously and independently by H&C, Besag, Griffeath, Grimmett, …
Priority is a mess, and Griffeath was one of my thesis advisers…
“Gibbs-Markov” expresses the content better, and is equally slighting to everyone still alive

Backup: A Conditional Likelihood for a Fraction of the Data

On a square lattice, color the sites alternately black and white
Each white site has only black neighbors
White sites are independent given the black sites (by Markov property)
Multiply the local characteristics of the white sites to get a conditional likelihood \[ \Prob{X(\text{white})=x(\text{white})|X(\text{black})=x(\text{black})} = \prod_{r \in \text{white}}{\Prob{X(r)=x(r)|X(\Neighbors(r)) = x(\Neighbors(r))}} \]
Bartlett (1975), pp. 27–28, calls this “coding”
- \(=\) the “concliques” of Kaplan et al. (2018)
- Adapts to other geometries (see Bartlett again)
- Efficiency compared to pseudolikelihood is unclear
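The coding likelihood above is the pseudolikelihood sum restricted to one color class. Here is a minimal sketch, assuming a binary field in \(\{-1, +1\}\), four nearest neighbors on a torus with even dimensions (so that sites of one color have only neighbors of the other color), and a hypothetical Ising-style local characteristic with parameters `h` and `J`.

```python
import math

def coding_log_likelihood(x, h, J, color=0):
    """Conditional log-likelihood of the sites with (i + j) % 2 == color, given the rest.

    Assumes an even number of rows and columns so the checkerboard coloring
    is consistent with the periodic boundaries.
    """
    n_rows, n_cols = len(x), len(x[0])
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            if (i + j) % 2 != color:
                continue  # skip sites of the conditioning color
            s = (x[(i - 1) % n_rows][j] + x[(i + 1) % n_rows][j]
                 + x[i][(j - 1) % n_cols] + x[i][(j + 1) % n_cols])
            total -= math.log1p(math.exp(-2.0 * x[i][j] * (h + J * s)))
    return total
```

Unlike the pseudolikelihood, this is a genuine conditional likelihood, at the cost of using only half of the sites as responses.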

References

Bartlett, M. S. 1975. The Statistical Analysis of Spatial Pattern. London: Chapman and Hall.

Griffeath, David. 1976. “Introduction to Markov Random Fields.” In Denumerable Markov Chains, edited by John G. Kemeny, J. Laurie Snell, and Anthony W. Knapp, 2nd ed., 425–57. Berlin: Springer-Verlag.

Kaplan, Andee, Mark S. Kaiser, Soumendra N. Lahiri, and Daniel J. Nordman. 2018. “Simulating Markov Random Fields with a Conclique-Based Gibbs Sampler.” arXiv:1808.04739. https://arxiv.org/abs/1808.04739.