---
title: Markov Random Fields
author: 36-467/36-667
output: slidy_presentation
bibliography: locusts.bib
---
\[
\newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)}
\newcommand{\Neighbors}{\mathcal{N}}
\]
## In our previous episodes
- Markov processes in time: $\Prob{X(t+1)|X(0), X(1), \ldots X(t)} = \Prob{X(t+1)|X(t)}$
+ Markov chains: $X(t)$ is discrete-valued
- Asymptotic behavior of Markov chains
- Likelihood-based inference for Markov models
- What about space? What about space and time?
## Markov Random Fields
- $X(r) =$ value of the random field at site $r$
- $X$ or $X(\cdot) =$ value of the field at all sites, the **configuration**
- $X(-r) =$ value of the random field everywhere other than $r$
- $X(\Neighbors(r)) =$ value of the random field at the neighbors of $r$
+ the **neighborhood configuration**
- $X$ is a **Markov random field** when
\[
\Prob{X(r)|X(-r)} = \Prob{X(r)|X(\Neighbors(r))}
\]
- $p_r(x, y) \equiv \Prob{X(r)=x|X(\Neighbors(r))=y}$ is the **local characteristic** for site $r$
+ People also write $q_{r}(x,y)$
+ Can vary with parameters, $q_{r}(x,y;\theta)$
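As a concrete (and hedged) illustration in Python: for an Ising-type binary field, the local characteristic depends on the neighborhood configuration only through the sum of the neighboring spins. The logistic form and the interaction strength `theta` below are illustrative assumptions, not anything fixed by the definition above:

```python
import math

def ising_local_characteristic(x, neighbor_sum, theta):
    """p_r(x, y) for an Ising-type field, x in {-1, +1}.

    Illustrative assumption: the neighborhood configuration y enters
    only through the sum of the neighboring spins, with log-odds of
    +1 vs. -1 equal to 2 * theta * neighbor_sum."""
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * neighbor_sum))
    return p_plus if x == +1 else 1.0 - p_plus
```

With `theta = 0` the field is independent noise (each spin is $\pm 1$ with probability $1/2$); large `theta` makes a site strongly agree with its neighbors.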
## What Does a Markov Random Field Look Like?
- Working out joint $\Prob{X}$ from $\Prob{X(r)|X(\Neighbors(r))}$ is not easy!
+ No natural order to use in factorizing
- Approach I: Heroic mathematics
+ Truly heroic in some cases...
- Approach II: Gibbs-Markov equivalence
+ Upshot: For the right functions $V_r$, $V_{rq}$,
\[
\log{\Prob{X=x}} = \text{(constant)} + \sum_{r}{V_r(x(r))} + \sum_{r}{\sum_{q\in\Neighbors(r)}{V_{rq}(x(r), x(q))}}
\]
+ Getting the constant is hard
+ See backup
- Approach III: Gibbs sampler
+ Same Gibbs, different idea
## The Gibbs Sampler (reprise)
- Assumes space $r$ is discrete, but state $X(r)$ can be continuous
- Start with _some_ initial value of $X(r)$ for all $r$
- Fix an order on the sites $r$, and then:
+ $X(r) \sim \Prob{X(r)|X(\Neighbors(r)) = x(\Neighbors(r))}$
+ $=$ draw a new value for $X(r)$ from the conditional distribution / local characteristic, given the current values of the neighbors, and replace the old value of $X(r)$
+ N.B., later sites get conditioned on the updated value for $X(r)$
- Sweep through all sites at least once, and generally multiple times
- Each sweep is one step of a big Markov chain $\Rightarrow$ converges on an invariant _joint_ distribution
- See @Kaplan-conclique-Gibbs-sampler for a clever idea for speeding this up by simultaneous updating
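A minimal Python sketch of one sweep, for an illustrative Ising-type field on a small torus (the logistic local characteristic and the parameter `theta` are assumptions for the example, not part of the general algorithm):

```python
import math
import random

def gibbs_sweep(grid, theta, rng):
    """One sweep of the Gibbs sampler for an Ising-type field on an
    n x n torus. Sites are visited in a fixed raster order, and each
    update conditions on the *current* values of the four nearest
    neighbors, so later sites see earlier updates within the sweep."""
    n = len(grid)
    for i in range(n):
        for j in range(n):
            s = (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
                 + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * s))
            grid[i][j] = +1 if rng.random() < p_plus else -1
    return grid

# Usage: start from some initial configuration, then sweep repeatedly
rng = random.Random(467)
n = 8
grid = [[rng.choice([-1, +1]) for _ in range(n)] for _ in range(n)]
for _ in range(100):
    gibbs_sweep(grid, theta=0.3, rng=rng)
```

Each call to `gibbs_sweep` is one step of the big Markov chain on configurations; after many sweeps, `grid` is (approximately) a draw from the joint distribution.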
## Inference: Basics
- If we have multiple, independent copies of the field
+ Can find $p_r(x,y)$ for each $r$
+ Use conditional density estimation or even regression
- If $p_r(x,y)$ is the same for all $r$
+ Could do conditional density estimation or even regression
+ Usual error statistics aren't valid because of dependence
- Can also do simulation-based inference
+ [Conditional auto-regressions](http://www.stat.cmu.edu/~cshalizi/dst/18/lectures/13/lecture-13.html#(11)) make good auxiliaries for indirect inference
## Inference: Likelihood
- Full likelihood inference is hard
+ Need the joint distribution $\Prob{X=x}$
+ Often need to do Monte Carlo just to get the likelihood
- **Pseudolikelihood**: Product of all the local characteristics $p_r$
\[
\prod_{r}{\Prob{X(r)=x(r)|X(\Neighbors(r))=x(\Neighbors(r))}}
\]
+ $\neq$ the joint probability
+ Also called a **composite likelihood**
+ Generally consistent, but not as statistically efficient as the full likelihood
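A hedged sketch of the log-pseudolikelihood in Python, for an illustrative Ising-type binary field on a torus (the logistic form of the local characteristic is an assumption for the example):

```python
import math

def log_pseudolikelihood(grid, theta):
    """Sum of log local characteristics over all sites for an
    Ising-type field on an n x n torus.

    This is the log of the product on the slide, *not* the log joint
    probability; maximizing it over theta gives the maximum
    pseudolikelihood estimate."""
    n = len(grid)
    total = 0.0
    for i in range(n):
        for j in range(n):
            s = (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
                 + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * s))
            p = p_plus if grid[i][j] == +1 else 1.0 - p_plus
            total += math.log(p)
    return total
```

Note each site contributes once as the "response" and again inside its neighbors' terms, which is exactly why this is not a genuine joint log-probability.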
## Inference: Uncertainty
- If you can get the likelihood, use the Hessian of the log-likelihood
- Simulation-based methods work as usual
- Parametric bootstrap works as usual
- Nonparametric bootstrap for stationary fields
+ Use rectangular blocks
## Adding Time Back In
- Now have $X(r,t)$
+ Write $X(\cdot, t)$ for the whole configuration at time $t$
- Two choices for Markov property
- Option I: $X(r,t)$ should be conditioned on $X(\Neighbors(r), t-1)$ and $X(r,t-1)$
+ So $X(r,t)$ and $X(q,t)$ are independent given $X(\cdot, t-1)$
- Option II: Condition on $X(\Neighbors(r), t-1)$, $X(r,t-1)$ _and_ $X(\Neighbors(r), t)$
+ Neighbors at time $t$ are still dependent given $X(\cdot, t-1)$
- Sometimes called **recursive** vs. **simultaneous**
## Spatio-temporal Markov Random Fields
- Option I (recursive) fields are _much easier_
+ $\Prob{X(\cdot, t)|X(\cdot, t-1)} = \prod_{r}{\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}}$
+ Can actually calculate the likelihood
+ Can estimate $\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}$ the same way as in a Markov chain
- Option II (simultaneous) fields are not as nice
+ Basically, back to spatial models with an extra coordinate
+ Composite likelihood often the best bet
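Under the recursive factorization, the transition log-likelihood is just a sum over sites. A sketch, assuming an illustrative logistic local characteristic for a binary field on a torus (the functional form and `theta` are assumptions, not from the slides):

```python
import math

def recursive_transition_loglik(x_prev, x_curr, theta):
    """log P(X(., t) = x_curr | X(., t-1) = x_prev) for a recursive
    (Option I) binary field on an n x n torus.

    Given the previous configuration, the sites at time t are
    conditionally independent, so the transition probability factors
    site by site -- this is what makes the likelihood calculable."""
    n = len(x_prev)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Condition on own past value plus past values of 4 neighbors
            s = (x_prev[i][j]
                 + x_prev[(i - 1) % n][j] + x_prev[(i + 1) % n][j]
                 + x_prev[i][(j - 1) % n] + x_prev[i][(j + 1) % n])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * s))
            p = p_plus if x_curr[i][j] == +1 else 1.0 - p_plus
            total += math.log(p)
    return total
```

Summing this over $t$ gives the full log-likelihood of an observed spatio-temporal trajectory, just as for an ordinary Markov chain.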
## Cellular Automata
- $X(r,t)$ discrete (usually finite), $r$ on a regular lattice, $t$ discrete
- Recursive (option I) spatio-temporal Markov model
+ Sometimes conditionally deterministic
- These are _very_ expressive models
+ Some examples: [I](http://www.stat.cmu.edu/~cshalizi/462/lectures/10/10.pdf), [II](http://www.stat.cmu.edu/~cshalizi/462/lectures/11/11.pdf)
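For instance, a deterministic majority-vote rule is a conditionally deterministic recursive update (the local characteristic is a point mass). A minimal sketch, with an illustrative 0/1 state space:

```python
def majority_vote_step(grid):
    """One synchronous step of a deterministic majority-vote CA on an
    n x n torus: each cell adopts the majority value among itself and
    its four nearest neighbors. Cells take values in {0, 1}."""
    n = len(grid)
    new = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            votes = (grid[i][j]
                     + grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
                     + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])
            new[i][j] = 1 if votes >= 3 else 0  # majority of 5 cells
    return new
```

All cells update simultaneously from the time-$(t-1)$ configuration, so this is exactly the recursive (Option I) structure.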
## Summary
- In a (spatial) Markov random field, $X(r)$ is "screened off" from the rest of the field by its neighbors
+ Conditional distribution of $X(r)$ given neighbors = local characteristic of site $r$
- Local characteristics determine the joint distribution, but actually _solving_ for the joint distribution is hard
+ Gibbs-Markov equivalence gives a general result, but not always an easy thing to calculate with
- Estimation via
+ Local conditional model (under homogeneity or multiple realizations)
+ Likelihood (if you can calculate it)
+ Likelihood-ish objective functions
+ Simulation-based inference
## Backup: Gibbs-Markov Theorem
- $X=$ the whole random field, the vector of $X(r)$ for all $r$
- $X$ has a **Gibbs distribution** when
\[
\Prob{X=x} = \frac{1}{Z}\exp\left(-\sum_{A}{V_A(x)}\right)
\]
+ $A =$ subsets of sites
+ $V_A =$ **potential function**, depending only on the state of sites in $A$
+ $Z \equiv \sum_{x}{\exp\left(-\sum_{A}{V_A(x)}\right)} =$ **partition function**
- A set of sites $A$ is a **clique** if every site in $A$ is a neighbor of every other site in $A$
- A potential is a **nearest-neighbor potential** if $V_A = 0$ whenever $A$ isn't a clique
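A brute-force illustration of why $Z$ is hard: even with only nearest-neighbor pair potentials, computing it exactly means summing over all $2^{n^2}$ configurations. The Ising-type potential $V_{rq}(x(r), x(q)) = -\theta\, x(r) x(q)$ below is an illustrative choice:

```python
import itertools
import math

def brute_force_Z(n, theta):
    """Partition function of an Ising-type field on an n x n torus by
    exhaustive enumeration of all 2^(n*n) configurations -- feasible
    only for very small n, which is the point.

    Pair potential: V_{rq} = -theta * x(r) * x(q) on torus edges."""
    Z = 0.0
    for bits in itertools.product([-1, +1], repeat=n * n):
        x = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        energy = 0.0
        for i in range(n):
            for j in range(n):
                # Count each horizontal and vertical torus edge once
                energy -= theta * x[i][j] * x[(i + 1) % n][j]
                energy -= theta * x[i][j] * x[i][(j + 1) % n]
        Z += math.exp(-energy)
    return Z
```

With $\theta = 0$ all potentials vanish, so $Z$ is just the number of configurations, $2^{n^2}$; the enumeration cost grows as $2^{n^2}$, which is why sampling methods that avoid $Z$ matter.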
## Backup: Gibbs-Markov Theorem
- On a finite set of sites, if $\Prob{X=x}> 0$ for all configurations $x$,
then $X$ is a Markov random field if and only if $X$ has a Gibbs
distribution with a nearest-neighbor potential
- Gibbs $\Rightarrow$ Markov: direct calculation
- Markov $\Rightarrow$ Gibbs: complicated!
+ Our readings from Guttorp sketch a proof due to @Griffeath-on-random-fields
- $Z$ is hard to calculate either way
- Sampling from Gibbs distributions:
+ Use Gibbs sampler (since it's Markov)
+ Use MCMC, since that doesn't need $Z$
## Backup: Gibbs-Markov Theorem
- Statisticians usually call this the "Hammersley-Clifford" theorem
- It was proved simultaneously and independently by H&C, Besag, Griffeath, Grimmett, ...
- Priority is a mess, and Griffeath was one of my thesis advisers...
- "Gibbs-Markov" expresses the content better, and is equally slighting to everyone still alive
## Backup: A Conditional Likelihood for a Fraction of the Data
- On a square lattice, color the sites alternately black and white
- Each white site has only black neighbors
- White sites are independent given the black sites (by Markov property)
- Multiply the local characteristics of the white sites to get a conditional likelihood
\[
\Prob{X(\text{white})=x(\text{white})|X(\text{black})=x(\text{black})} = \prod_{r \in \text{white}}{\Prob{X(r)=x(r)|X(\Neighbors(r)) = x(\Neighbors(r))}}
\]
- @Bartlett-spatial-pattern, pp. 27--28, calls this "coding"
+ $=$ the "concliques" of @Kaplan-conclique-Gibbs-sampler
+ Adapts to other geometries (see Bartlett again)
+ Efficiency compared to pseudolikelihood is unclear
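A sketch of the coding likelihood for a binary field on an even-sided torus, again assuming an illustrative logistic local characteristic (the form and `theta` are assumptions for the example):

```python
import math

def coding_log_likelihood(grid, theta):
    """Bartlett's "coding" conditional log-likelihood: sum of log
    local characteristics over the *white* sites of a checkerboard
    only, for an Ising-type field on an n x n torus (n even).

    Given the black sites, the white sites are conditionally
    independent (all their neighbors are black), so unlike the full
    pseudolikelihood this is a genuine conditional likelihood."""
    n = len(grid)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if (i + j) % 2 != 0:   # skip black sites
                continue
            s = (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
                 + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * s))
            p = p_plus if grid[i][j] == +1 else 1.0 - p_plus
            total += math.log(p)
    return total
```

Compared to the pseudolikelihood, each site appears as the "response" at most once, at the cost of using only half the sites per coding.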
## References