# Markov Random Fields

$\newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Neighbors}{\mathcal{N}}$

# In our previous episodes

• Markov processes in time: $$\Prob{X(t+1)|X(0), X(1), \ldots X(t)} = \Prob{X(t+1)|X(t)}$$
• Markov chains: $$X(t)$$ is discrete-valued
• Asymptotic behavior of Markov chains
• Likelihood-based inference for Markov models

# Markov Random Fields

• $$X(r) =$$ value of the random field at site $$r$$
• $$X$$ or $$X(\cdot) =$$ value of the field at all sites, the configuration
• $$X(-r) =$$ value of the random field everywhere other than $$r$$
• $$X(\Neighbors(r)) =$$ value of the random field at the neighbors of $$r$$
• the neighborhood configuration
• $$X$$ is a Markov random field when $\Prob{X(r)|X(-r)} = \Prob{X(r)|X(\Neighbors(r))}$
• $$p_r(x, y) \equiv \Prob{X(r)=x|X(\Neighbors(r))=y}$$ is the local characteristic for site $$r$$
• People also write $$q_{r}(x,y)$$
• Can vary with parameters, $$q_{r}(x,y;\theta)$$

# What Does a Markov Random Field Look Like?

• Working out joint $$\Prob{X}$$ from $$\Prob{X(r)|X(\Neighbors(r))}$$ is not easy!
• No natural order to use in factorizing
• Approach I: Heroic mathematics
• Truly heroic in some cases…
• Approach II: Gibbs-Markov equivalence
• Upshot: For the right functions $$V_r$$, $$V_{rq}$$, $\log{\Prob{X=x}} = \text{(constant)} + \sum_{r}{V_r(x(r))} + \sum_{r}{\sum_{q\in\Neighbors(r)}{V_{rq}(x(r), x(q))}}$
• Getting the constant is hard
• See backup
• Approach III: Gibbs sampler
• Same Gibbs, different idea

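• As a concrete instance of this form (the standard Ising model, used here purely for illustration, not something from the preceding slides): take $$x(r) \in \{-1,+1\}$$ with $$V_r(x(r)) = h\,x(r)$$ and $$V_{rq}(x(r),x(q)) = \beta\,x(r)x(q)$$, giving $\log{\Prob{X=x}} = \text{(constant)} + h\sum_{r}{x(r)} + \beta\sum_{r}{\sum_{q\in\Neighbors(r)}{x(r)x(q)}}$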
# The Gibbs Sampler (reprise)

• Assumes the set of sites $$r$$ is discrete, but the state $$X(r)$$ can be continuous
• Start with some initial value of $$X(r)$$ for all $$r$$
• Fix an order on the sites $$r$$, and then:
• $$X(r) \sim \Prob{X(r)|X(\Neighbors(r)) = x(\Neighbors(r))}$$, conditioning on the current values at the neighbors
• $$=$$ draw a new value for $$X(r)$$ from the conditional distribution / local characteristic and replace the old value of $$X(r)$$
• N.B., later sites get conditioned on the updated value for $$X(r)$$
• Sweep through all sites at least once, and generally multiple times
• Each sweep is one step of a big Markov chain $$\Rightarrow$$ converges on an invariant joint distribution
• See Kaplan et al. (2018) for a clever idea for speeding this up by simultaneous updating
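To make the sweep concrete, here is a minimal sketch for an Ising-type field on a square grid. The $$\pm 1$$ model, the coupling `beta`, the grid size, and the number of sweeps are all illustrative assumptions, not details from the slides:

```python
import numpy as np

def gibbs_sweep(x, beta, rng):
    """One full sweep of the Gibbs sampler for an Ising-type MRF on an
    n x n grid with 4-nearest-neighbor structure and free boundaries.
    States are +1/-1; beta is an (assumed) coupling parameter."""
    n = x.shape[0]
    for i in range(n):
        for j in range(n):
            # sum of the current (possibly already-updated) neighbor values
            s = 0
            if i > 0:     s += x[i - 1, j]
            if i < n - 1: s += x[i + 1, j]
            if j > 0:     s += x[i, j - 1]
            if j < n - 1: s += x[i, j + 1]
            # local characteristic: P(X(r) = +1 | neighbors)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(42)
x = rng.choice([-1, 1], size=(20, 20))
for _ in range(200):   # many sweeps ~ approximate draw from the joint
    gibbs_sweep(x, beta=0.4, rng=rng)
```

Later sites within a sweep see the already-updated values, exactly as the N.B. above requires.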

# Inference: Basics

• If we have multiple, independent copies of the field
• Can find $$p_r(x,y)$$ for each $$r$$
• Use conditional density estimation or even regression
• If $$p_r(x,y)$$ is the same for all $$r$$
• Could do conditional density estimation or even regression
• Usual error statistics aren’t valid because of dependence
• Can also do simulation-based inference

# Inference: Likelihood

• Full likelihood inference is hard
• Need the joint distribution $$\Prob{X=x}$$
• Often need to do Monte Carlo just to get the likelihood
• Pseudolikelihood: Product of all the local characteristics $$p_r$$ $\prod_{r}{\Prob{X(r)=x(r)|X(\Neighbors(r))=x(\Neighbors(r))}}$
• $$\neq$$ the joint probability
• Also called a composite likelihood
• Generally consistent, but not as statistically efficient as the full likelihood
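A sketch of pseudolikelihood estimation for the same illustrative $$\pm 1$$ Ising-type model used above (the local characteristic and the grid-search range for $$\beta$$ are assumptions for the example, not part of the slides):

```python
import numpy as np

def log_pseudolikelihood(x, beta):
    """Log pseudolikelihood for a +/-1 Ising-type field on a grid:
    sum over all sites r of log P(X(r)=x(r) | neighbors), with
    P(X(r)=+1 | neighbors) = 1 / (1 + exp(-2*beta*s)), s = neighbor sum."""
    s = np.zeros_like(x, dtype=float)
    s[1:, :]  += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:]  += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    # log P(x(r) | neighbors) = -log(1 + exp(-2*beta*x(r)*s(r)))
    return -np.sum(np.log1p(np.exp(-2.0 * beta * x * s)))

def mple(x, betas=np.linspace(0.0, 1.0, 101)):
    """Maximum pseudolikelihood estimate by a crude grid search."""
    vals = [log_pseudolikelihood(x, b) for b in betas]
    return betas[int(np.argmax(vals))]
```

In practice one would use a proper optimizer rather than a grid search, but the objective is the point: a product of local characteristics, not the joint probability.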

# Inference: Uncertainty

• If you can get the likelihood, use the Hessian of the log-likelihood
• Simulation-based methods work as usual
• Parametric bootstrap works as usual
• Nonparametric bootstrap for stationary fields
• Use rectangular blocks

# Spatio-temporal Markov Random Fields
• Now have $$X(r,t)$$
• Write $$X(\cdot, t)$$ for the whole configuration at time $$t$$
• Two choices for Markov property
• Option I: $$X(r,t)$$ should be conditioned on $$X(\Neighbors(r), t-1)$$ and $$X(r,t-1)$$
• So $$X(r,t)$$ and $$X(q,t)$$ are independent given $$X(\cdot, t-1)$$
• Option II: Condition on $$X(\Neighbors(r), t-1)$$, $$X(r,t-1)$$ and $$X(\Neighbors(r), t)$$
• Neighbors at time $$t$$ are still dependent given $$X(\cdot, t-1)$$
• Sometimes called recursive vs. simultaneous

# Spatio-temporal Markov Random Fields

• Option I (recursive) fields are much easier
• $$\Prob{X(\cdot, t)|X(\cdot, t-1)} = \prod_{r}{\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}}$$
• Can actually calculate the likelihood
• Can estimate $$\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}$$ the same way as in a Markov chain
• Option II (simultaneous) fields are not as nice
• Basically, back to spatial models with an extra coordinate
• Composite likelihood often the best bet
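One reason recursive fields are easier: they are also easy to simulate forward in time, since the transition factorizes over sites. A minimal sketch, again using an assumed $$\pm 1$$ logistic-type model (parameters `beta` for the neighbor coupling and `gamma` for dependence on a site's own past are illustrative):

```python
import numpy as np

def step_recursive(x_prev, beta, gamma, rng):
    """One time step of a recursive (Option I) spatio-temporal MRF:
    each X(r,t) is drawn independently given the field at t-1, so
    P(X(.,t) | X(.,t-1)) factorizes over sites r. Illustrative model:
    logit of P(X(r,t)=+1) depends on the neighbor sum at t-1 (beta)
    and the site's own value at t-1 (gamma)."""
    s = np.zeros_like(x_prev, dtype=float)
    s[1:, :]  += x_prev[:-1, :]
    s[:-1, :] += x_prev[1:, :]
    s[:, 1:]  += x_prev[:, :-1]
    s[:, :-1] += x_prev[:, 1:]
    eta = beta * s + gamma * x_prev
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * eta))
    return np.where(rng.random(x_prev.shape) < p_plus, 1, -1)
```

Because the same factorization appears in the likelihood, the log-likelihood of an observed sequence is just the sum of these site-by-site log conditional probabilities over sites and times.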

# Cellular Automata

• $$X(r,t)$$ discrete (usually finite), $$r$$ on a regular lattice, $$t$$ discrete
• Recursive (option I) spatio-temporal Markov model
• Sometimes conditionally deterministic
• These are very expressive models
• Some examples: I, II
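Conway's Game of Life is a classic deterministic cellular automaton (brought in here as a standard illustration, not one of the slide's examples): each $$X(r,t)$$ is a fixed function of the cell and its eight neighbors at $$t-1$$. A minimal NumPy implementation of one update step, assuming wrap-around boundaries for simplicity:

```python
import numpy as np

def life_step(x):
    """One step of Conway's Game of Life on a 0/1 grid with periodic
    (wrap-around) boundaries. A cell is alive at t if it has exactly 3
    live neighbors at t-1, or is alive with exactly 2 live neighbors."""
    # count live neighbors by summing shifted copies of the grid
    n = sum(np.roll(np.roll(x, di, axis=0), dj, axis=1)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0))
    return ((n == 3) | ((x == 1) & (n == 2))).astype(int)
```

For example, a horizontal "blinker" (three live cells in a row) becomes a vertical one after a single step.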

# Summary

• In a (spatial) Markov random field, $$X(r)$$ is “screened off” from the rest of the field by its neighbors
• Conditional distribution of $$X(r)$$ given neighbors = local characteristic of site $$r$$
• Local characteristics determine the joint distribution, but actually solving for the joint distribution is hard
• Gibbs-Markov equivalence gives a general result, but not always an easy thing to calculate with
• Estimation via
• Local conditional model (under homogeneity or multiple realizations)
• Likelihood (if you can calculate it)
• Likelihood-ish objective functions
• Simulation-based inference

# Backup: Gibbs-Markov Theorem

• $$X=$$ the whole random field, the vector of $$X(r)$$ for all $$r$$
• $$X$$ has a Gibbs distribution when $\Prob{X=x} = \frac{1}{Z}\exp\left(-\sum_{A}{V_A(x)}\right)$
• $$A =$$ subsets of sites
• $$V_A=$$ potential function, depending only on the state of sites in A
• $$Z \equiv \sum_{x}{\exp\left(-\sum_{A}{V_A(x)}\right)} =$$ partition function
• A set of sites $$A$$ is a clique if every site in $$A$$ is a neighbor of every other site in $$A$$
• A potential is a nearest-neighbor potential if $$V_A = 0$$ whenever $$A$$ isn’t a clique

# Backup: Gibbs-Markov Theorem

• On a finite set of sites, if $$\Prob{X=x}> 0$$ for all configurations $$x$$, then $$X$$ is a Markov random field if and only if $$X$$ has a Gibbs distribution with a nearest-neighbor potential
• Gibbs $$\Rightarrow$$ Markov: direct calculation
• Markov $$\Rightarrow$$ Gibbs: complicated!
• Our readings from Guttorp sketch a proof due to Griffeath (1976)
• $$Z$$ is hard to calculate either way
• Sampling from Gibbs distributions:
• Use Gibbs sampler (since it’s Markov)
• Use MCMC, since that doesn’t need $$Z$$

# Backup: Gibbs-Markov Theorem

• Statisticians usually call this the “Hammersley-Clifford” theorem
• It was proved simultaneously and independently by H&C, Besag, Griffeath, Grimmett, …
• Priority is a mess, and Griffeath was one of my thesis advisers…
• “Gibbs-Markov” expresses the content better, and is equally slighting to everyone still alive

# Backup: A Conditional Likelihood for a Fraction of the Data

• On a square lattice, color the sites alternately black and white
• Each white site has only black neighbors
• White sites are independent given the black sites (by Markov property)
• Multiply the local characteristics of the white sites to get a conditional likelihood $\Prob{X(\text{white})=x(\text{white})|X(\text{black})=x(\text{black})} = \prod_{r \in \text{white}}{\Prob{X(r)=x(r)|X(\Neighbors(r)) = x(\Neighbors(r))}}$

• Bartlett (1975), pp. 27–28, calls this “coding”
• $$=$$ the “concliques” of Kaplan et al. (2018)
• Adapts to other geometries (see Bartlett again)
• Efficiency compared to pseudolikelihood is unclear
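A sketch of the coding likelihood for the illustrative $$\pm 1$$ Ising-type model used earlier (the local characteristic is an assumption for the example): sum the log local characteristics over the white sites of the checkerboard only.

```python
import numpy as np

def coding_loglik(x, beta):
    """Bartlett's "coding" conditional log-likelihood for a +/-1 field
    on a square lattice: the sum of log P(X(r)=x(r) | neighbors) over
    the white sites of a checkerboard coloring only. These sites are
    conditionally independent given the black sites, so the product of
    their local characteristics is a genuine conditional likelihood."""
    s = np.zeros_like(x, dtype=float)
    s[1:, :]  += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:]  += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    i, j = np.indices(x.shape)
    white = (i + j) % 2 == 0                 # checkerboard coloring
    # log P(x(r) | neighbors) = -log(1 + exp(-2*beta*x(r)*s(r)))
    return -np.sum(np.log1p(np.exp(-2.0 * beta * (x * s)[white])))
```

Compared with the pseudolikelihood, this uses only half the sites, but the terms it keeps multiply to a valid conditional probability rather than an ad hoc product.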

# References

Bartlett, M. S. 1975. The Statistical Analysis of Spatial Pattern. London: Chapman and Hall.

Griffeath, David. 1976. “Introduction to Markov Random Fields.” In Denumerable Markov Chains, edited by John G. Kemeny, J. Laurie Snell, and Anthony W. Knapp, Second, 425–57. Berlin: Springer-Verlag.

Kaplan, Andee, Mark S. Kaiser, Soumendra N. Lahiri, and Daniel J. Nordman. 2018. “Simulating Markov Random Fields with a Conclique-Based Gibbs Sampler.” arXiv:1808.04739. https://arxiv.org/abs/1808.04739.