36-467/36-667

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Neighbors}{\mathcal{N}} \]

- Markov processes in time: \(\Prob{X(t+1)|X(0), X(1), \ldots X(t)} = \Prob{X(t+1)|X(t)}\)
- Markov chains: \(X(t)\) is discrete-valued

- Asymptotic behavior of Markov chains
- Likelihood-based inference for Markov models
- What about space? What about space and time?

- \(X(r) =\) value of the random field at site \(r\)
- \(X\) or \(X(\cdot) =\) value of the field at all sites, the **configuration**
- \(X(-r) =\) value of the random field everywhere other than \(r\)
- \(X(\Neighbors(r)) =\) value of the random field at the neighbors of \(r\), the **neighborhood configuration**

- \(X\) is a **Markov random field** when \[ \Prob{X(r)|X(-r)} = \Prob{X(r)|X(\Neighbors(r))} \]
- \(p_r(x, y) \equiv \Prob{X(r)=x|X(\Neighbors(r))=y}\) is the **local characteristic** for site \(r\)
- People also write \(q_{r}(x,y)\)
- Can vary with parameters, \(q_{r}(x,y;\theta)\)

- Working out joint \(\Prob{X}\) from \(\Prob{X(r)|X(\Neighbors(r))}\) is not easy!
- No natural order to use in factorizing

- Approach I: Heroic mathematics
- Truly heroic in some cases…

- Approach II: Gibbs-Markov equivalence
- Upshot: For the right functions \(V_r\), \(V_{rq}\), \[ \log{\Prob{X=x}} = \text{(constant)} + \sum_{r}{V_r(x(r))} + \sum_{r}{\sum_{q\in\Neighbors(r)}{V_{rq}(x(r), x(q))}} \]
- Getting the constant is hard
- See backup
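A standard concrete instance (as one example, not worked out in these notes) is the Ising / autologistic model, with binary \(x(r) \in \{-1, +1\}\):

- Take \[ V_r(x(r)) = \alpha\, x(r), \qquad V_{rq}(x(r), x(q)) = \beta\, x(r)\, x(q) \]
- Then \[ \log{\Prob{X=x}} = \text{(constant)} + \alpha\sum_{r}{x(r)} + \beta\sum_{r}{\sum_{q\in\Neighbors(r)}{x(r) x(q)}} \]
- \(\alpha\) controls the overall preference for \(+1\) vs. \(-1\); \(\beta\) controls how strongly neighbors tend to agree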

- Approach III: Gibbs sampler
- Same Gibbs, different idea

- Assumes space \(r\) is discrete, but state \(X(r)\) can be continuous
- Start with *some* initial value of \(X(r)\) for all \(r\)
- Fix an order on the sites \(r\), and then, for each site in order:
- \(X(r) \sim \Prob{X(r)|X(\Neighbors(r)) = x(\Neighbors(r))}\), conditioning on the current values \(x(\Neighbors(r))\) at the neighboring sites
- \(=\) draw a new value for \(X(r)\) from the conditional distribution / local characteristic and replace the old value of \(X(r)\)
- N.B., later sites get conditioned on the updated value for \(X(r)\)

- Sweep through all sites at least once, and generally multiple times
- Each sweep is one step of a big Markov chain \(\Rightarrow\) converges on an invariant *joint* distribution
- See Kaplan et al. (2018) for a clever idea for speeding this up by simultaneous updating
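The sweep above can be sketched in code. This is a minimal illustration, assuming a binary field with an autologistic local characteristic \(\Prob{X(r)=+1|X(\Neighbors(r))} = 1/(1+e^{-2\beta\sum_{q\in\Neighbors(r)}x(q)})\); the function name and the choice of \(\beta\) are illustrative, not from the notes:

```python
import numpy as np

def gibbs_sweep(x, beta, rng):
    """One Gibbs-sampler sweep over a binary field x with entries in {-1, +1}.

    Sites are visited in a fixed raster order; each new value is drawn from
    the (assumed, autologistic) local characteristic
        P(X(r) = +1 | neighbors) = 1 / (1 + exp(-2 * beta * sum of neighbors)),
    conditioning on the *current* values at the four nearest neighbors
    (free boundaries).  Later sites see the updated values of earlier ones.
    """
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            s = 0
            if i > 0:
                s += x[i - 1, j]
            if i < n - 1:
                s += x[i + 1, j]
            if j > 0:
                s += x[i, j - 1]
            if j < m - 1:
                s += x[i, j + 1]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(467)
x = rng.choice([-1, 1], size=(20, 20))   # *some* initial configuration
for _ in range(100):                     # each sweep = one step of the big chain
    gibbs_sweep(x, beta=0.5, rng=rng)
```

Note that the update writes into `x` in place, so the within-sweep conditioning on already-updated sites happens automatically.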

- If we have multiple, independent copies of the field
- Can find \(p_r(x,y)\) for each \(r\)
- Use conditional density estimation or even regression

- If \(p_r(x,y)\) is the same for all \(r\)
- Could do conditional density estimation or even regression
- Usual error statistics aren’t valid because of dependence

- Can also do simulation-based inference
- Conditional auto-regressions make good auxiliaries for indirect inference

- Full likelihood inference is hard
- Need the joint distribution \(\Prob{X=x}\)
- Often need to do Monte Carlo just to get the likelihood

- **Pseudolikelihood**: the product of all the local characteristics \(p_r\) \[ \prod_{r}{\Prob{X(r)=x(r)|X(\Neighbors(r))=x(\Neighbors(r))}} \]
- \(\neq\) the joint probability
- Also called a **composite likelihood**
- Generally consistent, but not as statistically efficient as the full likelihood
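As a sketch, here is the negative log pseudolikelihood for the same assumed autologistic local characteristic as above (the functional form and names are illustrative); minimizing it over \(\beta\) gives the maximum pseudolikelihood estimate:

```python
import numpy as np

def neighbor_sums(x):
    """Sum of the four nearest neighbors at every site (free boundaries)."""
    s = np.zeros_like(x, dtype=float)
    s[1:, :] += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:] += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    return s

def neg_log_pseudolikelihood(beta, x):
    """Negative log pseudolikelihood, sum over r of -log p_r(x(r), x(N(r))),
    under the assumed autologistic local characteristic
        P(X(r) = x(r) | neighbors) = 1 / (1 + exp(-2 * beta * x(r) * s(r)))
    where s(r) is the sum of the neighbors of r."""
    s = neighbor_sums(x)
    return np.sum(np.log1p(np.exp(-2.0 * beta * x * s)))

# Sanity check on a 2x2 all-(+1) field: every site has neighbor sum 2,
# and at beta = 0 each of the 4 sites contributes log 2
print(neg_log_pseudolikelihood(0.0, np.ones((2, 2))))   # 4 * log(2), approx. 2.7726
```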

- If you can get the likelihood, use the Hessian of the log-likelihood
- Simulation-based methods work as usual
- Parametric bootstrap works as usual
- Nonparametric bootstrap for stationary fields
- Use rectangular blocks
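A minimal sketch of the block idea, assuming a stationary field observed on a rectangle (the function name and block shape are illustrative): tile a new field with rectangular blocks resampled from the data, preserving dependence within blocks while breaking it between them.

```python
import numpy as np

def block_bootstrap_field(x, block, rng):
    """Nonparametric bootstrap for a (presumed) stationary random field:
    tile a field of the same shape with rectangular blocks drawn uniformly,
    with replacement, from the observed field x.  Dependence is preserved
    within blocks but broken between them; block = (height, width)."""
    n, m = x.shape
    bh, bw = block
    out = np.empty_like(x)
    for i in range(0, n, bh):
        for j in range(0, m, bw):
            ti = rng.integers(0, n - bh + 1)   # top-left corner of a random
            tj = rng.integers(0, m - bw + 1)   # block lying fully inside x
            h, w = min(bh, n - i), min(bw, m - j)
            out[i:i + h, j:j + w] = x[ti:ti + h, tj:tj + w]
    return out

rng = np.random.default_rng(36467)
x = np.arange(36.0).reshape(6, 6)            # stand-in for an observed field
resampled = block_bootstrap_field(x, block=(2, 3), rng=rng)
```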

- Now have \(X(r,t)\)
- Write \(X(\cdot, t)\) for the whole configuration at time \(t\)

- Two choices for Markov property
- Option I: \(X(r,t)\) should be conditioned on \(X(\Neighbors(r), t-1)\) and \(X(r,t-1)\)
- So \(X(r,t)\) and \(X(q,t)\) are independent given \(X(\cdot, t-1)\)

- Option II: Condition on \(X(\Neighbors(r), t-1)\), \(X(r,t-1)\) *and* \(X(\Neighbors(r), t)\)
- Neighbors at time \(t\) are still dependent given \(X(\cdot, t-1)\)

- Sometimes called **recursive** vs. **simultaneous**

- Option I (recursive) fields are *much easier*
- \(\Prob{X(\cdot, t)|X(\cdot, t-1)} = \prod_{r}{\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}}\)
- Can actually calculate the likelihood
- Can estimate \(\Prob{X(r,t)|X(\Neighbors(r), t-1), X(r,t-1)}\) the same way as in a Markov chain
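To illustrate why Option I is so convenient, here is a sketch that simulates a recursive binary field and computes its exact transition log-likelihood; the autologistic form of the update and all names are illustrative assumptions:

```python
import numpy as np

def neighbor_sums(x):
    """Sum of the four nearest neighbors at every site (free boundaries)."""
    s = np.zeros_like(x, dtype=float)
    s[1:, :] += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:] += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    return s

def simulate_recursive(x0, beta, steps, rng):
    """Option-I (recursive) binary field: given the whole configuration at
    t-1, the X(r,t) are drawn *independently*, with the assumed form
        P(X(r,t) = +1) = 1 / (1 + exp(-2*beta*(x(r,t-1) + neighbor sums))).
    Returns the list of configurations X(., 0), ..., X(., steps)."""
    history = [x0.copy()]
    x = x0
    for _ in range(steps):
        drive = x + neighbor_sums(x)          # x(r, t-1) plus its neighbors
        p = 1.0 / (1.0 + np.exp(-2.0 * beta * drive))
        x = np.where(rng.random(x.shape) < p, 1, -1)
        history.append(x)
    return history

def transition_log_lik(history, beta):
    """Exact log-likelihood of the transitions: conditional independence of
    the sites given X(., t-1) makes it a product of per-site terms."""
    ll = 0.0
    for prev, cur in zip(history[:-1], history[1:]):
        drive = prev + neighbor_sums(prev)
        ll -= np.sum(np.log1p(np.exp(-2.0 * beta * cur * drive)))
    return ll

rng = np.random.default_rng(36467)
x0 = rng.choice([-1, 1], size=(8, 8))
history = simulate_recursive(x0, beta=0.3, steps=5, rng=rng)
ll = transition_log_lik(history, beta=0.3)
```

No partition function appears anywhere, which is exactly what makes the recursive case tractable.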

- Option II (simultaneous) fields are not as nice
- Basically, back to spatial models with an extra coordinate
- Composite likelihood often the best bet

- In a (spatial) Markov random field, \(X(r)\) is “screened off” from the rest of the field by its neighbors
- Conditional distribution of \(X(r)\) given neighbors = local characteristic of site \(r\)

- Local characteristics determine the joint distribution, but actually *solving* for the joint distribution is hard
- Gibbs-Markov equivalence gives a general result, but not always an easy thing to calculate with

- Estimation via
- Local conditional model (under homogeneity or multiple realizations)
- Likelihood (if you can calculate it)
- Likelihood-ish objective functions
- Simulation-based inference

- \(X=\) the whole random field, the vector of \(X(r)\) for all \(r\)
- \(X\) has a **Gibbs distribution** when \[ \Prob{X=x} = \frac{1}{Z}\exp\left(-\sum_{A}{V_A(x)}\right) \]
- \(A =\) subsets of sites
- \(V_A =\) **potential function**, depending only on the state of the sites in \(A\)
- \(Z \equiv \sum_{x}{\exp\left(-\sum_{A}{V_A(x)}\right)} =\) the **partition function**

- A set of sites \(A\) is a **clique** if every site in \(A\) is a neighbor of every other site in \(A\)
- A potential is a **nearest-neighbor potential** if \(V_A = 0\) whenever \(A\) isn't a clique

- On a finite set of sites, if \(\Prob{X=x}> 0\) for all configurations \(x\), then \(X\) is a Markov random field if and only if \(X\) has a Gibbs distribution with a nearest-neighbor potential
- Gibbs \(\Rightarrow\) Markov: direct calculation
- Markov \(\Rightarrow\) Gibbs: complicated!
- Our readings from Guttorp sketch a proof due to Griffeath (1976)

- \(Z\) is hard to calculate either way
- Sampling from Gibbs distributions:
- Use Gibbs sampler (since it’s Markov)
- Use MCMC, since that doesn’t need \(Z\)
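To see concretely why \(Z\) is hard, here is a brute-force sketch for a toy binary field with (assumed) nearest-neighbor pair potentials \(V_{rq} = -\beta\, x(r) x(q)\); the sum runs over all \(2^{nm}\) configurations, so the cost explodes with lattice size:

```python
import itertools

import numpy as np

def partition_function(beta, n, m):
    """Brute-force partition function Z for an n-by-m binary field with
    assumed nearest-neighbor pair potentials V_{rq} = -beta * x(r) * x(q):
    sums exp(-energy) over all 2^(n*m) configurations, so it is only
    feasible for toy lattices -- which is the point."""
    Z = 0.0
    for bits in itertools.product([-1, 1], repeat=n * m):
        x = np.array(bits).reshape(n, m)
        pairs = np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:])
        Z += np.exp(beta * pairs)            # exp(-energy), energy = -beta*pairs
    return Z

print(partition_function(0.0, 2, 2))   # beta = 0: all 16 configurations get weight 1
```

Already at a modest \(16 \times 16\) lattice this sum has \(2^{256}\) terms, which is why the Gibbs sampler and MCMC, neither of which needs \(Z\), are so useful.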

- Statisticians usually call this the “Hammersley-Clifford” theorem
- It was proved simultaneously and independently by H&C, Besag, Griffeath, Grimmett, …
- Priority is a mess, and Griffeath was one of my thesis advisers…
- “Gibbs-Markov” expresses the content better, and is equally slighting to everyone still alive

- On a square lattice, color the sites alternately black and white
- Each white site has only black neighbors
- White sites are independent given the black sites (by the Markov property)
- Multiply the local characteristics of the white sites to get a conditional likelihood \[ \Prob{X(\text{white})=x(\text{white})|X(\text{black})=x(\text{black})} = \prod_{r \in \text{white}}{\Prob{X(r)=x(r)|X(\Neighbors(r)) = x(\Neighbors(r))}} \]

- Bartlett (1975), pp. 27–28, calls this “coding”
- \(=\) the “concliques” of Kaplan et al. (2018)
- Adapts to other geometries (see Bartlett again)
- Efficiency compared to pseudolikelihood is unclear
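A sketch of the coding objective, under the same assumed autologistic local characteristic used above (names and functional form are illustrative): it is just a pseudolikelihood restricted to the white sites of the checkerboard, and unlike the full pseudolikelihood it is a genuine (conditional) likelihood.

```python
import numpy as np

def coding_neg_log_lik(beta, x):
    """Negative log conditional likelihood from the coding / checkerboard
    trick: only the white sites (i + j even) enter the product, and given
    the black sites they are conditionally independent.  Uses the assumed
    autologistic local characteristic
        P(X(r) = x(r) | neighbors) = 1 / (1 + exp(-2*beta*x(r)*s(r)))
    where s(r) is the sum of the four nearest neighbors of r."""
    s = np.zeros_like(x, dtype=float)
    s[1:, :] += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:] += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    i, j = np.indices(x.shape)
    white = (i + j) % 2 == 0
    return np.sum(np.log1p(np.exp(-2.0 * beta * x * s))[white])

# 2x2 all-(+1) field: the two white sites each have neighbor sum 2,
# so at beta = 0 each contributes log 2
print(coding_neg_log_lik(0.0, np.ones((2, 2))))   # 2 * log(2), approx. 1.3863
```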

Bartlett, M. S. 1975. *The Statistical Analysis of Spatial Pattern*. London: Chapman and Hall.

Griffeath, David. 1976. “Introduction to Markov Random Fields.” In *Denumerable Markov Chains*, edited by John G. Kemeny, J. Laurie Snell, and Anthony W. Knapp, Second, 425–57. Berlin: Springer-Verlag.

Kaplan, Andee, Mark S. Kaiser, Soumendra N. Lahiri, and Daniel J. Nordman. 2018. “Simulating Markov Random Fields with a Conclique-Based Gibbs Sampler.” arXiv:1808.04739. https://arxiv.org/abs/1808.04739.