---
title: Lecture 7 --- Stochastic Block Models and Continuous Latent Space Models
output: ioslides_presentation
---
## Agenda
- Reminder about block models
- Stochastic block models
- SBMs and community discovery
- Continuous latent space models
- Extensions and side-lights (time permitting)
## Notation for today
- $m =$ total number of edges
- $k_i =$ degree of node $i$ in undirected graph
+ $\sum_{i}{k_i} = 2m$
## Block Models
\[
\newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)}
\DeclareMathOperator*{\logit}{logit}
\DeclareMathOperator*{\Tr}{Tr}
\]
- $n$ nodes, divided into $k$ **blocks**, $Z_i =$ block of node $i$, $k\times k$ **affinity matrix** $\mathbf{b}$
\[
\Prob{ A_{ij}=1| Z_i = r, Z_j = s } = b_{rs}
\]
- Independence across edges
- Inference as easy as could be hoped
- Presumes: block assignments are known
## Stochastic Block Models (SBMs)
- "SBM" means:
\[
\begin{eqnarray}
Z_i & \sim_{IID} & \mathrm{Multinomial}(\rho)\\
A_{ij} | Z_i, Z_j & \sim_{ind} & \mathrm{Bernoulli}(b_{Z_i Z_j})
\end{eqnarray}
\]
i.e., block assignment is stochastic (but IID)
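## Sketch: simulating an SBM {.smaller}

Here is a minimal base-R sketch of the generative model above. It assumes an undirected graph with no self-loops, so only pairs $i<j$ are drawn and then mirrored. `simulate_sbm()` is a hypothetical helper, not a function from any package.

```r
# simulate_sbm(): hypothetical helper for illustration.
# Assumption: undirected graph, no self-loops (draw i < j, then mirror).
simulate_sbm <- function(n, rho, b) {
  k <- length(rho)
  # Z_i ~ IID Multinomial(rho): each node independently draws a block label
  z <- sample(1:k, size = n, replace = TRUE, prob = rho)
  # Edge probability b[Z_i, Z_j] for every pair of nodes
  p <- b[z, z]
  # Independent Bernoulli draws on the upper triangle, then symmetrize
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = p[upper])
  list(z = z, A = A + t(A))
}

set.seed(7)
b <- matrix(c(0.30, 0.02,
              0.02, 0.25), nrow = 2, byrow = TRUE)
sim <- simulate_sbm(n = 100, rho = c(0.6, 0.4), b = b)
table(sim$z)   # realized block sizes
```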
## The log-likelihood gets complicated
\[
\ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{i=1}^{n}{\prod_{j=1}^{n}{b_{z_i z_j}^{A_{ij}} {(1-b_{z_i z_j})}^{(1-A_{ij})}}} \prod_{i=1}^{n}{\rho_{z_i}}\right]}}
\]
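Define $n_r(z) =$ number of nodes in block $r$, $e_{rs}(z) = \sum_{i,j}{A_{ij}\delta_{z_i r}\delta_{z_j s}}$ (edges from block $r$ to block $s$), and $n_{rs}(z) = n_r(z)\, n_s(z)$ (node pairs from $r$ to $s$)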
\[
\ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{r,s}{b_{rs}^{e_{rs}(z)} (1-b_{rs})^{n_{rs}(z) - e_{rs}(z)}} \prod_{r}{\rho_r^{n_r(z)}}\right]}}
\]
and $\log{\sum} \neq \sum{\log}$: the sum runs over all $k^n$ assignments
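## Sketch: the likelihood by brute force {.smaller}

To see why the sum hurts, this sketch evaluates $\ell(b, \rho)$ on a tiny graph by enumerating all $k^n$ assignments. Terms are combined with the log-sum-exp trick. It assumes an undirected graph without self-loops, so it uses pairs $i<j$, a slight variant of the full double product above. `sbm_loglik_bruteforce()` is a hypothetical helper. Its cost grows as $k^n$.

```r
# sbm_loglik_bruteforce(): hypothetical helper; cost grows as k^n.
# Assumption: undirected graph, no self-loops, so only pairs i < j are used.
sbm_loglik_bruteforce <- function(A, b, rho) {
  n <- nrow(A); k <- length(rho)
  pairs <- which(upper.tri(A), arr.ind = TRUE)
  a <- A[pairs]
  # Every assignment z in {1:k}^n, one per row
  Z <- as.matrix(expand.grid(rep(list(1:k), n)))
  # log P(A, Z = z) for each assignment z
  log_terms <- apply(Z, 1, function(z) {
    p <- b[cbind(z[pairs[, 1]], z[pairs[, 2]])]
    sum(dbinom(a, size = 1, prob = p, log = TRUE)) + sum(log(rho[z]))
  })
  # log-sum-exp: log of the sum without exponentiating tiny numbers directly
  mx <- max(log_terms)
  mx + log(sum(exp(log_terms - mx)))
}

A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
b <- matrix(c(0.8, 0.1, 0.1, 0.5), nrow = 2)
sbm_loglik_bruteforce(A, b, rho = c(0.5, 0.5))
```

As a sanity check, if every entry of $\mathbf{b}$ is equal, then $Z$ is irrelevant and the result reduces to an ordinary Bernoulli log-likelihood.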
## How do we get out of this mess? {.smaller}
If we knew $Z$, estimating $\mathbf{b}$ and $\rho$ would be easy
If we knew $\mathbf{b}$ and $\rho$, getting $\Prob{Z|A}$ is at least conceivable
- EM algorithm
- EM algorithm with "belief propagation" (Decelle et al.)
+ Node $i$ takes in current guesses about blocks of its neighbors, $\rho$
+ Node $i$ finds posterior distribution for $Z_i$; iterate
+ Usually special handling of non-edges
- Gibbs sampling
- Treat $Z$ as fixed parameter, maximize
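## Sketch: estimating $\mathbf{b}$ and $\rho$ when $Z$ is known {.smaller}

This is the easy half of the problem. Given an assignment $z$, the maximum-likelihood estimates are ratios of counts: $\hat{\rho}_r = n_r/n$ and $\hat{b}_{rs} = e_{rs}/n_{rs}$. The sketch assumes an undirected graph without self-loops and counts ordered pairs $i \neq j$. The helper name and the example graph (two triangles joined by one edge) are illustrative.

```r
# estimate_sbm_given_z(): hypothetical helper. Counts use ordered pairs i != j,
# so within-block edges and within-block pairs are both counted twice.
estimate_sbm_given_z <- function(A, z, k = max(z)) {
  Zmat <- outer(z, 1:k, "==") * 1               # n x k block indicators
  n_r  <- colSums(Zmat)                          # block sizes n_r
  e_rs <- t(Zmat) %*% A %*% Zmat                 # edge counts between blocks
  n_rs <- outer(n_r, n_r) - diag(n_r, nrow = k)  # pair counts, excluding i = j
  list(rho = n_r / length(z), b = e_rs / n_rs)
}
# Illustrative graph: two triangles {1,2,3} and {4,5,6} joined by edge 3-4
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
A[edges] <- 1
A[edges[, 2:1]] <- 1
est <- estimate_sbm_given_z(A, z = c(1, 1, 1, 2, 2, 2))
est$rho   # both blocks have half the nodes
est$b     # 1 within blocks, 1/9 between
```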
## And after all that...
- SBM is not identified!
- Swap any two of the block labels:
+ Exchange those rows and columns of $\mathbf{b}$
+ Also exchange those entries in $\rho$
+ Distribution over _graphs_ is unchanged
- Measure differences in $Z$s between estimates in permutation-invariant ways
+ e.g., min over permuting $1:k$
+ or use mutual information
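## Sketch: comparing assignments up to relabeling {.smaller}

This sketch implements the "min over permuting $1:k$" idea. It finds the smallest fraction of nodes that disagree, over every relabeling of one assignment. Both helpers are hypothetical. The brute-force loop over $k!$ permutations is only sensible for small $k$.

```r
# all_perms(): hypothetical helper listing every permutation of v (k! of them).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}
# min_mismatch(): smallest fraction of disagreeing nodes over relabelings of z1
min_mismatch <- function(z1, z2, k = max(z1, z2)) {
  errs <- sapply(all_perms(1:k), function(perm) mean(perm[z1] != z2))
  min(errs)
}
z_true <- c(1, 1, 1, 2, 2, 3)
z_hat  <- c(3, 3, 1, 1, 1, 2)   # labels permuted, and node 3 misassigned
min_mismatch(z_true, z_hat)     # 1/6
```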
## Modularity
- **Assortative mixing** in networks = nodes with same value of discrete characteristic have more links than you'd expect
- How many is that?
\[
\begin{eqnarray}
\kappa_{rs} & \equiv & e_{rs}/2m\\
\kappa_{r} & \equiv & \sum_{s}{\kappa_{rs}}\\
Q & \equiv & \sum_{r}{\kappa_{rr} - \kappa_r^2}
\end{eqnarray}
\]
- Note: $\Tr{\mathbf{\kappa}}$ is maximized (at 1) if _all_ nodes are in one block! Subtracting $\kappa_r^2$ makes $Q = 0$ there
- Assortativity usually refers to _observed_ characteristics
## Modularity (cont'd)
- We can use $Q$ when $z$ is something we make up:
\[
Q(z) = \sum_{r}{\kappa_{rr}(z) - \kappa_r(z)^2}
\]
- This is the **(Newman-Girvan) modularity** of the block-assignment vector $z$
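## Sketch: $Q(z)$ from block fractions {.smaller}

This sketch computes $Q(z)$ from the block-level fractions $\kappa_{rs} = e_{rs}/2m$. It assumes $e_{rs}$ sums $A_{ij}$ over ordered pairs with $i$ in block $r$ and $j$ in block $s$, so within-block edges count twice and $\sum_{r,s}{\kappa_{rs}} = 1$. The helper name and the example graph (two triangles joined by one edge) are illustrative.

```r
# modularity_blocks(): hypothetical helper; sum(A) = 2m for a symmetric A.
modularity_blocks <- function(A, z, k = max(z)) {
  Zmat <- outer(z, 1:k, "==") * 1               # n x k block indicators
  kappa <- (t(Zmat) %*% A %*% Zmat) / sum(A)    # kappa_rs = e_rs / 2m
  kappa_r <- rowSums(kappa)
  sum(diag(kappa) - kappa_r^2)
}
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
A[edges] <- 1
A[edges[, 2:1]] <- 1
modularity_blocks(A, c(1, 1, 1, 2, 2, 2))   # 5/14, about 0.357
modularity_blocks(A, rep(1, 6))            # 0: a single block has Q = 0
```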
---
- Equivalent (exercise!) to a sum over node pairs:
\[
Q = \frac{1}{2m}\sum_{i,j}{\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta_{Z_i Z_j}}
\]
- Break this down:
+ $k_i k_j / 2m \approx$ expected number of $(i,j)$ edges if edge ends are paired at random but degrees are preserved (a probability, when small)
+ $A_{ij} - k_i k_j/2m > 0$ for $A_{ij} = 1$, $<0$ for $A_{ij} = 0$
+ $Q$ likes within-block edges, dislikes within-block non-edges
+ Substitute other null models to taste
+ Substitute divergences other than $-$ to taste
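## Sketch: $Q$ by node pairs {.smaller}

This is a direct transcription of the node-pair formula, using the degree-preserving null term $k_i k_j/2m$. Swapping in another null model means changing one line. Take two triangles joined by one edge, split into the two triangles: $\sum_{i,j}{A_{ij}\delta_{Z_i Z_j}} = 12$ and the null term is $(7^2 + 7^2)/14 = 7$, so $Q = (12-7)/14 = 5/14$. The helper name is hypothetical.

```r
# modularity_pairs(): hypothetical helper for
#   Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(z_i, z_j)
modularity_pairs <- function(A, z) {
  k_deg <- rowSums(A)
  two_m <- sum(k_deg)
  B <- A - outer(k_deg, k_deg) / two_m   # observed minus null-model expectation
  same_block <- outer(z, z, "==")        # delta(z_i, z_j)
  sum(B[same_block]) / two_m
}
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
A[edges] <- 1
A[edges[, 2:1]] <- 1
modularity_pairs(A, c(1, 1, 1, 2, 2, 2))   # 5/14, about 0.357
```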
## Community Discovery
- **Community** or **module**: group of nodes with dense internal connections, but
few connections to other communities
- **Community discovery**: given a graph, divide it into good communities
- "Good" often means: high modularity $Q$
- Huge literature since Newman and Girvan 2003
## Community Discovery (cont'd.)
- Exact maximization of $Q$ is NP-hard
- Many, many heuristics
+ Find highest betweenness _edge_, remove, recalculate betweenness afterwards
+ Turn into an eigen-problem
+ Assign random initial communities, take majority vote (like HW 1 Prob 3)
+ Find most likely $Z$ in an SBM
+ Many of these built into igraph
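## Sketch: one eigen-problem split {.smaller}

This sketches the "turn into an eigen-problem" heuristic in the spirit of Newman's leading-eigenvector method. It makes a single bisection, splitting nodes by the sign of the leading eigenvector of the modularity matrix $\mathbf{B} = \mathbf{A} - k k^T/2m$. It is not a full community-discovery routine. igraph's own implementations (e.g. `cluster_leading_eigen()`, `cluster_edge_betweenness()`, `cluster_label_prop()`) are what you would use in practice. The helper name and example graph are illustrative.

```r
# leading_eigen_split(): hypothetical helper; one bisection only, no recursion.
leading_eigen_split <- function(A) {
  k_deg <- rowSums(A)
  B <- A - outer(k_deg, k_deg) / sum(k_deg)      # modularity matrix
  # eigen() with symmetric = TRUE returns eigenvalues in decreasing order
  v <- eigen(B, symmetric = TRUE)$vectors[, 1]
  ifelse(v >= 0, 1L, 2L)
}
# Illustrative graph: two triangles {1,2,3} and {4,5,6} joined by edge 3-4
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
A[edges] <- 1
A[edges[, 2:1]] <- 1
leading_eigen_split(A)   # one triangle per block; which one is "1" is arbitrary
```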
## Consistency of Community Discovery {.smaller}
- Theoretical literature has focused on a very strong form of consistency: as $n\rightarrow \infty$,
\[
\Prob{\hat{Z} \neq Z} \rightarrow 0
\]
i.e., probability that _all_ nodes are correctly assigned to communities goes to 1
+ Could instead imagine something like "_proportion_ of mis-assigned nodes goes to zero in probability"
+ Permuting over community labels always allowed
- Growing theoretical literature, typically assuming:
+ Graph really comes from SBM
+ Expected degree grows sufficiently rapidly with $n$
+ $\mathbf{b}$ is diagonally dominant
+ Columns of $\mathbf{b}$ are sufficiently different from each other
## Continuous Latent Space Models
The classic approach, due to Hoff, Raftery and Handcock:
- Node $i$ lives at a (latent) point $Z_i \in \mathbb{R}^d$
+ HRH proposed these are IID $\sim \mathcal{N}(0, \mathbf{I}_d)$
- Edges become unlikely as nodes separate
+ HRH proposed $\logit{\Prob{ A_{ij}=1|Z_i, Z_j}} = \beta_0 - \| Z_i - Z_j\|$
- Marginally, the $A_{ij}$ are all _dependent_ (they share the unknown locations)
- Given the locations, the $A_{ij}$ are _independent_
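## Sketch: simulating the HRH model {.smaller}

This is a minimal sketch of simulating from the model as stated above: $Z_i \sim \mathcal{N}(0, \mathbf{I}_d)$ and $\logit{\Prob{A_{ij}=1|Z_i,Z_j}} = \beta_0 - \|Z_i - Z_j\|$. The values $d = 2$ and $\beta_0 = 1$ are arbitrary choices. It assumes an undirected graph without self-loops. `simulate_latent_space()` is a hypothetical helper. `plogis()` is base R's inverse logit.

```r
# simulate_latent_space(): hypothetical helper; undirected, no self-loops.
simulate_latent_space <- function(n, d = 2, beta0 = 1) {
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d)   # latent positions, IID N(0, I_d)
  D <- as.matrix(dist(Z))                          # Euclidean distances
  P <- plogis(beta0 - D)                           # inverse logit
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = P[upper])
  list(Z = Z, A = A + t(A))
}
set.seed(42)
sim <- simulate_latent_space(n = 50)
# Linked pairs should sit closer together, on average, than unlinked pairs
D <- as.matrix(dist(sim$Z))
c(linked = mean(D[sim$A == 1]), unlinked = mean(D[upper.tri(D) & sim$A == 0]))
```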
## Symmetry again
- Why just $\beta_0 - \| Z_i - Z_j \|$? Why not $\beta_0 - \beta_1 \| Z_i - Z_j\|$?
- Why $Z_i \sim \mathcal{N}(0, \mathbf{I}_d)$, instead of some other variance?
+ If we multiply all the $Z_i$ by the same scalar $r$, and $\beta_1$ by $1/r$, nothing _observable_ changes
+ Thus fix $\beta_1 = 1$, and prior variance at unity
- The $Z_i$ are _still_ not identified:
+ Nothing changes if we rotate all the $Z_i$ the same way
+ Or if translate all the $Z_i$ along the same vector
+ Or if we reflect all the $Z_i$ about the same plane
+ Or combine rotations, translations and reflections
## Isometry
- **Isometry** = transformations which leave all distances (_metric_) the same (_iso-_)
+ For Euclidean space, **isometry group** built from rotations, translations and reflections
+ The $Z$s are "identified up to isometry"
- **Procrustes problem** = given two sets of corresponding points in $\mathbb{R}^d$, find the isometry which minimizes the distance between them (e.g., the sum of squared distances between matched points)
+ Good algorithms for this (especially if not too many points and $d$ small)
+ Often useful as an intermediate stage in working with continuous-space models
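## Sketch: Procrustes alignment {.smaller}

This sketches the classic SVD solution to orthogonal Procrustes plus translation. Center both point sets, take the SVD of $X_c^T Y_c = U S V^T$, and use the orthogonal matrix $R = U V^T$. It searches over rotations *and* reflections, matching the full isometry group. It assumes row $i$ of `X` corresponds to row $i$ of `Y`. `procrustes_align()` is a hypothetical helper.

```r
# procrustes_align(): hypothetical helper. Finds the isometry of X
# (orthogonal map + translation) closest to Y in total squared distance.
procrustes_align <- function(X, Y) {
  mx <- colMeans(X); my <- colMeans(Y)
  Xc <- sweep(X, 2, mx); Yc <- sweep(Y, 2, my)   # center both point sets
  s <- svd(t(Xc) %*% Yc)
  R <- s$u %*% t(s$v)                            # orthogonal d x d matrix
  aligned <- sweep(Xc %*% R, 2, my, "+")         # rotate, then translate onto Y
  list(R = R, aligned = aligned)
}
set.seed(1)
Y <- matrix(rnorm(20), ncol = 2)
theta <- pi / 3
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
X <- sweep(Y %*% rot, 2, c(5, -2), "+")          # rotated and shifted copy of Y
fit <- procrustes_align(X, Y)
max(abs(fit$aligned - Y))                        # essentially zero
```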
## What to do with continuous-space models?
- _Embedding_: given $A_{ij}$, guess at $Z$
- _Inference_: on $\beta_0$ and/or posterior distribution of $Z$
- Of course, easy to simulate
## Variants {.smaller}
- Add in node covariates
- Other distributions for locations
+ Isometry (plus rescaling): w.l.o.g. mean $0$ and diagonal covariance, though not $\mathbf{I}_d$ in general
+ Why think _anything_ is Gaussian?
- Other link-probability functions
+ Why think _anything_ is logit-linear?
+ Zero outside maximum radius?
+ Step-function ("Heaviside") link probabilities?
- Motion over time (Moore and Sarkar)
- Other latent spaces
+ Smooth manifolds
+ Positively-curved (spherical) spaces
+ Negatively-curved (hyperbolic) spaces
## The cycle
![](dena_graph.pdf) $\Rightarrow$ ![](dena_embedding.pdf) $\Rightarrow$ ![](dena_density.pdf)
(D. Asta)
## Hyperbolic spaces
Lots of real networks are tree-like; this motivates non-Euclidean, hyperbolic latent spaces
- Hierarchical, tree-like structures embed into hyperbolic spaces with arbitrarily small distortion
- The origin is like the root of the tree
- Volume within $r$ of the origin grows _exponentially_ with $r$
- Shortest paths between points far from the origin curve back towards the origin
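## Sketch: distances in the Poincaré disk {.smaller}

This sketch assumes the Poincaré disk model of the hyperbolic plane (curvature $-1$), where $d(u,v) = \operatorname{arcosh}\left(1 + \frac{2\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$. It takes two points near the boundary and compares the direct distance with a detour through the origin. As the points move outward, both distances grow, but their gap stays roughly constant. Shortest paths therefore behave almost like paths through the origin. `poincare_dist()` is a hypothetical helper.

```r
# poincare_dist(): hypothetical helper; points must have Euclidean norm < 1.
poincare_dist <- function(u, v) {
  nu <- sum(u^2); nv <- sum(v^2)
  acosh(1 + 2 * sum((u - v)^2) / ((1 - nu) * (1 - nv)))
}
origin <- c(0, 0)
poincare_dist(origin, c(0.5, 0))   # log(3), about 1.0986
for (r in c(0.9, 0.99, 0.999)) {
  # Two points at radius r, at angles +0.3 and -0.3
  p <- r * c(cos(0.3), sin(0.3)); q <- r * c(cos(0.3), -sin(0.3))
  direct <- poincare_dist(p, q)
  via_origin <- poincare_dist(p, origin) + poincare_dist(origin, q)
  cat(sprintf("r = %.3f  direct = %6.2f  via origin = %6.2f\n",
              r, direct, via_origin))
}
```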
## Geodesic paths in the hyperbolic disk
```{r, fig.retina=NULL, out.width=350, echo=FALSE}
knitr::include_graphics("dena_hyperbolic.pdf")
```
(D. Asta)
## Tiling the hyperbolic disk
```{r, fig.retina=NULL, out.width=350, echo=FALSE}
knitr::include_graphics("circle-limit-iii.jpg")
```
(M. C. Escher)
---
Use a hyperbolic space, with link probabilities decaying with distance,
and you get (Krioukov et al. 2005):
- Highly skewed degree distribution (higher degrees closer to origin)
- Lots of clustering
- Core-periphery structure
## Inference
- Lots of algorithmic work on embeddings that maximize particular likelihoods, minimize some distortion, etc.
- HRH and related: MCMC for the posterior distribution of $Z$
+ Consistency: who knows?
- First proof that MLE is consistent: Shalizi \& Asta forthcoming
+ General metric spaces with not-too-complex isometry groups
+ Presumes smooth, known link function
+ No assumption on distribution of $Z$
## The general picture
- Each node gets an IID latent variable $Z_i$
- $\Prob{A_{ij} = 1|Z_i=u, Z_j=v} = w(u,v)$
- Edges are independent given $Z$
- It turns out _all_ exchangeable graph models take this form (the Aldous-Hoover representation)
+ For details, take 781 in mini-2
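## Sketch: sampling from a general $w(u,v)$ {.smaller}

This sketches the general recipe. It assumes, as is conventional, $Z_i \sim \mathrm{Uniform}(0,1)$, with any symmetric $w: [0,1]^2 \to [0,1]$. The two $w$ functions below are illustrative choices: a two-block SBM written in this form, and a smooth "popularity" function. `sample_w_graph()` is a hypothetical helper, and the graph is assumed undirected without self-loops.

```r
# sample_w_graph(): hypothetical helper; w must be vectorized and symmetric.
sample_w_graph <- function(n, w) {
  z <- runif(n)                       # Z_i ~ IID Uniform(0, 1)
  P <- outer(z, z, w)                 # P[i, j] = w(Z_i, Z_j)
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = P[upper])   # independent given Z
  list(z = z, A = A + t(A))
}
# A two-block SBM as a w function (blocks = halves of [0, 1])
w_sbm <- function(u, v) ifelse((u < 0.5) == (v < 0.5), 0.3, 0.05)
# A smooth "popularity" w
w_product <- function(u, v) u * v
set.seed(3)
g1 <- sample_w_graph(80, w_sbm)
g2 <- sample_w_graph(80, w_product)
c(sbm_edges = sum(g1$A) / 2, product_edges = sum(g2$A) / 2)
```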
# Time permitting...
## Some physics jargon {.smaller}
- Analogy to magnetism; $Z_i$ = "spin" of atom or molecule $i$
- Nearby spins interact; all spins coupled to external magnetic fields
- Energy ("Hamiltonian") of the state $z$ has the form
\[
h(z) = \sum_{i,j}{c_{ij}(z_i, z_j)} + \sum_{i}{r(z_i, \rho)}
\]
- $\Prob{z} \propto e^{-\beta h(z)}$, with $\beta=$ inverse temperature
+ "Boltzmann distribution", "canonical ensemble" (= exponential family)
+ Low temperature = low-energy states strongly preferred
+ High temperature = all states tend towards being equally probable
- Lowest-energy state = **ground state** = state of maximum likelihood
- **Free energy** = energy that could be extracted, above thermal noise: $F = -\frac{1}{\beta}\log{\sum_{z}{e^{-\beta h(z)}}}$
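## Sketch: Boltzmann distribution on a tiny system {.smaller}

This toy example enumerates every state of 4 spins on a ring, $z_i \in \{-1,+1\}$. It then computes Boltzmann probabilities and the free energy $F = -\frac{1}{\beta}\log{\sum_{z}{e^{-\beta h(z)}}}$. The energy is our own illustrative choice, not from the slides. It uses $c_{ij}(z_i, z_j) = -z_i z_j$ for ring neighbours and $r(z_i) = -0.5\, z_i$. As $\beta$ grows, probability piles onto the ground state.

```r
# Illustrative energy (our choice): ring couplings plus a uniform field of 0.5
states <- as.matrix(expand.grid(rep(list(c(-1, 1)), 4)))   # all 2^4 states
energy <- function(z, field = 0.5) {
  -sum(z * c(z[-1], z[1])) - field * sum(z)   # neighbours (1,2),(2,3),(3,4),(4,1)
}
h <- apply(states, 1, energy)
# Boltzmann probabilities and free energy, via a stable log-sum-exp
boltzmann <- function(h, beta) {
  logw <- -beta * h
  logZ <- max(logw) + log(sum(exp(logw - max(logw))))     # log partition function
  list(p = exp(logw - logZ), free_energy = -logZ / beta)
}
for (beta in c(0.1, 1, 5)) {
  bz <- boltzmann(h, beta)
  cat(sprintf("beta = %3.1f  P(ground state) = %.3f  F = %6.3f\n",
              beta, bz$p[which.min(h)], bz$free_energy))
}
```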
## Rarer approach to SBM inference
- Prior distributions over $\mathbf{b}$, $\rho$ and MCMC
+ Priors are devices for smoothing, i.e., adding bias and reducing variance
+ What might be good biases to have here? How would you know?
- Simulation-based inference
+ Simulate many networks from each candidate $\mathbf{b}, \rho$
+ Compute summary statistics on simulations
+ Adjust parameters to match observed graph
## SBM Variant I: Degree-Corrected SBM
- Each node gets a **popularity** $\theta_i$
- Then edge probabilities follow
\[
\Prob{A_{ij} = 1 | Z_i=r, Z_j=s, \theta_i, \theta_j} = \theta_i \theta_j b_{rs}
\]
+ $\theta$ helps _account_ for broad degree distributions
- $\theta$ does nothing to _explain_ those degree distributions
## Degree-Corrected SBMs (cont'd.)
- Math simplifies if we pretend $A_{ij} \sim \mathrm{Poisson}(\theta_i \theta_j b_{Z_i Z_j})$
+ Little difference in distribution when means are $\ll 1$
- Symmetry under "dilation"
+ $\theta_i \mapsto c_r \theta_i$ for all $i: Z_i = r$, together with $b_{rs} \mapsto b_{rs}/(c_r c_s)$, changes nothing
+ $\therefore$ impose one linear constraint on $\theta_i$ per block
- Fix $\sum_{i: Z_i = r}{\theta_i} = 1$; then $\hat{\theta}_i = k_i/\sum_{j: Z_j = Z_i}{k_j}$
- High-dimensional problem: the dimension of $\theta$ grows with $n$!
+ OK in a dense graph, where $O(n)$ edges inform each $\theta_i$
+ Standard asymptotics break down for sparse graphs
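## Sketch: degree-corrected popularity estimates {.smaller}

Under the constraint $\sum_{i: Z_i = r}{\theta_i} = 1$ with $z$ known, $\hat{\theta}_i$ is node $i$'s degree divided by the total degree of its block. The sketch computes this with base R's `ave()`. `theta_hat()` and the example graph (two triangles joined by one edge) are illustrative.

```r
# theta_hat(): hypothetical helper; degree over the block's total degree.
theta_hat <- function(A, z) {
  deg <- rowSums(A)
  deg / ave(deg, z, FUN = sum)   # ave() gives each node its block's degree total
}
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
A[edges] <- 1
A[edges[, 2:1]] <- 1
z <- c(1, 1, 1, 2, 2, 2)
th <- theta_hat(A, z)
th                   # equals c(2, 2, 3, 3, 2, 2) / 7
tapply(th, z, sum)   # each block sums to 1
```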
## SBM Variant II: Mixed-Membership SBMs
- Nodes don't have fixed-but-random $Z_i$ any more
- Instead, each node has a distribution $\rho_i$ over $1:k$
- When pairing with node $j$, node $i$ draws $Z_{i(j)}$ from $\rho_i$
+ Similarly node $j$ draws $Z_{j(i)}$ independently from $\rho_j$
- Then look up edge probability from $b_{Z_{i(j)} Z_{j(i)}}$
## Mixed-Membership SBMs (cont'd.)
- Origin myth:
+ Blocks = social roles
+ $\rho_i =$ distribution of $i$'s social life over different roles
+ $Z_{i(j)} =$ role $i$ takes on when meeting $j$
- Myth is _random switching_, not _gradual transitions_
- OTOH, marginalize over $Z_{i(j)}, Z_{j(i)}$:
\[
\Prob{A_{ij}=1|\rho_i, \rho_j} = \sum_{r,s}{b_{rs} \rho_{ir} \rho_{js}}
\]
- Degree-corrected MMSBM left as exercise
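## Sketch: MMSBM edge probabilities {.smaller}

The marginal edge probability $\sum_{r,s}{b_{rs}\rho_{ir}\rho_{js}}$ is the bilinear form $\rho_i^T \mathbf{b} \rho_j$, so all pairs come from one matrix product. The sketch checks one pair by simulating the role draws $Z_{i(j)}, Z_{j(i)}$ directly. The values of $\mathbf{b}$ and the three nodes' $\rho_i$ are made-up illustrative numbers.

```r
# Illustrative parameters (made up): 2 roles, 3 nodes
b <- matrix(c(0.8, 0.05,
              0.05, 0.6), nrow = 2, byrow = TRUE)
rho <- rbind(c(1.0, 0.0),   # node 1: always role 1
             c(0.5, 0.5),   # node 2: splits its time between roles
             c(0.0, 1.0))   # node 3: always role 2
P <- rho %*% b %*% t(rho)   # P[i, j] = sum_rs b_rs rho_ir rho_js
round(P, 4)
# Monte Carlo check for pair (1, 2): fresh role draws for each meeting
set.seed(11)
draws <- replicate(2e4, {
  r <- sample(1:2, 1, prob = rho[1, ]); s <- sample(1:2, 1, prob = rho[2, ])
  rbinom(1, 1, b[r, s])
})
c(exact = P[1, 2], simulated = mean(draws))
```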
## Force-Directed Layout {.smaller}
- **Force-directed layout** is a classic way to draw graphs:
+ Each node is represented by a point in space
+ Attractive forces between nodes with edges
+ Repulsive forces between nodes without edges
+ Run until equilibrium $\equiv$ minimize total energy
- Look at modularity by node pairs again:
\[
Q = \frac{1}{2m}\sum_{i,j}{\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta_{Z_i Z_j}}
\]
- $-Q$ is an energy (to be minimized):
+ "Attraction" between nodes with edges (if in same block)
+ "Repulsion" between nodes without edges (if in same block)
+ Modularity is actually a special case of the usual energy function for force-directed layout (Noack, 2009)
## Going beyond classic continuous-latent-space models {.smaller}
- Add covariates, etc., etc.
- CLS conflates _stochastic equivalence_ and _homophily_
+ **Homophily** = preference for friends who are like you
+ **Stochastic equivalence** = two nodes have the same link probabilities
+ Diagonally dominant SBMs also conflate these
- Hoff (2007) introduces a more general model: replace $-\| Z_i - Z_j \|$ with $Z_i^T \mathbf{\Lambda} Z_j$ for some _diagonal_ matrix $\mathbf{\Lambda}$
+ Allows attraction on some dimensions but repulsion on others
+ Allows for stochastic equivalence without homophily
+ General SBM a special case
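## Sketch: stochastic equivalence without homophily {.smaller}

This sketch uses Hoff's bilinear term with a diagonal $\mathbf{\Lambda}$. It assumes an intercept $\beta_0$ on the logit scale: $\logit{\Prob{A_{ij}=1}} = \beta_0 + Z_i^T \mathbf{\Lambda} Z_j$. The positions and $\mathbf{\Lambda}$ are hand-picked for illustration. Nodes 3 and 4 sit at the same point, so they are stochastically equivalent. They are nonetheless unlikely to link to each other, because their shared axis has a negative weight.

```r
# Hand-picked illustration: axis 1 attracts (+2), axis 2 repels (-2)
beta0 <- -1
Lambda <- diag(c(2, -2))
Z <- rbind(c(1,  0),    # nodes 1, 2: same point on the attractive axis
           c(1,  0),
           c(0,  1),    # nodes 3, 4: same point on the repulsive axis
           c(0,  1),
           c(0, -1))    # node 5: opposite side of the repulsive axis
P <- plogis(beta0 + Z %*% Lambda %*% t(Z))   # diagonal (self-pairs) is unused
round(P, 3)
# P[1, 2] is high (homophily on axis 1); P[3, 4] is low, yet rows 3 and 4
# agree off their own pair (stochastic equivalence); P[3, 5] is high.
```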