• Reminder about block models
  • Stochastic block models
  • SBMs and community discovery
  • Continuous latent space models
  • Extensions and side-lights (time permitting)

Notation for today

  • \(m =\) total number of edges
  • \(k_i =\) degree of node \(i\) in undirected graph
    • \(\sum_{i}{k_i} = 2m\)

Block Models

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \DeclareMathOperator*{\logit}{logit} \DeclareMathOperator*{\Tr}{Tr} \]

  • \(n\) nodes, divided into \(k\) blocks, \(Z_i =\) block of node \(i\), \(k\times k\) affinity matrix \(\mathbf{b}\)

\[ \Prob{ A_{ij}=1| Z_i = r, Z_j = s } = b_{rs} \]

  • Independence across edges

  • Inference as easy as could be hoped

  • Presumes: block assignments are known

Stochastic Block Models (SBMs)

  • "SBM" means:

\[ \begin{eqnarray} Z_i & \sim_{IID} & \mathrm{Multinomial}(\rho)\\ A_{ij} | Z_i, Z_j & \sim_{ind} & \mathrm{Bernoulli}(b_{Z_i Z_j}) \end{eqnarray} \]

i.e., block assignment is stochastic (but IID)
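For concreteness, the generative story can be sketched in a few lines of Python; the block count, \(\rho\), and \(\mathbf{b}\) below are made-up illustrative values:

```python
# A minimal sketch of sampling from an SBM; n, k, rho, and b are
# made-up illustrative values.
import random

random.seed(13)

n, k = 8, 2
rho = [0.5, 0.5]                 # block-membership probabilities
b = [[0.8, 0.1],                 # affinity matrix: b[r][s] = prob. of an
     [0.1, 0.8]]                 # edge between a block-r and block-s node

# Z_i ~iid Multinomial(rho)
Z = [random.choices(range(k), weights=rho)[0] for _ in range(n)]

# A_ij | Z ~ind Bernoulli(b[Z_i][Z_j]); undirected, no self-loops
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        A[i][j] = A[j][i] = int(random.random() < b[Z[i]][Z[j]])
```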

The log-likelihood gets complicated

\[ \ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{i=1}^{n}{\prod_{j=1}^{n}{b_{z_i z_j}^{A_{ij}} {(1-b_{z_i z_j})}^{(1-A_{ij})}}} \prod_{i=1}^{n}{\rho_{z_i}}\right]}} \]

Define \(n_r(z)\), \(e_{rs}(z)\), \(n_{rs}(z)\) in the obvious ways

\[ \ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{r,s}{b_{rs}^{e_{rs}(z)} (1-b_{rs})^{n_{rs}(z) - e_{rs}(z)}} \prod_{r}{\rho_r^{n_r(z)}}\right]}} \]

and \(\log{\sum} \neq \sum{\log} \ldots\)
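By contrast, if the assignment \(z\) were known, the sum collapses to a single term, and the log-likelihood is a simple function of the sufficient statistics. A sketch (here \(n_{rs}(z)\) is taken to count ordered pairs of *distinct* nodes, and \(\rho\), \(\mathbf{b}\) are assumed to have entries strictly inside \((0,1)\)):

```python
# Sketch: the complete-data log-likelihood for a *known* assignment z,
# using the sufficient statistics n_r(z), e_rs(z), n_rs(z).
# Assumes entries of rho and b are strictly between 0 and 1 where used.
from math import log

def sbm_complete_loglik(A, z, b, rho, k):
    n = len(z)
    n_blk = [sum(1 for zi in z if zi == r) for r in range(k)]   # n_r(z)
    ll = sum(n_blk[r] * log(rho[r]) for r in range(k) if n_blk[r] > 0)
    for r in range(k):
        for s in range(k):
            # e_rs(z): edges from block-r nodes to block-s nodes
            e = sum(A[i][j] for i in range(n) for j in range(n)
                    if i != j and z[i] == r and z[j] == s)
            # n_rs(z): ordered pairs of distinct (block-r, block-s) nodes
            n_rs = n_blk[r] * n_blk[s] - (n_blk[r] if r == s else 0)
            ll += e * log(b[r][s]) + (n_rs - e) * log(1 - b[r][s])
    return ll
```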

How do we get out of this mess?

If we knew \(Z\), estimating \(\mathbf{b}\) and \(\rho\) would be easy

If we knew \(\mathbf{b}\) and \(\rho\), getting \(\Prob{Z|A}\) is at least conceivable

  • EM algorithm
  • EM algorithm with "belief propagation"
    • Node \(i\) takes in current guesses about blocks of its neighbors, \(\rho\)
    • Node \(i\) finds posterior distribution for \(Z_i\); iterate
    • Usually special handling of non-edges
  • Gibbs sampling
  • Treat \(Z\) as fixed parameter, maximize
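The Gibbs-sampling option can be sketched as follows: sweep over the nodes, resampling each \(Z_i\) from its conditional distribution given \(A\), the other \(Z\)s, \(\mathbf{b}\), and \(\rho\). All parameter values in the toy run are illustrative.

```python
# Sketch of one Gibbs-sampling sweep for Z given b and rho; assumes
# entries of b and rho lie strictly in (0, 1).
import random
from math import exp, log

def gibbs_sweep(A, z, b, rho, k):
    n = len(z)
    for i in range(n):
        # log P(Z_i = r | A, Z_{-i}) up to an additive constant
        logp = []
        for r in range(k):
            lp = log(rho[r])
            for j in range(n):
                if j != i:
                    p = b[r][z[j]]
                    lp += log(p) if A[i][j] else log(1 - p)
            logp.append(lp)
        m = max(logp)
        w = [exp(lp - m) for lp in logp]    # exponentiate stably
        u = random.random() * sum(w)
        cum = 0.0
        for r in range(k):
            cum += w[r]
            if u <= cum:
                z[i] = r
                break
    return z

# Toy run: two triangles with no edges between them
A = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
random.seed(4)
z = [0, 1, 0, 1, 0, 1]
for _ in range(20):
    z = gibbs_sweep(A, z, [[0.9, 0.05], [0.05, 0.9]], [0.5, 0.5], 2)
```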

And after all that…

  • SBM is not identified!
  • Swap any two of the block labels:
    • Exchange those rows and columns of \(\mathbf{b}\)
    • Also exchange those entries in \(\rho\)
    • Distribution over graphs is unchanged
  • Measure differences in \(Z\)s between estimates in permutation-invariant ways
    • e.g., min over permuting \(1:k\)
    • or use mutual information
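The min-over-permutations comparison can be sketched directly; brute force over all \(k!\) relabelings is fine for small \(k\):

```python
# Sketch: a permutation-invariant error between two block assignments,
# minimizing disagreements over all k! relabelings (fine for small k).
from itertools import permutations

def min_disagreement(z_hat, z_true, k):
    n = len(z_true)
    return min(sum(1 for i in range(n) if perm[z_hat[i]] != z_true[i])
               for perm in permutations(range(k)))

# A pure label swap counts as zero error:
print(min_disagreement([0, 0, 1, 1], [1, 1, 0, 0], k=2))  # -> 0
```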


Modularity

  • Assortative mixing in networks = nodes with the same value of a discrete characteristic have more links between them than you'd expect
  • How many is that?

\[ \begin{eqnarray} \kappa_{rs} & \equiv & e_{rs}/2m\\ \kappa_{r} & \equiv & \sum_{s}{\kappa_{rs}}\\ Q & \equiv & \sum_{r}{\kappa_{rr} - \kappa_r^2} \end{eqnarray} \]

  • Note: \(\Tr{\mathbf{\kappa}}\) maximized if all nodes are in one block!
  • Assortativity usually refers to observed characteristics

Modularity (cont'd)

  • We can use \(Q\) when \(z\) is something we make up:

\[ Q(z) = \sum_{r}{\kappa_{rr}(z) - \kappa_r(z)^2} \]

  • This is the (Newman-Girvan) modularity of the block-assignment vector \(z\)

  • Equivalent (exercise!) to a sum over node pairs:

\[ Q = \frac{1}{2m}\sum_{i,j}{\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta_{Z_i Z_j}} \]

  • Break this down:
    • \(k_i k_j / 2m =\) probability of an \((i,j)\) edge if nodes are paired randomly but degrees are preserved
    • \(A_{ij} - k_i k_j/2m > 0\) for \(A_{ij} = 1\), \(<0\) for \(A_{ij} = 0\)
    • \(Q\) likes within-block edges, dislikes within-block non-edges
    • Substitute other null models to taste
    • Substitute divergences other than \(-\) to taste
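The pairwise formula translates directly into code; the two-triangle example graph is made up for illustration:

```python
# Sketch: Newman-Girvan modularity via the pairwise formula
# Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(z_i, z_j).
def modularity(A, z):
    n = len(z)
    deg = [sum(row) for row in A]       # degrees k_i
    two_m = sum(deg)                    # sum of degrees = 2m
    Q = 0.0
    for i in range(n):
        for j in range(n):
            if z[i] == z[j]:
                Q += A[i][j] - deg[i] * deg[j] / two_m
    return Q / two_m

# Two triangles joined by one bridge edge, split at the bridge:
A = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
print(round(modularity(A, [0, 0, 0, 1, 1, 1]), 4))  # -> 0.3571
```

Note the trivial all-in-one-block assignment gets \(Q = 0\), which is exactly why \(Q\) subtracts \(\kappa_r^2\) rather than just maximizing \(\Tr{\mathbf{\kappa}}\).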

Community Discovery

  • Community or module: group of nodes with dense internal connections, but few connections to other communities
  • Community discovery: given a graph, divide it into good communities
  • "Good" often means: high modularity \(Q\)
  • Huge literature since Newman and Girvan 2003

Community Discovery (cont'd.)

  • General maximization problem is NP-hard
  • Many, many heuristics
    • Find highest betweenness edge, remove, recalculate betweenness afterwards
    • Turn into an eigen-problem
    • Assign random initial communities, take majority vote (like HW 1 Prob 3)
    • Find most likely \(Z\) in an SBM
    • Many of these built into igraph
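One illustration of the "many heuristics": greedy hill-climbing on \(Q\) by single-node label moves. This is a crude sketch for intuition, not the algorithm any particular igraph routine implements:

```python
# Greedy modularity hill-climbing: repeatedly move one node to whichever
# block strictly increases Q, until no single move helps (a local max).
def q_mod(A, z):
    deg = [sum(row) for row in A]
    two_m = sum(deg)
    n = len(z)
    return sum(A[i][j] - deg[i] * deg[j] / two_m
               for i in range(n) for j in range(n) if z[i] == z[j]) / two_m

def greedy_communities(A, k, z0):
    z = list(z0)
    improved = True
    while improved:
        improved = False
        for i in range(len(z)):
            for r in range(k):
                if r != z[i]:
                    trial = z[:i] + [r] + z[i + 1:]
                    if q_mod(A, trial) > q_mod(A, z) + 1e-12:
                        z, improved = trial, True
    return z
```

Like all local-search heuristics for an NP-hard objective, this only finds a local optimum, so restarts from several initial assignments are advisable.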

Consistency of Community Discovery

  • Theoretical literature has focused on a very strong form of consistency: as \(n\rightarrow \infty\), \[ \Prob{\hat{Z} \neq Z} \rightarrow 0 \] i.e., probability that all nodes are correctly assigned to communities goes to 1
    • Could instead imagine something like "proportion of mis-assigned nodes goes to zero in probability"
    • Permuting over community labels always allowed
  • Growing theoretical literature, typically assuming:
    • Graph really comes from SBM
    • Expected degree grows sufficiently rapidly with \(n\)
    • \(\mathbf{b}\) is diagonally dominant
    • Columns of \(\mathbf{b}\) are sufficiently different from each other

Continuous Latent Space Models

The classic approach, due to Hoff, Raftery and Handcock:

  • Node \(i\) lives at a (latent) point \(Z_i \in \mathbb{R}^d\)
    • HRH proposed these are IID \(\sim \mathcal{N}(0, \mathbf{I}_d)\)
  • Edges become unlikely as nodes separate
    • HRH proposed \(\logit{\Prob{ A_{ij}=1|Z_i, Z_j}} = \beta_0 - \| Z_i - Z_j\|\)
  • All \(A_{ij}\) are dependent
  • All \(A_{ij}\) are independent given locations
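Simulation really is easy; a sketch of the HRH model, where \(n\), \(d\), and \(\beta_0\) are made-up example values:

```python
# Sketch: simulating the HRH latent-space model; n, d, beta0 are
# made-up example values.
import math
import random

random.seed(7)
n, d, beta0 = 10, 2, 1.0

# Latent positions Z_i ~iid N(0, I_d)
Z = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def edge_prob(zi, zj, beta0=beta0):
    # logit P(A_ij = 1 | Z_i, Z_j) = beta0 - ||Z_i - Z_j||
    dist = math.dist(zi, zj)
    return 1.0 / (1.0 + math.exp(dist - beta0))

A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        A[i][j] = A[j][i] = int(random.random() < edge_prob(Z[i], Z[j]))
```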

Symmetry again

  • Why just \(\beta_0 - \| Z_i - Z_j \|\)? Why not \(\beta_0 - \beta_1 \| Z_i - Z_j\|\)?
  • Why \(Z_i \sim \mathcal{N}(0, \mathbf{I}_d)\), instead of some other variance?
    • If we multiply all the \(Z_i\) by the same scalar \(r\), and \(\beta_1\) by \(1/r\), nothing observable changes
    • Thus fix \(\beta_1 = 1\), and prior variance at unity
  • The \(Z_i\) are still not identified:
    • Nothing changes if rotate all the \(Z_i\) the same way
    • Or if translate all the \(Z_i\) along the same vector
    • Or if we reflect all the \(Z_i\) about the same plane
    • Or combine rotations, translations and reflections
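The rescaling argument is easy to check numerically: multiply every \(Z_i\) by \(r\) and \(\beta_1\) by \(1/r\), and each logit is unchanged (the values below are arbitrary):

```python
# Numeric check of the scale non-identifiability in the latent-space
# model; all numbers here are arbitrary example values.
import math

def logit_p(zi, zj, beta0, beta1):
    return beta0 - beta1 * math.dist(zi, zj)

zi, zj = [1.0, 2.0], [-0.5, 0.3]
beta0, beta1, r = 1.0, 2.0, 5.0
before = logit_p(zi, zj, beta0, beta1)
after = logit_p([r * c for c in zi], [r * c for c in zj], beta0, beta1 / r)
print(abs(before - after) < 1e-12)  # -> True
```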


  • Isometry = transformation which leaves all distances (metric) the same (iso-)
    • For Euclidean space, isometry group built from rotations, translations and reflections
    • The \(Z\)s are "identified up to isometry"
  • Procrustes problem = given two sets of points in \(\mathbb{R}^d\), find isometry which minimizes the distance between them
    • Good algorithms for this (especially if not too many points and \(d\) small)
    • Often useful as an intermediate stage in working with continuous-space models
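In \(d = 2\) the Procrustes alignment even has a closed form: center both point sets, then the optimal rotation angle comes from one `atan2`. A sketch (reflections omitted for brevity):

```python
# Sketch of 2-D Procrustes alignment: rotate and translate X to best
# match Y in least squares (reflections not considered).
import math

def procrustes_2d(X, Y):
    n = len(X)
    cx = (sum(p[0] for p in X) / n, sum(p[1] for p in X) / n)
    cy = (sum(p[0] for p in Y) / n, sum(p[1] for p in Y) / n)
    Xc = [(p[0] - cx[0], p[1] - cx[1]) for p in X]
    Yc = [(p[0] - cy[0], p[1] - cy[1]) for p in Y]
    # the best angle maximizes C*cos(theta) + S*sin(theta)
    C = sum(a[0] * b[0] + a[1] * b[1] for a, b in zip(Xc, Yc))
    S = sum(a[0] * b[1] - a[1] * b[0] for a, b in zip(Xc, Yc))
    t = math.atan2(S, C)
    ct, st = math.cos(t), math.sin(t)
    return [(ct * a[0] - st * a[1] + cy[0],
             st * a[0] + ct * a[1] + cy[1]) for a in Xc]
```

In higher dimensions the standard route is the SVD-based orthogonal Procrustes solution, available in numerical libraries.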

What to do with continuous-space models?

  • Embedding: given \(A_{ij}\), guess at \(Z\)
  • Inference: on \(\beta_0\) and/or posterior distribution of \(Z\)
  • Of course, easy to simulate


Extensions

  • Add in node covariates
  • Other distributions for locations
    • Isometry: set mean at 0, variance at \(\mathbf{I}_d\) w.l.o.g.
    • Why think anything is Gaussian?
  • Other link-probability functions
    • Why think anything is logit-linear?
    • Zero outside maximum radius?
    • Step-function ("Heaviside") link probabilities?
  • Motion over time (Moore and Sarkar)
  • Other latent spaces
    • Smooth manifolds
    • Positively-curved (spherical) spaces
    • Negatively-curved (hyperbolic) spaces

The cycle