Agenda

• Stochastic block models
• SBMs and community discovery
• Continuous latent space models
• Extensions and side-lights (time permitting)

Notation for today

• $$m =$$ total number of edges
• $$k_i =$$ degree of node $$i$$ in undirected graph
• $$\sum_{i}{k_i} = 2m$$
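
A quick numpy check of this bookkeeping (the tiny adjacency matrix below is made up for illustration):

```python
import numpy as np

# Small symmetric 0/1 adjacency matrix, no self-loops (made up for illustration)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

k = A.sum(axis=1)        # k_i = degree of node i
m = A.sum() / 2          # total number of edges (each edge appears twice in A)
assert k.sum() == 2 * m  # sum_i k_i = 2m
```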

Block Models

$\newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \DeclareMathOperator*{\logit}{logit} \DeclareMathOperator*{\Tr}{Tr}$

• $$n$$ nodes, divided into $$k$$ blocks, $$Z_i =$$ block of node $$i$$, $$k\times k$$ affinity matrix $$\mathbf{b}$$

$\Prob{ A_{ij}=1| Z_i = r, Z_j = s } = b_{rs}$

• Independence across edges

• Inference as easy as could be hoped

• Presumes: block assignments are known
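
A minimal simulation sketch for this (non-stochastic) block model, with known block labels `z` and a made-up affinity matrix; nothing below is from the original slides:

```python
import numpy as np

def simulate_block_model(z, b, rng):
    """Simulate an undirected graph: P(A_ij = 1 | Z_i = r, Z_j = s) = b[r, s]."""
    n = len(z)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):                    # undirected, no self-loops
            A[i, j] = A[j, i] = rng.random() < b[z[i], z[j]]
    return A

rng = np.random.default_rng(0)
z = np.repeat([0, 1], 50)                            # known block assignments
b = np.array([[0.30, 0.02],
              [0.02, 0.30]])                         # made-up affinity matrix
A = simulate_block_model(z, b, rng)
```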

Stochastic Block Models (SBMs)

• "SBM" means:

$\begin{eqnarray} Z_i & \sim_{IID} & \mathrm{Multinomial}(\rho)\\ A_{ij} | Z_i, Z_j & \sim_{ind} & \mathrm{Bernoulli}(b_{Z_i Z_j}) \end{eqnarray}$

i.e., block assignment is stochastic (but IID)
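
The same simulation with the block labels themselves drawn at random; a sketch with made-up $$\rho$$ and $$\mathbf{b}$$:

```python
import numpy as np

def simulate_sbm(n, rho, b, rng):
    """Z_i ~ IID Multinomial(rho); A_ij | Z ~ independent Bernoulli(b[Z_i, Z_j])."""
    z = rng.choice(len(rho), size=n, p=rho)
    p = b[z[:, None], z[None, :]]                    # p[i, j] = b[Z_i, Z_j]
    upper = np.triu(rng.random((n, n)) < p, k=1)     # independent draws for i < j
    A = (upper | upper.T).astype(int)                # symmetrize; no self-loops
    return A, z

rng = np.random.default_rng(1)
rho = np.array([0.5, 0.3, 0.2])                      # made-up block proportions
b = 0.02 + 0.28 * np.eye(3)                          # made-up assortative affinities
A, z = simulate_sbm(200, rho, b, rng)
```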

The log-likelihood gets complicated

$\ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{i=1}^{n}{\prod_{j=1}^{n}{b_{z_i z_j}^{A_{ij}} {(1-b_{z_i z_j})}^{(1-A_{ij})}}} \prod_{i=1}^{n}{\rho_{z_i}}\right]}}$

Define $$n_r(z)$$, $$e_{rs}(z)$$, $$n_{rs}(z)$$ in the obvious ways

$\ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{r,s}{b_{rs}^{e_{rs}(z)} (1-b_{rs})^{n_{rs}(z) - e_{rs}(z)}} \prod_{r}{\rho_r^{n_r(z)}}\right]}}$

and $$\log{\sum} \neq \sum{\log} \ldots$$
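
The sum over all $$k^n$$ assignments is the hard part; for any single $$z$$, the complete-data log-likelihood only needs the counts. A sketch (counting ordered pairs $$i \neq j$$, to match the double product above; that convention is an assumption):

```python
import numpy as np

def complete_data_loglik(A, z, b, rho, k):
    """log P(A, z | b, rho) for one fixed assignment z (assumes 0 < b_rs < 1)."""
    Z = np.eye(k)[z]                              # n x k one-hot block indicators
    n_r = Z.sum(axis=0)                           # n_r(z): nodes per block
    e_rs = Z.T @ A @ Z                            # e_rs(z): ordered node pairs with an edge
    n_rs = np.outer(n_r, n_r) - np.diag(n_r)      # n_rs(z): ordered pairs i != j by block pair
    return (np.sum(e_rs * np.log(b) + (n_rs - e_rs) * np.log(1 - b))
            + np.sum(n_r * np.log(rho)))
```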

How do we get out of this mess?

If we knew $$Z$$, estimating $$\mathbf{b}$$ and $$\rho$$ would be easy

If we knew $$\mathbf{b}$$ and $$\rho$$, getting $$\Prob{Z|A}$$ is at least conceivable

• EM algorithm
• EM algorithm with "belief propagation"
  • Node $$i$$ takes in current guesses about the blocks of its neighbors, plus $$\rho$$
  • Node $$i$$ finds the posterior distribution for $$Z_i$$; iterate
  • Usually needs special handling of non-edges
• Gibbs sampling
• Treat $$Z$$ as a fixed parameter and maximize (see the sketch below)
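
A minimal sketch of the last idea on the list, treating $$Z$$ as a parameter: given $$z$$, the maximizing $$\hat{b}$$ and $$\hat{\rho}$$ are just count ratios; given $$\hat{b}$$ and $$\hat{\rho}$$, greedily move each node to its best block. This is a heuristic stand-in for EM or belief propagation, not those algorithms themselves:

```python
import numpy as np

def fit_sbm_greedy(A, k, n_sweeps=20, seed=0):
    """Greedy alternating maximization for an SBM (heuristic sketch, not EM/BP).

    Assumes A is a symmetric 0/1 numpy array with zero diagonal.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    z = rng.integers(k, size=n)                      # random initial assignment
    eps = 1e-8                                       # keep logs and ratios finite
    for _ in range(n_sweeps):
        # Given z: estimate b and rho by count ratios
        Z = np.eye(k)[z]
        n_r = Z.sum(axis=0)
        e_rs = Z.T @ A @ Z
        n_rs = np.outer(n_r, n_r) - np.diag(n_r)
        b = (e_rs + eps) / (n_rs + 2 * eps)
        rho = n_r / n
        logb, log1mb = np.log(b), np.log(1 - b)
        # Given b, rho: move each node to the block that maximizes its contribution
        for i in range(n):
            Z_others = np.eye(k)[z]
            Z_others[i] = 0                          # leave node i out
            edges = A[i] @ Z_others                  # edges from i into each block
            totals = Z_others.sum(axis=0)            # nodes in each block (excluding i)
            score = (np.log(rho + eps)
                     + logb @ edges
                     + log1mb @ (totals - edges))
            z[i] = np.argmax(score)
    return z, b, rho
```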

And after all that…

• SBM is not identified!
• Swap any two of the block labels:
  • Exchange those rows and columns of $$\mathbf{b}$$
  • Also exchange those entries in $$\rho$$
  • The distribution over graphs is unchanged
• Measure differences in $$Z$$s between estimates in permutation-invariant ways
  • e.g., min over permutations of $$1:k$$ (see the sketch below)
  • or use mutual information
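
A sketch of one permutation-invariant comparison (brute force over relabelings, so only sensible for small $$k$$):

```python
import itertools
import numpy as np

def min_disagreement(z_true, z_hat, k):
    """Fraction of mis-assigned nodes, minimized over relabelings of the k blocks."""
    z_true, z_hat = np.asarray(z_true), np.asarray(z_hat)
    best = 1.0
    for perm in itertools.permutations(range(k)):    # k! relabelings: fine for small k
        relabeled = np.array(perm)[z_hat]
        best = min(best, np.mean(relabeled != z_true))
    return best

# A label-permutation-invariant alternative:
# sklearn.metrics.normalized_mutual_info_score(z_true, z_hat)
```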

Modularity

• Assortative mixing in networks = nodes with the same value of a discrete characteristic have more links to each other than you'd expect by chance
• How many is that?

$\begin{eqnarray} \kappa_{rs} & \equiv & e_{rs}/2m\\ \kappa_{r} & \equiv & \sum_{s}{\kappa_{rs}}\\ Q & \equiv & \sum_{r}{\kappa_{rr} - \kappa_r^2}\\ \end{eqnarray}$

• Note: $$\Tr{\mathbf{\kappa}}$$ maximized if all nodes are in one block!
• Assortativity usually refers to observed characteristics

Modularity (cont'd)

• We can use $$Q$$ when $$z$$ is something we make up:

$Q(z) = \sum_{r}{\kappa_{rr}(z) - \kappa_r(z)^2}$

• This is the (Newman-Girvan) modularity of the block-assignment vector $$z$$
• Equivalent (exercise!) to a sum over node pairs:

$Q = \frac{1}{2m}\sum_{i,j}{\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta_{Z_i Z_j}}$

• Break this down:
  • $$k_i k_j / 2m =$$ probability of an $$(i,j)$$ edge if nodes are paired randomly but degrees are preserved
  • $$A_{ij} - k_i k_j/2m > 0$$ for $$A_{ij} = 1$$, $$<0$$ for $$A_{ij} = 0$$
  • $$Q$$ likes within-block edges, dislikes within-block non-edges (see the sketch after this list)
• Substitute other null models to taste
• Substitute divergences other than $$-$$ to taste
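
A direct translation of the node-pair formula (a sketch; python-igraph's `Graph.modularity` should give the same number up to floating point):

```python
import numpy as np

def modularity(A, z):
    """Newman-Girvan modularity Q of labels z, via the node-pair formula above."""
    A, z = np.asarray(A), np.asarray(z)
    deg = A.sum(axis=1)                              # degrees k_i
    two_m = deg.sum()                                # 2m
    same_block = (z[:, None] == z[None, :])          # delta_{Z_i Z_j}
    return np.sum((A - np.outer(deg, deg) / two_m) * same_block) / two_m
```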

Community Discovery

• Community or module: group of nodes with dense internal connections, but few connections to other communities
• Community discovery: given a graph, divide it into good communities
• "Good" often means: high modularity $$Q$$
• Huge literature since Newman and Girvan 2003

Community Discovery (cont'd.)

• General maximization problem is NP-hard
• Many, many heuristics:
  • Find the highest-betweenness edge, remove it, recalculate betweenness, repeat
  • Turn it into an eigen-problem
  • Assign random initial communities, take majority votes (like HW 1 Prob 3)
  • Find the most likely $$Z$$ in an SBM
• Many of these are built in to igraph (see the sketch below)
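
A sketch using python-igraph (assuming that package; the R igraph package exposes the same algorithms under similar names):

```python
import igraph as ig

g = ig.Graph.Famous("Zachary")                       # standard small test graph

# Edge betweenness (Girvan-Newman): remove the highest-betweenness edge, recompute, repeat
eb = g.community_edge_betweenness().as_clustering()

# Spectral / leading-eigenvector optimization of modularity
le = g.community_leading_eigenvector()

# Label propagation: random initial labels, repeated majority vote among neighbors
lp = g.community_label_propagation()

for name, clustering in [("edge betweenness", eb),
                         ("leading eigenvector", le),
                         ("label propagation", lp)]:
    print(name, "Q =", g.modularity(clustering.membership))
```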

Consistency of Community Discovery

• Theoretical literature has focused on a very strong form of consistency: as $$n\rightarrow \infty$$, $\Prob{\hat{Z} \neq Z} \rightarrow 0$ i.e., probability that all nodes are correctly assigned to communities goes to 1
• Could instead imagine something like "proportion of mis-assigned nodes goes to zero in probability"
• Permuting over community labels always allowed
• Growing theoretical literature, typically assuming:
  • The graph really comes from an SBM
  • Expected degree grows sufficiently rapidly with $$n$$
  • $$\mathbf{b}$$ is diagonally dominant
  • Columns of $$\mathbf{b}$$ are sufficiently different from each other

Continuous Latent Space Models

The classic approach, due to Hoff, Raftery and Handcock:

• Node $$i$$ lives at a (latent) point $$Z_i \in \mathbb{R}^d$$
  • HRH proposed these are IID $$\sim \mathcal{N}(0, \mathbf{I}_d)$$
• Edges become unlikely as nodes separate
  • HRH proposed $$\logit{\Prob{ A_{ij}=1|Z_i, Z_j}} = \beta_0 - \| Z_i - Z_j\|$$
• Marginally, all the $$A_{ij}$$ are dependent
• Given the locations, all the $$A_{ij}$$ are independent (see the simulation sketch below)
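
A simulation sketch of this model (parameter values made up):

```python
import numpy as np
from scipy.special import expit                      # inverse logit

def simulate_latent_space(n, d, beta0, rng):
    """Simulate the HRH-style model: logit P(A_ij = 1 | Z) = beta0 - ||Z_i - Z_j||."""
    Z = rng.standard_normal((n, d))                  # Z_i ~ IID N(0, I_d)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    p = expit(beta0 - dist)
    upper = np.triu(rng.random((n, n)) < p, k=1)     # independent edges given Z
    A = (upper | upper.T).astype(int)                # undirected, no self-loops
    return A, Z

rng = np.random.default_rng(2)
A, Z = simulate_latent_space(n=100, d=2, beta0=1.0, rng=rng)
```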

Symmetry again

• Why just $$\beta_0 - \| Z_i - Z_j \|$$? Why not $$\beta_0 - \beta_1 \| Z_i - Z_j\|$$?
• Why $$Z_i \sim \mathcal{N}(0, \mathbf{I}_d)$$, instead of some other variance?
• If we multiply all the $$Z_i$$ by the same scalar $$r$$, and $$\beta_1$$ by $$1/r$$, nothing observable changes
• Thus fix $$\beta_1 = 1$$, and prior variance at unity
• The $$Z_i$$ are still not identified:
  • Nothing changes if we rotate all the $$Z_i$$ the same way
  • Or if we translate all the $$Z_i$$ along the same vector
  • Or if we reflect all the $$Z_i$$ about the same plane
  • Or if we combine rotations, translations and reflections

Isometry

• Isometry = a transformation which leaves all distances (the metric) the same (iso-)
• For Euclidean space, isometry group built from rotations, translations and reflections
• The $$Z$$s are "identified up to isometry"
• Procrustes problem = given two sets of points in $$\mathbb{R}^d$$, find the isometry which minimizes the distance between them
• Good algorithms for this exist (especially if there are not too many points and $$d$$ is small); see the sketch below
• Often useful as an intermediate stage in working with continuous-space models
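
A sketch using `scipy.linalg.orthogonal_procrustes`, which handles the rotation/reflection part after we remove the translation by centering (no rescaling, matching the $$\beta_1 = 1$$ convention above):

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_align(Z_hat, Z_ref):
    """Align Z_hat to Z_ref by translation plus rotation/reflection."""
    mu_hat, mu_ref = Z_hat.mean(axis=0), Z_ref.mean(axis=0)
    R, _ = orthogonal_procrustes(Z_hat - mu_hat, Z_ref - mu_ref)
    return (Z_hat - mu_hat) @ R + mu_ref

# Compare two estimated embeddings only after aligning one to the other, e.g.
# np.linalg.norm(procrustes_align(Z_hat, Z_ref) - Z_ref)
```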

What to do with continuous-space models?

• Embedding: given $$A_{ij}$$, guess at $$Z$$
• Inference: on $$\beta_0$$ and/or posterior distribution of $$Z$$
• Of course, easy to simulate

Variants

• Other distributions for locations
• Isometry: set the mean at 0 and the variance at $$\mathbf{I}_d$$ w.l.o.g.
• Why think anything is Gaussian?