- Reminder about block models
- Stochastic block models
- SBMs and community discovery
- Continuous latent space models
- Extensions and side-lights (time permitting)

- \(m =\) total number of edges
- \(k_i =\) degree of node \(i\) in undirected graph
- \(\sum_{i}{k_i} = 2m\)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \DeclareMathOperator*{\logit}{logit} \DeclareMathOperator*{\Tr}{Tr} \]

- \(n\) nodes, divided into \(k\) **blocks**; \(Z_i =\) block of node \(i\); \(k\times k\) **affinity matrix** \(\mathbf{b}\)

\[ \Prob{ A_{ij}=1| Z_i = r, Z_j = s } = b_{rs} \]

Independence across edges

Inference as easy as could be hoped

Presumes: block assignments are known
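
A minimal simulation of this setup, sketched in NumPy (the block sizes, seed, and affinity values are invented for illustration):

```python
# Sketch: simulate an undirected graph from a block model with *known*
# block assignments; all numbers here are made up for the example.
import numpy as np

rng = np.random.default_rng(42)

n = 60
z = np.repeat([0, 1, 2], n // 3)          # known block of each node
b = np.array([[0.5, 0.05, 0.05],          # k x k affinity matrix
              [0.05, 0.5, 0.05],
              [0.05, 0.05, 0.5]])

# P(A_ij = 1 | Z_i = r, Z_j = s) = b_rs, independently across pairs
P = b[z][:, z]
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1)                          # upper triangle: no self-loops
A = A + A.T                                # symmetrize: undirected graph

within = A[z[:, None] == z[None, :]].mean()
between = A[z[:, None] != z[None, :]].mean()
```

With these (assortative) affinities, the realized within-block density is much higher than the between-block density, as expected.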

- "SBM" means:

\[ \begin{eqnarray} Z_i & \sim_{IID} & \mathrm{Multinomial}(\rho)\\ A_{ij} | Z_i, Z_j & \sim_{ind} & \mathrm{Bernoulli}(b_{Z_i Z_j}) \end{eqnarray} \]

i.e., block assignment is stochastic (but IID)

\[ \ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{i=1}^{n}{\prod_{j=1}^{n}{b_{z_i z_j}^{A_{ij}} {(1-b_{z_i z_j})}^{(1-A_{ij})}}} \prod_{i=1}^{n}{\rho_{z_i}}\right]}} \]

Define \(n_r(z) =\) number of nodes in block \(r\), \(e_{rs}(z) =\) number of (ordered) node pairs with an edge between blocks \(r\) and \(s\), \(n_{rs}(z) =\) number of (ordered) pairs of nodes in blocks \(r\) and \(s\)

\[ \ell(b, \rho) = \log{\sum_{z \in \{1:k\}^n}{\left[\prod_{r,s}{b_{rs}^{e_{rs}(z)} (1-b_{rs})^{n_{rs}(z) - e_{rs}(z)}} \prod_{r}{\rho_r^{n_r(z)}}\right]}} \]

and \(\log{\sum} \neq \sum{\log} \ldots\)

If we knew \(Z\), estimating \(\mathbf{b}\) and \(\rho\) would be easy

If we knew \(\mathbf{b}\) and \(\rho\), getting \(\Prob{Z|A}\) is at least conceivable
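
To see why the complete-data problem is easy: with \(Z\) observed, the MLEs are just the plug-in frequencies \(\hat{b}_{rs} = e_{rs}/n_{rs}\) and \(\hat{\rho}_r = n_r/n\). A sketch (sizes and true values invented):

```python
# Sketch: complete-data MLEs for b and rho when Z is treated as observed.
import numpy as np

rng = np.random.default_rng(7)
n, k = 90, 2
z = rng.integers(0, k, size=n)                 # pretend the blocks were observed
b_true = np.array([[0.4, 0.1],
                   [0.1, 0.3]])
P = b_true[z][:, z]
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1); A = A + A.T                 # undirected, no self-loops

counts = np.bincount(z, minlength=k)
rho_hat = counts / n                           # \hat{rho}_r = n_r / n

one_hot = np.eye(k)[z]                         # n x k block-indicator matrix
e = one_hot.T @ A @ one_hot                    # e_rs: ordered-pair edge counts
N = np.outer(counts, counts) - np.diag(counts) # n_rs: ordered pairs, no self-pairs
b_hat = e / N                                  # \hat{b}_rs = e_rs / n_rs
```

Both \(e_{rs}\) and \(n_{rs}\) count ordered pairs, so the double-counting on the diagonal blocks cancels in the ratio.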

- EM algorithm
- EM algorithm with "belief propagation"
    - Node \(i\) takes in current guesses about blocks of its neighbors, plus \(\rho\)
    - Node \(i\) finds the posterior distribution for \(Z_i\); iterate
    - Usually special handling of non-edges

- Gibbs sampling
- Treat \(Z\) as fixed parameter, maximize

- SBM is not identified!
- Swap any two of the block labels:
    - Exchange those rows and columns of \(\mathbf{b}\)
    - Also exchange those entries in \(\rho\)
    - Distribution over *graphs* is unchanged

- Measure differences in \(Z\)s between estimates in permutation-invariant ways
    - e.g., minimum disagreement over permutations of \(1:k\)
    - or use mutual information
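
A sketch of the min-over-permutations comparison (the function name is ours; enumerating all \(k!\) relabelings is fine for small \(k\)):

```python
# Sketch: fraction of nodes where two label vectors disagree, minimized
# over all permutations of the k block labels.
from itertools import permutations
import numpy as np

def min_disagreement(z1, z2, k):
    """Smallest disagreement rate between z1 and any relabeling of z2."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    best = 1.0
    for perm in permutations(range(k)):
        relabeled = np.asarray(perm)[z2]      # apply the relabeling to z2
        best = min(best, np.mean(z1 != relabeled))
    return best

z_true = [0, 0, 1, 1, 2, 2]
z_hat  = [1, 1, 2, 2, 0, 0]   # same partition, labels permuted
```

Here `min_disagreement(z_true, z_hat, 3)` is zero: the two vectors describe the same partition.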

**Assortative mixing** in networks = nodes with the same value of a discrete characteristic have more links than you'd expect

- How many is that?

\[ \begin{eqnarray} \kappa_{rs} & \equiv & e_{rs}/2m\\ \kappa_{r} & \equiv & \sum_{s}{\kappa_{rs}}\\ Q & \equiv & \sum_{r}{\kappa_{rr} - \kappa_r^2}\\ \end{eqnarray} \]

- Note: \(\Tr{\mathbf{\kappa}}\) is maximized if *all* nodes are in one block!
- Assortativity usually refers to *observed* characteristics

- We can use \(Q\) when \(z\) is something we make up:

\[ Q(z) = \sum_{r}{\kappa_{rr}(z) - \kappa_r(z)^2} \]

- This is the **(Newman-Girvan) modularity** of the block-assignment vector \(z\)

- Equivalent (exercise!) to a sum over node pairs:

\[ Q = \frac{1}{2m}\sum_{i,j}{\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta_{Z_i Z_j}} \]

- Break this down:
- \(k_i k_j / 2m =\) probability of an \((i,j)\) edge if nodes are paired randomly but degrees are preserved
- \(A_{ij} - k_i k_j/2m > 0\) for \(A_{ij} = 1\), \(<0\) for \(A_{ij} = 0\)
- \(Q\) likes within-block edges, dislikes within-block non-edges
- Substitute other null models to taste
- Substitute divergences other than \(-\) to taste
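
As a check on the node-pair formula, here is a direct NumPy sketch (the toy graph, two triangles joined by one edge, is our own example):

```python
# Sketch: Newman-Girvan modularity via the node-pair sum
#   Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(z_i, z_j)
import numpy as np

def modularity(A, z):
    A = np.asarray(A, float)
    k = A.sum(axis=1)                      # degrees
    two_m = k.sum()                        # sum of degrees = 2m
    delta = np.equal.outer(np.asarray(z), np.asarray(z))
    return ((A - np.outer(k, k) / two_m) * delta).sum() / two_m

# Two triangles joined by one bridge edge
A = np.zeros((6, 6), int)
for i, j in [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]:
    A[i, j] = A[j, i] = 1

Q_good = modularity(A, [0, 0, 0, 1, 1, 1])   # split at the bridge
Q_all  = modularity(A, [0, 0, 0, 0, 0, 0])   # everyone in one block
```

Splitting at the bridge gives \(Q = 5/14\), while the one-block assignment gives \(Q = 0\): the \(-\kappa_r^2\) term exactly cancels the trace when all nodes share a block.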

**Community** or **module**: group of nodes with dense internal connections, but few connections to other communities

**Community discovery**: given a graph, divide it into good communities

- "Good" often means: high modularity \(Q\)
- Huge literature since Newman and Girvan 2003

- General maximization problem is NP-hard
- Many, many heuristics
    - Find highest betweenness *edge*, remove it, recalculate betweenness afterwards
    - Turn into an eigen-problem
    - Assign random initial communities, take majority vote (like HW 1 Prob 3)
    - Find most likely \(Z\) in an SBM
- Many of these built in to igraph
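
Since the general problem is NP-hard, exhaustive search is only feasible for tiny graphs; this sketch (toy graph our own) brute-forces every two-block assignment:

```python
# Sketch: brute-force modularity maximization over all 2^n two-block
# assignments -- only sane for very small n, hence the heuristics above.
from itertools import product
import numpy as np

def modularity(A, z):
    A = np.asarray(A, float)
    k = A.sum(axis=1); two_m = k.sum()
    delta = np.equal.outer(np.asarray(z), np.asarray(z))
    return ((A - np.outer(k, k) / two_m) * delta).sum() / two_m

# Two triangles joined by one bridge edge
A = np.zeros((6, 6), int)
for i, j in [(0,1),(0,2),(1,2),(3,4),(3,5),(4,5),(2,3)]:
    A[i, j] = A[j, i] = 1

best_z, best_Q = None, -np.inf
for z in product(range(2), repeat=6):      # 2^6 = 64 candidate assignments
    Q = modularity(A, z)
    if Q > best_Q:
        best_z, best_Q = z, Q
```

The maximizer is the two triangles, up to swapping the two labels, which is the non-identifiability noted earlier.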

- Theoretical literature has focused on a very strong form of consistency: as \(n\rightarrow \infty\), \[
\Prob{\hat{Z} \neq Z} \rightarrow 0
\] i.e., the probability that *all* nodes are correctly assigned to communities goes to 1
    - Could instead imagine something like "the *proportion* of mis-assigned nodes goes to zero in probability"
    - Permuting over community labels always allowed
- Growing theoretical literature, typically assuming:
- Graph really comes from SBM
- Expected degree grows sufficiently rapidly with \(n\)
- \(\mathbf{b}\) is diagonally dominant (within-block probabilities exceed between-block ones)
- Columns of \(\mathbf{b}\) are sufficiently different from each other

The classic approach, due to Hoff, Raftery and Handcock:

- Node \(i\) lives at a (latent) point \(Z_i \in \mathbb{R}^d\)
- HRH proposed these are IID \(\sim \mathcal{N}(0, \mathbf{I}_d)\)

- Edges become unlikely as nodes separate
- HRH proposed \(\logit{\Prob{ A_{ij}=1|Z_i, Z_j}} = \beta_0 - \| Z_i - Z_j\|\)

- All \(A_{ij}\) are *dependent*
- All \(A_{ij}\) are *independent* given locations
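
Simulating the HRH model is straightforward; a sketch with illustrative values of \(n\), \(d\), and \(\beta_0\):

```python
# Sketch: simulate the Hoff-Raftery-Handcock latent space model,
#   Z_i ~ N(0, I_d) IID,  logit P(A_ij = 1 | Z) = beta_0 - ||Z_i - Z_j||
import numpy as np

rng = np.random.default_rng(1)
n, d, beta0 = 100, 2, 2.0                      # beta0 chosen for illustration

Z = rng.standard_normal((n, d))                # latent positions
D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
p = 1.0 / (1.0 + np.exp(-(beta0 - D)))         # inverse-logit link

A = (rng.random((n, n)) < p).astype(int)
A = np.triu(A, 1); A = A + A.T                 # undirected, no self-loops

# edges should thin out with distance
iu = np.triu_indices(n, 1)
near = A[iu][D[iu] < 1.0].mean()
far  = A[iu][D[iu] > 3.0].mean()
```

Nearby pairs (logit above \(\beta_0 - 1\)) link far more often than distant ones, which is the whole point of the model.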

- Why just \(\beta_0 - \| Z_i - Z_j \|\)? Why not \(\beta_0 - \beta_1 \| Z_i - Z_j\|\)?
- Why \(Z_i \sim \mathcal{N}(0, \mathbf{I}_d)\), instead of some other variance?
    - If we multiply all the \(Z_i\) by the same scalar \(r\), and \(\beta_1\) by \(1/r\), nothing *observable* changes
    - Thus fix \(\beta_1 = 1\), and the prior variance at unity
- The \(Z_i\) are *still* not identified:
    - Nothing changes if we rotate all the \(Z_i\) the same way
    - Or if we translate all the \(Z_i\) along the same vector
    - Or if we reflect all the \(Z_i\) about the same plane
    - Or combine rotations, translations and reflections

**Isometry** = transformation which leaves all distances (*metric*) the same (*iso-*)

- For Euclidean space, the **isometry group** is built from rotations, translations and reflections
- The \(Z\)s are "identified up to isometry"
**Procrustes problem** = given two sets of points in \(\mathbb{R}^d\), find the isometry which minimizes the distance between them

- Good algorithms for this (especially if not too many points and \(d\) small)
- Often useful as an intermediate stage in working with continuous-space models
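
One standard algorithm: center both configurations, then solve the orthogonal Procrustes problem for the best rotation/reflection (here via SciPy; the example applies a known isometry and recovers the original points):

```python
# Sketch: align one point configuration to another up to isometry
# (translation handled by centering, rotation/reflection by SciPy's
# orthogonal Procrustes solver).
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(3)
Z = rng.standard_normal((20, 2))               # reference configuration

# a rotated, reflected, and translated copy: same distances, new coordinates
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]]) @ np.diag([1.0, -1.0])
Z_hat = Z @ R + np.array([5.0, -2.0])

# center both, solve for the best orthogonal map, and re-attach the mean
Zc = Z - Z.mean(axis=0)
Zh = Z_hat - Z_hat.mean(axis=0)
R_hat, _ = orthogonal_procrustes(Zh, Zc)
Z_aligned = Zh @ R_hat + Z.mean(axis=0)

err = np.abs(Z_aligned - Z).max()
```

Since the copy differs from the original only by an isometry, the alignment recovers it up to floating-point error.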

*Embedding*: given \(A_{ij}\), guess at \(Z\)

*Inference*: on \(\beta_0\) and/or posterior distribution of \(Z\)

- Of course, easy to simulate

- Add in node covariates
- Other distributions for locations
    - Isometry: set mean at 0, variance at \(\mathbf{I}_d\) w.l.o.g.
    - Why think *anything* is Gaussian?

- Other link-probability functions
    - Why think *anything* is logit-linear?
    - Zero outside a maximum radius?
    - Step-function ("Heaviside") link probabilities?
- Motion over time (Moore and Sarkar)
- Other latent spaces
- Smooth manifolds
- Positively-curved (spherical) spaces
- Negatively-curved (hyperbolic) spaces