\documentclass{article}
\usepackage[urw-garamond]{mathdesign}
\usepackage{amsmath, amsfonts}
\usepackage{natbib}
\usepackage{hyperref}
\begin{document}
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Binom}{\mathrm{Binom}}
<<>>=
# General set-up options
library(knitr)
opts_chunk$set(size="small", background="white",
cache=TRUE, autodep=TRUE,
tidy=FALSE, warning=FALSE, message=FALSE,
echo=TRUE)
@
\title{Lecture 21: Compartment Models}
\author{36-467/36-667}
\date{15 November 2018}
\maketitle
\begin{abstract}
``Compartment models'', which track the transitions of large numbers of
individuals across a series of qualitatively-distinct states (the
``compartments''), are common in fields which study large populations, such as
epidemiology and sociology. These notes cover such models as a particular
kind of Markov chain, including some examples from epidemiology,
evolutionary economics, and demography.
I made some mistakes in lecture; these notes correct them.
\end{abstract}
Epidemiology, ecology, sociology, demography, chemical kinetics, and some other
fields are sometimes grouped together as ``population sciences'', which study
the behavior and evolution of large populations of individuals. All of these
fields make extensive use of what are called (among other things) ``compartment
models'', where the members of the population are divided into a discrete set
of qualitatively-distinct states (or ``compartments''), and transition between
them at rates which vary as functions of the current distribution over
compartments\footnote{Whether the ubiquity of these models reflects some
underlying commonality across these fields, or just a tendency for scientists
with similar mathematical training to borrow and/or re-invent similar models,
is beyond the scope of this note.}. They can be used whenever a population
is divided into discrete, qualitatively-distinct types or classes, and
transitions between classes are stochastic.
\section{Compartment Models in General}
\subsection{Basic Definitions and Assumptions}
Let's begin with the case of a ``closed'' population of constant size $n$. (\S
\ref{sec:demography} will consider the modifications needed to make the
population size vary over time.) This population is divided into $r \geq 2$
qualitatively distinct classes, types or compartments\footnote{These words are
all synonyms in the present context.}, and we are primarily
interested in how many individuals are located in each compartment.
We thus define the state of the population at time $t$, $X(t)$,
as a vector,
\begin{equation}
X(t) \equiv [X_1(t), X_2(t), \ldots X_r(t)]
\end{equation}
where $X_i(t)$ counts how many individuals are in compartment $i$ at time $t$.
The state $X(t)$ changes because individuals can move from one compartment to
another. We assume that these moves or transitions follow
a stochastic process, and we make two important assumptions
about that process.
\begin{enumerate}
\item The $i$-to-$j$ {\bf transition rate} $p_{ij}$ is the probability that an
individual in compartment $i$ will move to compartment $j$ at the next time
step. ($p_{ii}$ is thus the probability of staying in compartment $i$.) We
assume that the transition rates are functions of the {\em current}
population state, and not the past (or the future), so $p_{ij}(X)$ is a
well-defined function.
Note that this assumption allows some $p_{ij}$ to be zero, either always or
for certain states of the population. (These are sometimes called {\bf
forbidden} or {\bf suppressed} transitions.) It also allows some $p_{ij}$
to be constant, regardless of $X$.
\item All individuals in the population make {\em independent} transitions,
conditional on $X(t)$. The moves of all the individuals in a given
compartment are thus {\em conditionally} independent and identically
distributed.
\end{enumerate}
These assumptions are {\em not} implied by the mere fact that we've divided the
population into compartments and are tracking the number of individuals in each
compartment. They could be false, or true only for a different choice of
compartments, etc.
You may feel some scruples about applying these assumptions to people, at
least to social processes which reflect human actions (as opposed to, say,
models of disease spread). But modeling aggregated social processes as
stochastic in this way does not commit us to saying that people act randomly,
or don't have motives for their actions, or that all individuals in the same
compartment are identical. What it does commit us to is that the differences
between people in the same compartment which determine their motives and their
actions are so various, so complicated, and so independent of the circumstances
which landed them in that compartment that they can't be predicted, and can be
effectively treated as random\footnote{There are precise technical senses in
which any sufficiently complicated function is indistinguishable from
randomness. For a brilliant and accessible introduction to these ideas, and
how they underlie probability modeling, I strongly recommend
\citet{Ruelle-chance}. The standard textbook on this subject is
\citet{Li-and-Vitanyi-1997}. \citet[ch.\ 3]{Salmon-1984} applies these ideas
to the problem of defining ``the'' probability of an outcome for members of
some class.}.
\subsection{Consequences of the Assumptions}
These assumptions are enough to tell us that the sequence of random variables
$X(t)$ is a Markov chain. If it's a Markov process, it has to be a Markov {\em
chain} because the state space is discrete, though large (Exercise
\ref{exercise:size-of-state-space}), so what we really need to do is show that
the Markov property holds. The easiest way to get there is through defining
some additional variables.
\subsubsection{Flow or Flux Variables}
For each pair of compartments $i$ and $j$, define $Y_{ij}(t+1)$ as
the number of individuals who moved from compartment $i$ to compartment $j$ at
time $t+1$. This is sometimes called the {\bf flow} or {\bf flux} from $i$ to
$j$.
$Y_{ij}(t+1)$ is a random quantity. By the assumptions above, in fact, it
has a binomial distribution:
\begin{equation}
Y_{ij}(t+1) \sim \Binom(X_i(t), p_{ij}(X(t)))
\end{equation}
This implies that
\begin{eqnarray}
\Expect{Y_{ij}(t+1)|X(t)} & = & X_i(t) p_{ij}(X(t))\\
\Var{Y_{ij}(t+1)|X(t)} & = & X_i(t) p_{ij}(X(t))(1-p_{ij}(X(t)))
\end{eqnarray}
There are two implications of these worth noting.
\begin{enumerate}
\item Both the expected flux and the variance in the flux are proportional
to $X_i(t)$. This means that the standard deviation in the flux
is only proportional to $\sqrt{X_i}$. Thus, if $X_i(t)$ is large,
the {\em relative} deviations in the flux are shrinking. This suggests
that there should be nearly-deterministic behavior in large populations.
\item The assumption that each individual in compartment $i$ has the {\em
same} probability of moving to compartment $j$ is crucial to calculating
the variance. If there were randomly-distributed transition rates for all
individuals in the compartment, we'd see a higher variance for the flux
(Exercise \ref{exercise:heterogeneous-transition-rates}).
\end{enumerate}
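Both moments are easy to check by simulation.  The following chunk is my own
illustration (the values of $X_i(t)$ and $p_{ij}$ are made up for the
example):
<<check-flux-moments>>=
# Illustrative check of the flux moments: take X_i(t) = 1000, p_ij = 0.3
x.i <- 1000; p.ij <- 0.3
y <- rbinom(1e5, size=x.i, prob=p.ij)
mean(y)  # should be close to x.i*p.ij = 300
var(y)   # should be close to x.i*p.ij*(1-p.ij) = 210
@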
If $r > 2$, then the set of fluxes out of $i$, $[Y_{i1}(t+1), Y_{i2}(t+1),
\ldots Y_{ir}(t+1)]$ has as its joint distribution a multinomial, with the
number of ``trials'' being $X_i(t)$, and the probabilities being $(p_{i1},
p_{i2}, \ldots p_{ir})$. (Remember that the marginal distributions of a
multinomial distribution are binomial.)
\subsubsection{Stock-Flow Consistency}
The variables $X_i(t)$ count the total number of individuals in each
compartment, regardless of how or when they arrived there. Such variables are
sometimes called {\bf stock} or {\bf level} variables. The $Y_{ij}(t)$
variables, on the other hand, are (as already said) {\bf flow} or {\bf flux}
variables. A basic principle is that the new level of the stock
is the old level, plus flows in, minus flows out:
\begin{eqnarray}
\label{eqn:stock-flow} X_i(t+1) & = & X_i(t) + \sum_{j\neq i}{Y_{ij}(t+1) - Y_{ji}(t+1)}\\
\text{(new level)} & = & \text{(old level)} + \sum{\text{(in flow)} - \text{(out flow)}}
\end{eqnarray}
This is sometimes called {\bf stock-flow consistency}.
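Stock-flow consistency translates directly into code.  Here is a minimal
sketch (mine, not part of the lecture code) of one time step of a generic
compartment model; it assumes \texttt{trans.rates} is a function which
returns the $r \times r$ matrix of transition rates $p_{ij}(X)$, with each
row summing to 1:
<<generic-compartment-step>>=
# One step of a generic compartment model; trans.rates(x) is assumed to
# return the r-by-r matrix of p_ij(x), with rows summing to 1
step.compartments <- function(x, trans.rates) {
  p <- trans.rates(x)
  r <- length(x)
  y <- matrix(0, nrow=r, ncol=r)  # y[i,j] = flux from i to j
  for (i in 1:r) {
    y[i, ] <- rmultinom(1, size=x[i], prob=p[i, ])
  }
  # new level = old level + flows in - flows out; the diagonal "flux"
  # y[i,i] (stayers) appears in both sums, so it cancels
  x + colSums(y) - rowSums(y)
}
@
Because every individual lands in exactly one compartment, this update
automatically conserves the total population $\sum_{i}{X_i}$.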
\subsubsection{Markov Property}
Because Eq.\ \ref{eqn:stock-flow} says how the level in each compartment
changes, it tells us how the state vector $X(t)$ changes as well.
Notice that, from our assumptions, $Y_{ij}(t+1)$ is independent of the past
given $X(t)$. It is independent of earlier values of $X$, and also independent
of earlier values of the $Y$'s. Since Eq.\ \ref{eqn:stock-flow} tells us that
$X(t+1)$ is a deterministic function of $X(t)$ and $Y(t+1)$, it follows that
$X(t+1)$ is independent of earlier $X$'s (and $Y$'s) given $X(t)$. Hence the
state of the population follows a Markov chain.
Because the state space, while large, is finite, the Markov chain will
eventually enter some closed, irreducible set, and stay there forever. In the
long run, the distribution of $X(t)$ will follow the invariant distribution for
that irreducible set. These are properties of any finite Markov chain, and so
apply to this one, even though it has special structure.
\section{Epidemic Models}
In epidemiology, compartment models are often known as ``epidemic models'', and
in that field they descend from pioneering work in mathematical epidemiology
done in the early 20th century\footnote{It is conventional these days to
attribute this to the work of Sir Ronald Ross on malaria, but early reviews
(e.g., \citealt[ch. VIII, pp. 79--82]{Lotka}) make it clear that this was
much more of a collective effort.}, but they are applied much more broadly
than just to epidemics. It's worth thinking through some of the most
basic cases, to get a feel for how they work.
\subsection{SI}
\label{sec:SI}
Let's begin with a model with just two compartments, dividing the entire
population into those who are infected with a certain disease --- call this
compartment $I$ --- and those who, not having caught the disease yet, are
susceptible, in compartment $S$. The state of the population is $X(t) =
(X_I(t), X_S(t))$, but since $X_S(t) = n - X_I(t)$, it is enough to keep track
of $X_I(t)$.
With two compartments, we need to specify two rates, $p_{SI}$ and $p_{IS}$. The former is the rate at which susceptible individuals become infected; the latter is the rate at which infected individuals are cured and become susceptible again.
A simple assumption is that $p_{SI}(X_S, X_I) = \alpha X_I$, because the more
members of the population are infected, the more likely a susceptible
individual is to encounter one and become infected. We might, however, simply
set $p_{IS} = 0$, indicating that no one recovers from the disease, but rather
the infected remain infected, and contagious, forever. There are a couple of
ways to justify this latter assumption.
\begin{enumerate}
\item We're in a zombie movie. (This {\em is} Pittsburgh, after all.)
\item We're only interested in what happens on a comparatively short time
scale. More precisely, whatever processes make infected individuals
non-infectious (recovery, death, quarantine, etc.) happen much more slowly
than the process of contagion itself, and we're only trying to approximate
the beginning of the epidemic, before the former processes can kick in.
\item The infected really do remain around, and contagious, forever (\S
\ref{sec:diffusion-of-innovations}.)
\end{enumerate}
Leaving aside how to justify the assumption that $p_{IS} = 0$, what are the
consequences of these assumptions about the transition rates?
An obvious consequence is that if $X_I(t) = n$ or $X_I(t) = 0$, then $X_I$ will
remain $=n$ or $=0$ forever. That is, these are {\bf absorbing} states of the
population. You can convince yourself that these are the only absorbing
states. You can also convince yourself that an absorbing state forms a closed,
irreducible set of states, and that these are the only closed, irreducible sets
for this chain (Exercise \ref{exercise:absorbing-states}). There are thus two
possible invariant distributions, one which gives probability 1 to $X_I=0$ and
the other which gives probability 1 to $X_I = n$. The disease thus either dies
out, or takes over the whole population. Finally, you can convince yourself
that $X_I(t)$ never decreases, so the disease must take over the population if
it's ever present at all.
To see what happens on the way there, we'll need to use the flow
variables. There is only one flux that we need to worry about, $Y_{SI}$:
\begin{eqnarray}
X_I(t+1) & = & X_I(t) + Y_{SI}(t+1)\\
X_S(t+1) & = & X_S(t) - Y_{SI}(t+1)
\end{eqnarray}
Since $Y_{SI}$ is binomially distributed, we can work out the expected
change in the number of infected individuals.
\begin{eqnarray}
\Expect{X_I(t+1)|X_I(t)} - X_I(t) & = & X_S(t)\alpha X_I(t)\\
\label{eqn:logistic-growth} & = & \alpha X_I(t)(n-X_I(t))
\end{eqnarray}
Notice that this growth is very small when $X_I \approx 0$, and again when
$X_I \approx n$. There is an intermediate range where the growth
is very rapid.
Figure \ref{fig:logistic-growth} plots a typical expected course of $X_I(t)$.
This is what's known as {\bf logistic} or {\bf sigmoidal}\footnote{Because
  somebody thought that curve looked like an $S$, or perhaps the outline of a
  $\sigma$.} growth. This can be caricatured as ``nothing seems to happen for
a long time, then suddenly it explodes and grows exponentially, but eventually
it levels off and saturates''.
\begin{figure}
<<>>=
alpha <- 1e-6
n <- 1e5
xi <- vector(length=200)
xi[1] <- 1
for (t in 2:length(xi)) {
xi[t] <- xi[t-1] + alpha*xi[t-1]*(n-xi[t-1])
}
plot(1:length(xi), xi, xlab="t", ylab=expression(X[I](t)), type="l")
@
\caption{Illustration of the kind of behavior that results from Eq.\
\ref{eqn:logistic-growth}. This is shown with $n=10^5$, $\alpha=10^{-6}$,
and $X_I(1) = 1$.}
\label{fig:logistic-growth}
\end{figure}
Of course, Figure \ref{fig:logistic-growth} comes from iterating the {\em
deterministic} equation \ref{eqn:logistic-growth}. What happens if we
actually follow the stochastic process? Figure \ref{fig:SI-trajectories} plots
100 realizations of the SI model, along with the deterministic approximation,
and shows that while the {\em shape} is the same, the timing of the exponential
take-off and of the eventual saturation becomes variable.
\begin{figure}
<<>>=
sim.SI <- function(n, alpha, xi.1=1, T) {
xi <- vector(length=T)
xi[1] <- xi.1
for (t in 2:length(xi)) {
xi[t] <- xi[t-1] + rbinom(n=1, size=n-xi[t-1], prob=alpha*xi[t-1])
}
return(xi)
}
SI.sims <- replicate(100, sim.SI(1e5, 1e-6, 1, 200))
plot(1:length(xi), xi, xlab="t", ylab=expression(X[I](t)), type="l")
for (sim in 1:ncol(SI.sims)) {
lines(1:nrow(SI.sims), SI.sims[,sim], col="grey")
}
lines(1:length(xi), xi)
legend("topleft", legend=c("simulations", "deterministic approximation"),
col=c("grey", "black"), lty="solid")
@
\caption{Comparison of stochastic simulations of the SI model (grey lines) to
the deterministic approximation (black line). Parameters are as
in Figure \ref{fig:logistic-growth}.}
\label{fig:SI-trajectories}
\end{figure}
\subsubsection{Diffusion of Innovations}
\label{sec:diffusion-of-innovations}
A disease which, once caught, never goes away and remains infectious forever is
a bit weird, biologically speaking. But the same ideas are also used to model
the spread of new technologies and ideas, what is sometimes called the
``diffusion of innovations'', and then it makes sense to say that those who
have ``caught'' an idea remain infected with it, and potentially contagious,
for the rest of their lives.
This week's homework explores this idea in the context of one of the classic
data-sets on the diffusion of innovations, about the spread of the use of a new
antibiotic among doctors in four towns in Illinois in the 1950s. The same
notions, however, have been used to model the adoption of a wide range of
technologies, practices, art forms, and ideologies
\citep{Rogers-diffusion-of-innovations}.
It's worth noting that the idea of analogizing the spread of ideas or behavior
to the spread of disease is a very old one, but it's usually been an analogy
deployed when people {\em disapprove} of what is spreading. (It is very rare
to talk about ``infectious virtue''.) The oldest example I have found is the
Roman writer and politician Pliny the Younger calling Christianity a
``contagious superstition'' ({\em superstitionis contagio}) in a letter to the
Emperor Trajan in +110 ({\em Epistles} X 96.9), but the analogy has
spontaneously recurred many, many times since then. It was also the basis for
more systematic and scholarly studies, such as
\citet{Siegfried-germs-and-ideas}, long before \citet{Dawkins-Selfish} coined
the word ``meme''.
\subsection{SIS}
A logical next step is to keep two compartments, but to allow for movement back
and forth between them. That is, some infected individuals can recover and
become susceptible again. Again, a simple assumption is that infected
individuals recover at a constant rate, $p_{IS} = \rho$, while we keep $p_{SI}
= \alpha X_I$. Because this allows for movement from each compartment
to the other, we need to keep track of {\em two} fluxes:
\begin{eqnarray}
Y_{SI}(t+1) & \sim & \Binom(X_S(t), \alpha X_I(t))\\
Y_{IS}(t+1) & \sim & \Binom(X_I(t), \rho)
\end{eqnarray}
Now
\begin{eqnarray}
X_I(t+1) & = & X_I(t) + Y_{SI}(t+1) - Y_{IS}(t+1)\\
X_S(t+1) & = & X_S(t) - Y_{SI}(t+1) + Y_{IS}(t+1)
\end{eqnarray}
To understand the dynamics here, start by considering what would need
to happen for the expected change in $X_I$ to be zero.
\begin{eqnarray}
\Expect{X_I(t+1)|X_I(t)} - X_I(t) = \alpha X_I(t) (n-X_I(t)) - \rho X_I(t)
\end{eqnarray}
Let's set this equal to 0 at $X_I(t) = X_I^*$. There are two solutions:
\begin{eqnarray}
X_I^*(\alpha (n - X_I^*) - \rho) & = &0\\
X_I^* & = & 0 ~ \text{or}\\
\frac{\rho}{\alpha} + X_I^* & = & n\\
X_I^* & = & n - \frac{\rho}{\alpha}
\end{eqnarray}
The first solution is boring --- if nobody's infected, nobody gets infected.
The other solution is more interesting: it gives the level of infection
where new infections just balance recoveries (on average). Whether
$X_I(t)$ will tend to converge to this level is, however, another question.
Obviously this is only possible if $n > \rho/\alpha+1$, so let's assume
that going forward.
Suppose that $X_I(t) = X_I^* + \epsilon$ for some small amount $\epsilon$.
You can show (Exercise \ref{exercise:restoring-force}) that the expected
change in $X_I$ has the opposite sign from $\epsilon$, so that if the
process is above $X_I^*$ it will tend to be pulled down towards it,
and vice versa. This suggests that the process should bounce around
this level.
\begin{figure}
<<>>=
sim.SIS <- function(n, alpha, rho, xi.1=1, T) {
xi <- vector(length=T)
xi[1] <- xi.1
for (t in 2:length(xi)) {
y.si <- rbinom(n=1, size=n-xi[t-1], prob=alpha*xi[t-1])
y.is <- rbinom(n=1, size=xi[t-1], prob=rho)
xi[t] <- xi[t-1] + y.si - y.is
}
return(xi)
}
SIS.sims <- replicate(100, sim.SIS(1e5, 1e-6, 1e-2, 1, 400))
plot(0, xlab="t", ylab=expression(X[I](t)), type="n",
ylim=c(0,1e5), xlim=c(1, nrow(SIS.sims)))
for (sim in 1:ncol(SIS.sims)) {
lines(1:nrow(SIS.sims), SIS.sims[,sim], col="grey")
}
@
\caption{100 trajectories of the SIS model, with $n=10^5$, $\alpha = 10^{-6}$,
  $\rho = 10^{-2}$ (so $X_I^* = 9\times 10^{4}$) and $X_I(1) = 1$.  Notice
that there are {\em two} horizontal lines at large $t$.}
\label{fig:sis-synopsis}
\end{figure}
\begin{figure}
<<>>=
plot(0, xlab="t", ylab=expression(X[I](t)), type="n",
ylim=c(8.5e4,9.5e4), xlim=c(200, nrow(SIS.sims)))
for (sim in 1:ncol(SIS.sims)) {
lines(1:nrow(SIS.sims), SIS.sims[,sim], col="grey")
}
@
\caption{Close-up of Figure \ref{fig:sis-synopsis}, showing fluctuations
around $X_I^*$ at large times.}
\end{figure}
\begin{figure}
<<>>=
plot(0, xlab="t", ylab=expression(X[I](t)), type="n",
ylim=c(0,10), xlim=c(1, 200))
for (sim in 1:ncol(SIS.sims)) {
lines(1:nrow(SIS.sims), SIS.sims[,sim], col="grey")
}
@
\caption{Close-up of Figure \ref{fig:sis-synopsis}, showing that some
simulations converge to zero.}
\end{figure}
In fact, the situation is a little more subtle. $X_I = 0$ is still an
absorbing state, but now it is one which {\em can} be reached from other states
of the population. If, after all, $X_I(t) = 1$, there is some probability that
$X_I(t+1) = 0$. (What is that probability?) And no matter how big $X_I$ gets,
there is always some probability that it will eventually reach 1, and then in
turn 0. So $X_I=0$ is the only closed, irreducible set of states.
Consequently, {\em the disease will always die out} eventually in an SIS model.
It will, however, generally take a very long time to do so, and in the
meanwhile it will spend most of its time oscillating around $X_I^*$.
\subsection{SIR, SEIR, etc.}
It is easy, and often convenient, to add more compartments to epidemic
models of this sort. In an SIR model, for example, the third compartment,
$R$, stands for ``removed'' or ``recovered'', depending on whether we
think of individuals in that compartment as having died off, or as having
acquired immunity. The basic dynamics are
\begin{eqnarray}
p_{SI} & = & \alpha X_I\\
p_{IR} & = & \rho\\
p_{SR} & = & 0\\
p_{IS} & = & 0\\
p_{RI} & = & 0\\
p_{RS} & = & 0
\end{eqnarray}
I will let you check that the only absorbing state is $X_R = n$, and also let
you work out both the deterministic approximation and simulate some
trajectories on the way there.
A related wrinkle is to add an intermediate compartment between $S$ and $I$,
say $E$ for ``exposed'', in which someone is not yet showing symptoms.
Individuals in compartment $E$ might or might not be infectious themselves
(and, if so, might be more or less infectious than those in compartment $I$).
If one wants to have characteristic lengths of time for each phase of the
disease, one can add even more compartments, say $I_1, I_2, \ldots I_{\tau}$,
with automatic progression from one to the next.
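Chaining stages in this way changes the distribution of how long the disease
lasts.  With a single infectious compartment and a constant exit rate $\rho$,
an individual's dwell time is geometrically distributed, with mean $1/\rho$
but its mode at a single time step.  With $\tau$ stages in series, each left
at rate $\rho$, the total dwell time is a sum of $\tau$ independent geometric
variables, i.e., a negative binomial,
\begin{equation}
\Pr(\text{dwell time} = k) = \left( \begin{array}{c} k-1\\ \tau-1 \end{array}\right) \rho^{\tau} (1-\rho)^{k-\tau}
\end{equation}
with mean $\tau/\rho$, peaked around a characteristic duration rather than at
the shortest possible one.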
\section{Space}
The easiest way to handle space in a compartment model is to say that you have
$r_1$ types or classes of individuals across $r_2$ locations, so that the
$r=r_1 r_2$ compartments track combinations of types and locations. Migration
from one location to another is then just another kind of transition. The
transition rates can be made to be functions of the counts of types only in the
same location (e.g., $S$'s in location 1 can only be infected by $I$'s in
location 1), or global aggregates calculated across all locations, or some
combination of both of these.
This approach will work poorly when the number of locations $r_2$ becomes
comparable to the population size $n$, since then most locations will have a
count of 0 for most types, and you would be better off using a different, more
individual-based model.
\section{Asymptotics}
We haven't discussed what happens as $n\rightarrow\infty$. Basically,
a compartment model converges on a system of ordinary differential
equations.
Said a bit less baldly, suppose that the transition rates $p_{ij}(X)$ are all
of the form $p_{ij}(X) = \rho_{ij}(X/n) h$, where $h$ represents the amount of
time that passes from one value of $X$ to the next. Notice that we are assuming
that the transition rates are really functions of the {\em distribution} of
individuals across the compartments, and not the {\em numbers} in compartments
--- that we get the same rate of infection when $n=20$ and $X_I=10$ as when
$n=2000$ and $X_I=1000$. Finally, we fix a finite, but perhaps large,
interval of time $T$. Then what we find is that the trajectory of
the compartment model between time $0$ and time $T$ approaches the solution
of a differential equation. If we say that $x(t)$ is the solution to
\begin{equation}
\frac{dx_i}{dt} = \sum_{j}{\rho_{ji}(x(t)) x_j(t) - \rho_{ij}(x(t)) x_i(t)}
\end{equation}
then, as $n\rightarrow\infty$ and $h\rightarrow 0$, the random sequence
$X(h)/n, X(2h)/n, X(3h)/n, \ldots X(T)/n$ comes closer and closer to the
deterministic function $x$ on the interval $[0,T]$ (with high probability).
This is the trick that I used above, when I invoked ``deterministic
approximations'' to the epidemic models. For instance, in the SI model, the
limiting trajectory for $(X_S(t)/n, X_I(t)/n)$ is given by
\begin{eqnarray}
\frac{dx_S}{dt} & = & - ax_S(t)(1-x_S(t))\\
\frac{dx_I}{dt} & = & ax_I(t)(1-x_I(t))\\
x_S(t) + x_I(t) & = & 1
\end{eqnarray}
and solving this differential equation does indeed give us sigmoidal
growth curves. (Can you work out how the $a$ here relates to the
$\alpha$ in the stochastic, finite-$n$ version of the SI model?)
It is important that the convergence only holds over finite times $[0,T]$.
Basically, this is because infinite time gives a stochastic process
infinitely many opportunities to do arbitrarily improbable things, which it
will therefore eventually do.  In particular, in the SIS model, the
limiting behavior is
\begin{eqnarray}
\frac{dx_I}{dt} & = & ax_I(t) (1-x_I(t)) - g x_I(t)
\end{eqnarray}
and not extinction.
\section{Demography: Birth, Aging, and Death}
\label{sec:demography}
So far, we have considered models where the population size $n$ is constant.
Most real populations change in size, and an important application of
compartment models is to demography, where what we really care about is how $n$
changes over time. The three processes which change $n$ are birth, death, and
migration.
The SIR model shows how to handle death: we can treat ``being dead'' as just
another compartment, say $R$\footnote{Saying that $R$ stands for ``removed'' is
euphemistic.}. Individuals in compartment $i$ transition to this compartment
at some rate $\delta_i$, and there are no transitions out of this compartment.
Demographers typically define the {\bf mortality rate} as the number of deaths
per 1000 individuals per year, which you can translate into a probability per
person per unit time (depending on the time-scale of your model). $Y_{iR}(t)$
would be the random variable which counted the actual number of deaths among
members of compartment $i$ in time-period $t$. If this sort of formal
simplicity isn't needed, however, we can simply introduce a new variable $D_i(t)$ for deaths in compartment $i$ in period $t$.
Migration can be handled similarly, by introducing a new compartment for
``leaving for the rest of the world''. If both in- and out- migration are
possible, then we need to model the flow of immigrants.
Birth, however, requires a slightly special treatment in these models. The
difficulty is that parents are (typically) still around. The easiest way to
handle this is to introduce a new set of variables, $B_{ij}(t)$, which count
the number of new births into compartment $j$ from parents in compartment
$i$\footnote{I am writing as though individuals have only one parent, which of
course isn't true for (most) vertebrates, including human beings, but the
reasons for this will become apparent shortly.}. If individuals can only be
born one at a time, then we can make $B_{ij}$ binomial, reflecting a certain
rate of parentage for individuals in compartment $i$. If there
are a non-trivial number of twins, triplets, etc., it might be better to
use something like a Poisson distribution.
The model thus looks like this:
\begin{eqnarray}
Y_{ij}(t+1) & \sim & \Binom(X_i(t), p_{ij}(X(t)))\\
D_i(t+1) & \sim & \Binom(X_i(t), \delta_i(X(t)))\\
B_{ij}(t+1) & \sim & \Binom(X_i(t), \beta_{ij}(X(t)))\\
X_i(t+1) & = & X_i(t) - D_i(t+1) + B_{ii}(t+1) + \sum_{j\neq i}{B_{ji}(t+1) + Y_{ji}(t+1) - Y_{ij}(t+1)}
\end{eqnarray}
In human demography, and to a lesser extent in population ecology, we need a
fairly large number of compartments, which track combinations of age and sex.
Demographers conventionally work with 5-year age brackets, so that there is a
compartment for females aged 0--4 and for males aged 0--4, for females aged
5--9 and for males aged 5--9, and so on all down the line. The transition
probabilities are very simple: every individual who does not die moves into the
next age group (and the same sex), with probability 1. It is the age-specific
death rates $\delta_i$ which let us model realistic distributions of life-span;
without them, we'd have geometric distributions of longevity.
Births can only happen into the youngest age brackets, so $B_{ij}=0$ unless $j$ is
one of those two brackets. Because only females give birth, $B_{ij}=0$ unless
$i$ is a female age bracket of child-bearing age. In many applications, the
age-fertility rates $\beta_{ij}$ will be taken to be constant, but they can be
allowed to vary with the number of males, or even with the ratio of males to
females, depending on the customs of the population concerned. Note that these
rates $\beta_{ij}$ should not be confused with the {\bf total fertility rate},
which is the {\em sum} of age-specific rates across child-bearing years.
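These pieces fit together in a few lines of code.  The chunk below is my own
sketch of one 5-year projection step for a female-only population (one-parent
accounting, as above); all the rates and counts in the example are invented
for illustration:
<<age-structure-sketch>>=
# One projection step for a female-only age-structured population;
# x = counts per age bracket, delta = death rates, beta = birth rates
project.step <- function(x, delta, beta) {
  r <- length(x)
  deaths <- rbinom(r, size=x, prob=delta)     # D_i(t+1)
  survivors <- x - deaths
  births <- sum(rbinom(r, size=x, prob=beta)) # all births enter bracket 1
  x.new <- c(births, survivors[-r])           # survivors move up a bracket
  x.new[r] <- x.new[r] + survivors[r]         # open-ended oldest bracket
  x.new
}
x <- c(100, 120, 130, 90, 60)           # five age brackets (invented counts)
delta <- c(0.02, 0.01, 0.02, 0.05, 0.2) # invented age-specific death rates
beta <- c(0, 0.4, 0.3, 0.05, 0)         # fertile brackets in the middle
project.step(x, delta, beta)
@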
\section{Chemistry}
Chemistry provides an important application for compartment models. Here the
``compartments'' are different types of molecules, or {\bf chemical species},
and transitions are the consequences of reactions. We can write the SI model
in this form, with one reaction, $S+I \rightarrow 2I$\footnote{A reaction like
this, where the species $I$ is involved in a process which makes more of the
same species, is called {\bf autocatalytic}.}. We can also write the SIS
model as something with two reactions, $S+I\rightarrow 2I$ and $I \rightarrow
S$. The constant-$n$ case corresponds to all reactions having the same number
of molecules on both the left- and the right- hand sides (as the SI and SIS
cases do). In actual chemistry, though, it's common for the number of
molecules to change, as in $2 H_2 + O_2 \rightarrow 2 H_2O$ (three molecules go
in, two molecules come out). The way to handle this is to use $Y$ variables
to count the number of times each reaction happens, and then subtract the
substrates (on the left) and add the products (on the right), so, e.g.,
if the hydrogen-and-oxygen-to-water reaction was the only one, we'd have
\begin{eqnarray}
X_{H_2}(t+1) & = & X_{H_2}(t) - 2 Y_{2 H_2 + O_2 \rightarrow 2 H_2O}(t+1)\\
X_{O_2}(t+1) & = & X_{O_2}(t) - Y_{2 H_2 + O_2 \rightarrow 2 H_2O}(t+1)\\
X_{H_2 O}(t+1) & = & X_{H_2 O}(t) + 2 Y_{2 H_2 + O_2 \rightarrow 2 H_2O}(t+1)
\end{eqnarray}
(We would also typically assume that the transition rate for
this reaction is $\propto X_{H_2}^2 X_{O_2}$.)
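As a sketch of how this looks in code, here is one stochastic step for the
water-forming reaction on its own.  This is my illustration, not standard
chemical-kinetics code: the rate constant \texttt{k} is invented, and I cap
the per-opportunity reaction probability at 1.
<<water-reaction-sketch>>=
# One stochastic step for 2 H2 + O2 -> 2 H2O, with an invented rate constant
k <- 1e-10
x <- c(H2=1000, O2=600, H2O=0)
opportunities <- min(floor(x["H2"]/2), x["O2"])  # most times it could fire
p <- min(1, k * x["H2"]^2 * x["O2"])             # rate prop. to X_H2^2 X_O2
y <- rbinom(1, size=opportunities, prob=p)       # times the reaction happens
x + c(H2=-2, O2=-1, H2O=2) * y    # subtract substrates, add products
@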
\section{Further reading}
For a fine textbook treatment of compartment models in biology, see
\citet{Ellner-Guckenheimer}, which includes a good chapter on epidemic models,
and references to the further literature.
% Memetics, epidemics, ecological models, social learning, Keizer on chemistry...
\section{Exercises}
To think through, as complements to the reading, not to hand in.
\begin{enumerate}
\item \label{exercise:size-of-state-space} {\em Compartment-model state spaces are very large} Fix $n$. A possible
state $X$ is a vector of non-negative integers $[X_1, X_2, \ldots X_r]$, with
the constraint that $\sum_{i=1}^{r}{X_i} = n$. Prove that the number of such
vectors is $\left( \begin{array}{c} n+r-1\\ n\end{array}\right)$.
\begin{enumerate}
\item Imagine writing out a line of $n$ dots, and inserting $r-1$ vertical
bars between the dots, with the possibility of bars being next to each
other. Call the number of dots to the left of the left-most bar $X_1$, the
number of dots between that first bar and the second $X_2$, and so on.
Convince yourself that every arrangement of dots and bars corresponds to a
valid value of $X$, and that every valid value of $X$ can be represented in
this way.
\item Show that the number of arrangements of dots and bars is equal to
choosing $n$ locations for dots out of $n+r-1$ places.
\item Show that $\left( \begin{array}{c} n+r-1\\ n\end{array}\right) =
\left( \begin{array}{c} n+r-1\\ r-1\end{array}\right)$.
\item Use Stirling's approximation, $\log{n!} \approx n\log{n}$, to
show that as $n$ grows with $r$ fixed, the size of the state space
grows like $n^{r-1}$.
\end{enumerate}
\item \label{exercise:heterogeneous-transition-rates} {\em Randomly-varying
transition rates across individuals and over-dispersion} Suppose that each
individual in compartment $i$ has its own, random transition rate $P_{ij}$
for moving to compartment $j$, with $\Expect{P_{ij}} = p_{ij}$ and
$\Var{P_{ij}} = v_{ij}$. Show that $\Var{Y_{ij}(t+1)|X(t)} =
X_i(p_{ij}(1-p_{ij}) + v_{ij})$.
\item \label{exercise:absorbing-states} Show the following for the SI model
of \S \ref{sec:SI}.
\begin{enumerate}
\item $X_I = 0$ and $X_I = n$ are absorbing states.
\item There are no other absorbing states.
\item An absorbing state forms a closed, irreducible set of states. (This
is a general fact about Markov chains.)
\item This chain has no other closed, irreducible sets.
\item With probability 1, either $X_I(t) \rightarrow 0$ or $X_I(t)
\rightarrow n$.
\end{enumerate}
\item \label{exercise:restoring-force} For the SIS model, calculate
$\Expect{X_I(t+1) - X_I(t)|X_I(t) = X_I^* + \epsilon}$ in terms of
$n$, $\alpha$, $\rho$ and $\epsilon$, and show that (for small $\epsilon$)
it has the opposite sign from $\epsilon$.
\item \label{exercise:constant-rates-and-exponentials}
\begin{enumerate}
\item Prove that if the probability of an event happening on each time
step is a constant $p$, and successive time-steps are independent,
the number of steps we need to wait for the event to happen
has a geometric distribution. Find the parameter in terms of $p$.
\end{enumerate}
\end{enumerate}
\bibliography{locusts}
\bibliographystyle{crs}
\end{document}