Markov Chain Monte Carlo (MCMC)
Monte Carlo
Monte Carlo methods are algorithms that solve problems through random sampling. For example, stochastic integration, where integrals that are difficult to solve analytically, can be approximated with simulation based methods.
The classical example of Monte Carlo Integration, is estimatign $\pi$ by uniformly sampling points in a unit grid, and then counting the number of points that lie in the unit circle, and the formula for the area of a circle to estimate $\pi$
Solving difficult integrals, is highly prevalent in Bayesian statistics. Where, computing quantities of interest require the computation of the posterior distribution, which then requires computing its normalising constant, an integral. Unless we have a conjugate prior-likelihood pair where the posterior distribution will be of a known form, this integral is often intractable.
Markov Chain Monte Carlo
The idea of MCMC is to use the mathematical properties of Markov Chains to generate samples from a distribution of interest.
Markov Chains
Reference: Markov Chains by J.R. Norris
Markov Chain Definition (Discrete Time - Finite State Space)
A stochastic process $(X_t)_{t\geq0}$ is a Markov Chain with intitial distribution $\mu$, transition matrix $P$, and support $S$ if:
- $X_0 \sim \mu$
- $\forall t \geq 0, \forall i_0, \dots, i_{t+1} \in S^{t+2}$
- $\forall t \geq 0, \forall i, j \in S^2$
The main remarks are that:
- For a given stochastic matrix P, if P is irreducible and aperiodic then a Markov(λ0, P ) process converges to the unique invariant distribution µ of P
- Now given a distribution of interest we want to find a stochastic matrix P such that its invariant distribution is $\mu$
- Find $P$ that satisfies the detailed balance equations
- Make sure $P$ is irreducible and aperiodic, then the marginal distribution of the Markov Chain will converge to $\mu$.
Markov Chain Definitions
Irreducibility
A Markov Chain is irreducible if $\forall (i, j) \in S^2$, $\exists t \geq 0$ such that $P^t_{ij} > 0$.
That is, it is possible to get from any state to any other state in a finite number of steps.
Aperiodicity
A state $i \in S$ of a Markov Chain is aperiodic if $\exists N \in \mathbb{N}$ such that $\forall t \geq N$ $p_{ii}^{(t)} > 0$.
This is equivalent to saying that
In words, this means that for sufficiently large $t$, $\forall i \in S$ there is a positive probability of returning to $i$ at irregular times.
Invariant Distribution
A distribution $\mu$ on $S$ is invariant for a Markov Chain with transition matrix $P$ if $\mu^T P = \mu^T$.
Starting from an invariant distribution
Now if $\mu$ is invariant for $P$, using that, for a $\text{Markov}(\lambda_0, P)$ process \(\begin{align*} \lambda_t = \lambda_0 P^t \end{align*}\) Then the MC $(X_t)_{t\geq0}$ with transition matrix $P$ and initial distribution $\mu$ will have $X_t \sim \mu$ for all $t \geq 0$.
Markov Chain Theorems
We have the following theorem, recall that we are in the finite state space setting.
Theorem $\quad$ Limit Distribution is Invariant.
For some $i \in S$ if: \(\begin{align*} \lim_{t \to \infty} p^{(t)}_{ij} = \pi_j \quad \forall j \in S \end{align*}\)
Then $\pi = (\pi_j: j \in S)$ is invariant for $P$.
The theorem below is a distilled version of multiple theorems in Norris.
Theorem $\quad$ Convergence of Markov Chains.
Let $(X_t)_{t\geq0}$ be $\text{Markov}(\lambda_0, P)$ where $P$ is irreducible and aperiodic. And $\mu$ is an invariant distribution for $P$.
Then $\exists \rho \in (0, 1), \; c \in (0, \infty)$ s.t. $\forall i \in S^2, \forall t \geq 0$
\[\begin{align*} |p_{ij}^{(t)} - \mu_j| \leq c \rho^t \quad \text{and} \quad |\mathbb{P}(X_t = j) - \mu_j| \leq c \rho^t \end{align*}\]The above theorem tells us that if a stochastic matrix is irreducible and aperiodic, then the Markov Chain will converge to its invariant distribution.