Monte Carlo Methods

$\newcommand\inv{^{-1}}\newcommand\invt{^{-t}} \newcommand\bbP{\mathbb{P}} \newcommand\bbR{\mathbb{R}} \newcommand\defined{ \mathrel{\lower 5pt \hbox{{\equiv\atop\mathrm{D}}}}}$ 10.1 : Motivation
10.1.1 : What is the attraction?
10.2 : Parallel Random Number Generation
10.2.1 : Sequential random number generators
10.2.2 : Parallel random number generation
10.2.2.1 : Master-worker generator
10.2.2.2 : Sequence splitting solutions
10.2.2.3 : Blocked random number generators
10.2.2.4 : Mersenne twister
10.2.3 : Examples
10.2.4 : Monte Carlo simulation of the Ising model

10 Monte Carlo Methods

Monte Carlo simulation is a broad term for methods that use random numbers and statistical sampling to solve problems, rather than exact modeling. From the nature of this sampling, the result will have some uncertainty, but the statistical law of large numbers' will ensure that the uncertainty goes down as the number of samples grows.

10.1 Motivation

Top > Motivation

Let's start with a simple example: measuring an area, for instance, $\pi$ is the area of a circle inscribed in a square with sides 2. If you picked a random point in the square, the chance of it falling in the circle is $\pi/4$, so you could estimate this ratio by taking many random points $(x,y)$ and seeing in what proportion their length $\sqrt{x^2+y^2}$ is less than 1.

You could even do this as a physical experiment: suppose you have a pond of an irregular shape in your backyard, and that the yard itself is rectangular with known dimensions. If you would now throw pebbles into your yard so that they are equally likely to land at any given spot, then the ratio of pebbles falling in the pond to those falling outside equals the ratio of the areas.

Less fanciful and more mathematically, we need to formalize the idea of falling inside or outside the shape you are measuring. Therefore, let $\Omega\in[0,1]^2$ be the shape, and let a function $f(\bar x)$ describe the boundary of $\Omega$, that is \begin{equation} \begin{cases} f(\bar x)<0&x\not\in\Omega\\ f(\bar x)>0&x\in\Omega\\ \end{cases} \end{equation} Now take random points $\bar x_0,\bar x_1,\bar x_2\in[0,1]^2$, then we can estimate the area of $\Omega$ by counting how often $f(\bar x_i)$ is positive or negative.

We can extend this idea to integration. The average value of a function on an interval $(a,b)$ is defined as \begin{equation} \langle f\rangle = \frac1{b-a}\int_a^bf(x)dx \end{equation} On the other hand, we can also estimate the average as \begin{equation} \langle f\rangle \approx \frac 1N\sum_{i=1}^nf(x_i) \end{equation} if the points $x_i$ are reasonably distributed and the function $f$ is not too wild. This leads us to \begin{equation} \int_a^bf(x)dx \approx (b-a) \frac 1N\sum_{i=1}^nf(x_i) \end{equation} Statistical theory, that we will not go into, tells us that the uncertainty $\sigma_I$ in the integral is related to the standard deviation $\sigma_f$ by \begin{equation} \sigma_I\sim \frac1{\sqrt N}\sigma_f \end{equation} for normal distributions.

10.1.1 What is the attraction?

Top > Motivation > What is the attraction?

So far, Monte Carlo integration does not look much different from classical integration by Riemann sums . The difference appears when we go to higher dimensions. In that case, for classical integration we would need $N$ points in each dimension, leading to $N^d$ points in $d$ dimensions. In the Monte Carlo method, on the other hand, the points are taken at random from the $d$-dimensional space, and a much lower number of points suffices.

Computationally, Monte Carlo methods are attractive since all function evaluations can be performed in parallel.

The statistical law that underlies this is as follows: if $N$ independent observations are made of a quantity with standard deviation $\sigma$, then the standard deviation of the mean is $\sigma/\sqrt N$. This means that more observations will lead to more accuracy; what makes Monte Carlo methods interesting is that this gain in accuracy is not related to dimensionality of the original problem.

Monte Carlo techniques are of course natural candidatates for simulating phenomena that are statistical in nature, such as radioactive decay, or Brownian motion. Other problems where Monte Carlo simulation is attractive are outside the realm of scientific computing. For instance, the Black-Scholes model for stock option pricing   [BlackScholes] uses Monte Carlo simulation.

Some problems that you have seen before, such as solving a linear system of equations, can be tackled with Monte Carlo techniques. However, this is not a typical application. Below we will discuss two applications where exact methods would take far too much time to compute and where statistical sampling can quickly give a reasonably accurate answer.

An important tool for statistical sampling is a random number generator. We start by briefly discussing the problems in generating random numbers, especially in parallel.

10.2 Parallel Random Number Generation

Top > Parallel Random Number Generation

Random numbers are often used in simulations as some examples below will show. True random numbers are very hard to obtain: they could be generated by measuring quantum processes such as radioactive particles. Starting with the Intel Ivy Bridge , Intel's processors have a hardware random number generator based on thermal noise  [Cryptography:IvyRandom2012] .)

The most common solution is to use pseudo-random numbers \index{pseudo-random numbers|see{random numbers}}. This means that we use a deterministic mathematical process, that is sufficiently irregular that for practical purposes no order can be found in it.

10.2.1 Sequential random number generators

Top > Parallel Random Number Generation > Sequential random number generators

An easy way to generate random numbers (we leave off the pseudo' qualification) is to use linear congruential generators (for all you ever need to know about random numbers, see Knuth  [Knuth:vol2] ), recurrences of the form \begin{equation} x_{k+1} = (ax_k+b) \mod m. \end{equation} This sequence is periodic, since it consists of nonnegative integers at most $m-1$, and with period $m$ under certain conditions. A typical period is $2^{31}$. The starting point $x_0$ of the series is known as the seed'. Software for random numbers often lets you specify the seed. To get reproducible results you would run your program with the same seed multiple times; to get random behaviour over multiple runs of your program you could for instance derive the seed from clock and calendar functions.

Linear congruential generators may have some amount of correlation between lower bits. A different principle of generating random numbers is the lagged Fibonacci random number generator \begin{equation} X_i = X_{i-p}\otimes X_{i-q} \end{equation} where $p,q$ are the lag parameter, and $\otimes$ is any binary operation, such as addition or multiplication modulo $M$.

The main problems with lagged Fibonacci generators are:

• They require setting $\max(p,q)$ initial values, and their randomness is sensitive to these choices;

• They theory is not as developed as for congruential generators, so their is a greater reliance on statistical tests to evaluate their randomness'.

10.2.2 Parallel random number generation

Top > Parallel Random Number Generation > Parallel random number generation

Random number generation is problematic in parallel. To see this, consider a parallel process that uses a random number generator on each subprocess, and consider a single processor emulating the parallel process. Now this single process in effect has a random number generator that consists of interleaving the parallel generator results. This means that, if we use the same generator in all parallel processes, the effective generator over the whole process will produce stretches of identical values.

There are various ways out.

10.2.2.1 Master-worker generator

Top > Parallel Random Number Generation > Parallel random number generation > Master-worker generator

We can generate the random numbers centrally. In shared memory that could mean making its operation atomic. This may introduce a serious bottleneck.

Exercise Critical sections are usually justified if the work spent there is of lower order than the parallel work. Why does that argument not apply here.

Another solution would be to have one thread or process that generates the random numbers and distributes them to the other processes. Doing this on a number-by-number basis causes considerable overhead. Instead, it would be possible for the generator process to distribute blocks of numbers. However, the manner in which this is done may again cause correlation between processes.

10.2.2.2 Sequence splitting solutions

Top > Parallel Random Number Generation > Parallel random number generation > Sequence splitting solutions

A better solution is to set up independent generators with parameter choices that guarantee statistical randomness. This is not simple. For instance, if two sequences $x^{(1)}_i,x^{(2)}_i$ have the same values of $a,b,m$, and their starting points are close together, the sequences will be strongly correlated. Less trivial examples of correlation exist.

Various techniques for random number generation exist, such as using two sequences, where one generates the starting points for the other sequence, which is the one actually used in the simulation. Software for parallel random number generator can be found at http://sprng.cs.fsu.edu/   [Mascagni:SPRNG] .

If it is possible given $x_i$ to compute $x_{i+k}$ cheaply, one use a leapfrogging technique, where $k$ processes have disjoint series $i\mapsto x_{s_k+ik}$ where $x_{s_k}$ is the starting point for the $k$-th series.

10.2.2.3 Blocked random number generators

Top > Parallel Random Number Generation > Parallel random number generation > Blocked random number generators

Some random number generators (see  [LEcuyer:multiple-random] ) allow you to calculate a value that is many iteration away from the seed. You could then take the block of values from the seed to that iteration and give it to one processor. Similarly, each processor would get a contiguous block of iterations of the generator.

10.2.2.4 Mersenne twister

Top > Parallel Random Number Generation > Parallel random number generation > Mersenne twister

The Mersenne twister random number generator has been adapted to allow for parallel streams of uncorrelated numbers  [Matsumoto:DynamicMersenne] . Here the process ID is encoded into the generator.

10.2.3 Examples

Top > Examples

10.2.4 Monte Carlo simulation of the Ising model

Top > Examples > Monte Carlo simulation of the Ising model

The Ising model (for an introduction, see  [Cipra:Ising] ) was originally proposed to model ferromagnetism. Magnetism is the result of atoms aligning their spin' direction: let's say spin can only be up' or down', then a material has magnetism if more atoms have spin up than down, or the other way around. The atoms are said to be in a structure called a lattice'.

Now image heating up a material, which loosens up the atoms. If an external field is applied to the material, the atoms will start aligning with the field, and if the field is removed the magnetism disappears again. However, below a certain critical temperature the material will retain its magnetism. We will use Monte Carlo simulation to find the stable configurations that remain.

Let's say the lattice $\Lambda$ has $N$ atoms, and we denote a configuration of atoms as $\sigma=(\sigma_1,\ldots,\sigma_N)$ where each $\sigma_i=\pm1$. The energy of a lattice is modeled as \begin{equation} H=H(\sigma)=-J\sum_i\sigma_i-E\sum_{ij}\sigma_i\sigma_j. \end{equation} The first term models the interaction of individual spins $\sigma_i$ with an external field of strength $J$. The second term, which sums over nearest neighbour pairs, models alignment of atom pairs: the product $\sigma_i\sigma_j$ is positive if the atoms have identical spin, and negative if opposite.

In statistical mechanics , the probability of a configuration is \begin{equation} P(\sigma) = \exp(-H(\sigma))/Z \end{equation} where the `partitioning function' $Z$ is defined as \begin{equation} Z = \sum_\sigma \exp(H(\sigma)) \end{equation} where the sum runs over all $2^N$ configurations.

A configuration is stable if its energy does not decrease under small perturbations. To explore this, we iterate over the lattice, exploring whether altering the spin of atoms lowers the energy. We introduce an element of chance to prevent artificial solutions. (This is the Metropolis algorithm   [Metropolis] .)

Algorithm: For: fixed number of iterations{For: each atom $i${ {calculate the change $\Delta E$ from changing the sign of $\sigma_i$}If: $\Delta E<$ or $\exp(-\Delta E)$ greater than some random number{ accept the change} } }

This algorithm can be parallelized, if we notice the similarity with the structure of the sparse matrix-vector product. In that algorithm too we compute a local quantity by combining inputs from a few nearest neighbours. This means we can partitioning the lattice, and compute the local updates after each processor collects a \indexterm{ghost region}.

Having each processor iterate over local points in the lattice corresponds to a particular global ordering of the lattice; to make the parallel computation equivalent to a sequential one we also need a parallel random generator (section  ).