# Exercise 5
Numerically compute the value of an integral via Monte Carlo methods.
The integral to be evaluated is the following:
I = \int\limits_0^1 dx \, e^x
whose exact value is 1.7182818285...
The three most popular MC methods were applied: plain Monte Carlo, MISER and
VEGAS (besides their popularity, these three methods were chosen because they
are the only ones implemented in the GSL library).
## Plain Monte Carlo
When the integral $I$ of a function $f$ over a volume $V$ in an
$n$-dimensional space must be evaluated, the simplest MC approach is to sample
$N$ points $x_i$ uniformly distributed in $V$ and approximate $I$ as:
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
with $I_N \rightarrow I$ for $N \rightarrow +\infty$ by the law of large
numbers. Hence, the variance can be estimated as:
\sigma^2 =
\frac{1}{N-1} \sum_{i = 1}^N \left( f(x_i) - \langle f \rangle \right)^2
thus, the error on $I_N$ decreases as $1/\sqrt{N}$. Unlike in deterministic
methods, the estimate of the error is not a strict error bound: random sampling
may not uncover all the important features of the integrand, which can result
in an underestimate of the error.
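Explicitly, since $I_N$ is $V$ times the sample mean of $f$, the standard error of the mean gives:
\sigma_{I_N} = V \, \frac{\sigma}{\sqrt{N}}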
Since the proximity of $I_N$ to $I$ depends on $N$, the accuracy of the
method is determined by the number of function calls (see @tbl:MC).
|          | 500'000 calls | 5'000'000 calls | 50'000'000 calls |
|----------|---------------|-----------------|------------------|
| result   | 1.7166435813  | 1.7181231109    | 1.7183387184     |
| $\sigma$ | 0.0006955691  | 0.0002200309    | 0.0000695809     |

Table: MC results and errors with different numbers of function calls. {#tbl:MC}
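For reference, the plain MC column can be reproduced with GSL's
`gsl_monte_plain` routines; the sketch below is a minimal, illustrative driver
(the wrapper name `exp_integrand` and the default-seeded generator are
assumptions, not the original code):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_plain.h>

/* Integrand f(x) = e^x in the form required by gsl_monte_function. */
double exp_integrand(double *x, size_t dim, void *params)
{
  (void)dim; (void)params;                 /* 1-D, no parameters */
  return exp(x[0]);
}

int main(void)
{
  double xl[1] = { 0.0 }, xu[1] = { 1.0 }; /* integration region [0, 1] */
  double result, error;
  size_t calls = 500000;                   /* first column of @tbl:MC */

  gsl_monte_function f = { &exp_integrand, 1, NULL };

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);
  gsl_monte_plain_integrate(&f, xl, xu, 1, calls, r, s, &result, &error);
  gsl_monte_plain_free(s);

  printf("result = %.10f  sigma = %.10f\n", result, error);

  gsl_rng_free(r);
  return 0;
}
```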
## MISER
The MISER algorithm is based on recursive stratified sampling.
On each recursion step the integral and the error are estimated using a plain
Monte Carlo algorithm. If the error estimate is larger than the required
accuracy, the integration volume is divided into sub-volumes and the procedure
is recursively applied to sub-volumes.
This technique aims to reduce the overall integration error by concentrating
integration points in the regions of highest variance.
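In code terms the recursion can be sketched as follows; this is a deliberately
simplified 1-D version (real MISER allocates points to the sub-volumes in
proportion to their estimated variance, as derived below, whereas this sketch
bisects the interval and splits the budget evenly):

```c
#include <stdlib.h>
#include <math.h>

/* Plain MC estimate of the integral of f over [a,b] with n points
   (n >= 2); the 1-sigma error estimate is written to *err. */
static double plain_mc(double (*f)(double), double a, double b,
                       int n, double *err)
{
  double sum = 0.0, sum2 = 0.0;
  for (int i = 0; i < n; i++) {
    double x  = a + (b - a) * rand() / (double)RAND_MAX;
    double fx = f(x);
    sum  += fx;
    sum2 += fx * fx;
  }
  double mean = sum / n;
  double var  = (sum2 - n * mean * mean) / (n - 1); /* sample variance */
  if (var < 0.0) var = 0.0;                         /* guard roundoff */
  *err = (b - a) * sqrt(var / n);
  return (b - a) * mean;
}

/* Recursive stratified sampling: if the plain MC error on [a,b] exceeds
   tol, bisect the interval and recurse on each half with half the point
   budget and half the tolerance.
   Example: stratified_mc(exp, 0.0, 1.0, 500000, 1e-4) ~ e - 1. */
static double stratified_mc(double (*f)(double), double a, double b,
                            int n, double tol)
{
  double err;
  double estimate = plain_mc(f, a, b, n, &err);
  if (err <= tol || n < 32)   /* stop: accurate enough or budget exhausted */
    return estimate;
  double mid = 0.5 * (a + b);
  return stratified_mc(f, a, mid, n / 2, 0.5 * tol)
       + stratified_mc(f, mid, b, n / 2, 0.5 * tol);
}
```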
The idea of stratified sampling begins with the observation that for two
disjoint regions $a$ and $b$ with Monte Carlo estimates of the integral
$E_a(f)$ and $E_b(f)$ and variances $\sigma_a^2(f)$ and $\sigma_b^2(f)$,
the variance $V(f)$ of the combined estimate $E(f)$:
E (f)= \frac {1}{2} \left( E_a (f) + E_b (f) \right)
is given by,
V(f) = \frac{\sigma_a^2 (f)}{4 N_a} + \frac{\sigma_b^2 (f)}{4 N_b}
It can be shown that this variance is minimized by distributing the points such that,
\frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
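This follows from minimizing $V(f)$ at a fixed total number of points $N_a + N_b$:
\frac{\partial V}{\partial N_a}
= - \frac{\sigma_a^2}{4 N_a^2} + \frac{\sigma_b^2}{4 N_b^2} = 0
\qquad \Longrightarrow \qquad
\frac{N_a}{N_b} = \frac{\sigma_a}{\sigma_b}
which rearranges to the ratio above.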
Hence the smallest error estimate is obtained by allocating sample points in proportion to the standard deviation of the function in each sub-region.
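With GSL, switching from the plain algorithm to MISER only changes the state
allocation and the integration call; a sketch under the same illustrative
assumptions as the plain MC driver above:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_miser.h>

/* Integrand f(x) = e^x, as in the plain MC sketch. */
double exp_integrand(double *x, size_t dim, void *params)
{
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void)
{
  double xl[1] = { 0.0 }, xu[1] = { 1.0 };
  double result, error;
  size_t calls = 500000;

  gsl_monte_function f = { &exp_integrand, 1, NULL };

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  /* Only these three calls differ from the plain MC driver. */
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);
  gsl_monte_miser_integrate(&f, xl, xu, 1, calls, r, s, &result, &error);
  gsl_monte_miser_free(s);

  printf("result = %.10f  sigma = %.10f\n", result, error);
  gsl_rng_free(r);
  return 0;
}
```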
| calls      | plain MC     | MISER        | VEGAS        |
|------------|--------------|--------------|--------------|
| 500'000    | 1.7166435813 | 1.7182850738 | 1.7182818354 |
| 5'000'000  | 1.7181231109 | 1.7182819143 | 1.7182818289 |
| 50'000'000 | 1.7183387184 | 1.7182818221 | 1.7182818285 |

Table: Results of the three methods. {#tbl:results}
| calls      | plain MC     | MISER        | VEGAS        |
|------------|--------------|--------------|--------------|
| 500'000    | 0.0006955691 | 0.0000021829 | 0.0000000137 |
| 5'000'000  | 0.0002200309 | 0.0000001024 | 0.0000000004 |
| 50'000'000 | 0.0000695809 | 0.0000000049 | 0.0000000000 |

Table: Error estimates $\sigma$ of the three methods. {#tbl:sigmas}