Exercise 5
The following integral must be evaluated:
$$
  I = \int\limits_0^1 dx \, e^x
$$
\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Integral
  \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
    (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
  \draw [cyclamen] (5,0) -- (5,2.7182818);
  \node [below] at (5,0) {1};
  % Axis
  \draw [thick, <-] (0,4) -- (0,0);
  \draw [thick, ->] (-2,0) -- (7,0);
  \node [below right] at (7,0) {$x$};
  \node [above left] at (0,4) {$e^{x}$};
  % Plot
  \draw [domain=-2:7, smooth, variable=\x, cyclamen, ultra thick]
    plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{Plot of the integral to be evaluated.}
}
\end{figure}
whose exact value is 1.7182818285...
The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER and VEGAS. Besides their popularity, these three methods were chosen for being the only ones implemented in the GSL library.
Plain Monte Carlo
When the integral $I$ of a function $f$ over an $n$-dimensional space $\Omega$
of volume $V$ must be evaluated, that is:
$$
  I = \int\limits_{\Omega} dx \, f(x)
  \with V = \int\limits_{\Omega} dx
$$
the simplest MC approach is to sample $N$ points $x_i$ uniformly distributed
in $V$ and approximate $I$ as:
$$
  I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. The variance of $f$ can then be estimated by the sample variance:
$$
  \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
  \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
  \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$
Thus, the error $\sigma_I$ decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, the estimate of the error is not a strict error
bound: random sampling may not uncover all the important features of the
integrand and this can result in an underestimate of the error.
In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$.
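As a rough illustration of the estimator $I_N$ and of its error $\sigma_I$ defined above, the following Python sketch applies plain MC to $e^x$ on $[0, 1]$. It is not the GSL routine used for the results reported here: the function name `plain_mc`, the seed and the number of calls are assumptions made only for this example.

```python
import math
import random


def plain_mc(f, a, b, n, seed=0):
    """Estimate the integral of f over [a, b] with n uniform samples.

    Returns (I_N, sigma_I), with V = b - a playing the role of the volume.
    The name and signature are illustrative, not taken from any library.
    """
    rng = random.Random(seed)
    volume = b - a
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(a + volume * rng.random())
        total += y
        total_sq += y * y
    mean = total / n
    # Unbiased sample variance of f: 1/(N-1) * sum (f_i - <f>)^2
    var_f = (total_sq - n * mean * mean) / (n - 1)
    # sigma_I^2 = V^2 / N * sigma_f^2
    return volume * mean, volume * math.sqrt(var_f / n)


estimate, sigma = plain_mc(math.exp, 0.0, 1.0, 100_000)
exact = math.e - 1.0
print(f"I_N = {estimate:.6f}  sigma = {sigma:.6f}  diff = {abs(estimate - exact):.6f}")
```

Multiplying `n` by 100 should shrink `sigma` by roughly a factor of 10, which is the $1/\sqrt{N}$ behaviour stated above.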
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
method is determined by how many points are generated, namely how many function
calls are executed when the method is implemented. In @tbl:MC, the obtained
results and errors $\sigma$ are shown. The estimated integrals for different
numbers of calls are compared to the expected value $I$ and the difference
'diff' between them is given.
As can be seen, the MC method tends to underestimate the error when few
function calls are used: with 500'000 calls, diff is more than twice $\sigma$.
As previously stated, the higher the number of function calls, the better the
estimation of $I$. A further observation is that, even with 50'000'000 calls,
$I^{\text{oss}}$ still differs from $I$ by about $5.7 \cdot 10^{-5}$.
|                  | 500'000 calls | 5'000'000 calls | 50'000'000 calls |
|------------------|---------------|-----------------|------------------|
| $I^{\text{oss}}$ | 1.7166435813  | 1.7181231109    | 1.7183387184     |
| $\sigma$         | 0.0006955691  | 0.0002200309    | 0.0000695809     |
| diff             | 0.0016382472  | 0.0001587176    | 0.0000568899     |

Table: MC results with different numbers of function calls. {#tbl:MC}
Stratified sampling
In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification is the process of dividing the
primary sample into subgroups (strata) before sampling randomly within each
stratum.
Given the mean $\bar{x}_i$ and its variance ${\sigma^2_x}_i$ for a quantity $x$
drawn with simple random sampling in each stratum, such that:
$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$
$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
  {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$
where:

- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum,
- $n_i$ is the number of points drawn in it,
- $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point,

then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:
$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma^2_x}_i
  = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$
where $i$ runs over the strata, $N_i$ is the weight of the $i^{\text{th}}$
stratum and $N$ is the sum of all strata weights.
In practical terms, it can produce a weighted mean that has less variability
than the arithmetic mean of a simple random sample of the whole population. In
fact, if measurements within strata have lower standard deviation, the final
result will have a smaller error in estimation with respect to the one otherwise
obtained with simple sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population.
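The variance reduction can be seen in a small experiment. The Python sketch below is only an illustration under simplifying assumptions: the strata are equal-width sub-intervals of $[a, b]$, each one receives the same number of points, and the weights are therefore $N_i / N = 1 / k$ for $k$ strata. The function name `stratified_mc` and its parameters are hypothetical.

```python
import math
import random


def stratified_mc(f, a, b, n_strata, n_per_stratum, seed=0):
    """Stratified estimate of the integral of f over [a, b].

    Assumes equal-width strata with equal weights N_i / N = 1 / n_strata.
    Returns (integral estimate, standard error of the estimate).
    """
    rng = random.Random(seed)
    width = (b - a) / n_strata
    weight = 1.0 / n_strata
    mean = 0.0
    var = 0.0
    for i in range(n_strata):
        lo = a + i * width
        ys = [f(lo + width * rng.random()) for _ in range(n_per_stratum)]
        mean_i = sum(ys) / n_per_stratum
        var_i = sum((y - mean_i) ** 2 for y in ys) / (n_per_stratum - 1)
        mean += weight * mean_i                       # (N_i / N) * mean of stratum i
        var += weight ** 2 * var_i / n_per_stratum    # (N_i / N)^2 * sigma_i^2 / n_i
    return (b - a) * mean, (b - a) * math.sqrt(var)


# Same total budget (100'000 points), split over 1 stratum or over 10 strata.
for k in (1, 10):
    estimate, sigma = stratified_mc(math.exp, 0.0, 1.0, k, 100_000 // k)
    print(f"{k:2d} strata: I = {estimate:.6f}  sigma = {sigma:.6f}")
```

With a single stratum the function reduces to plain MC, so the two printed lines compare the two approaches at equal cost.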
MISER
The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
extracted points, the estimation of the integral $I$ can be computed as:
$$
  I= V \cdot \langle f \rangle
$$
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\langle f \rangle$.
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points were uniformly sampled. Given the Monte Carlo
estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those
points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$
and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are chosen unitary,
then the variance $\sigma^2$ of the combined estimate $\langle f \rangle$:
$$
  \langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
  + \langle f \rangle_b \right)
$$
is given by:
$$
  \sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$
It can be shown that this variance is minimized by distributing the points such that:
$$
  \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
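The allocation rule can be checked with a short calculation, sketched here under the assumption that the total number of points $n = n_a + n_b$ is held fixed (the symbol $n$ is introduced only for this derivation). Setting to zero the derivative of $\sigma^2$ with respect to $n_a$, with $n_b = n - n_a$:
$$
  \frac{\partial \sigma^2}{\partial n_a}
  = - \frac{\sigma_a^2}{4 n_a^2} + \frac{\sigma_b^2}{4 (n - n_a)^2} = 0
  \quad \Longrightarrow \quad
  \frac{\sigma_a}{n_a} = \frac{\sigma_b}{n_b}
  \quad \Longrightarrow \quad
  \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
Substituting $n_a = n \, \sigma_a / (\sigma_a + \sigma_b)$ and the analogous expression for $n_b$ back into $\sigma^2$ gives the minimum value:
$$
  \sigma^2_{\text{min}} = \frac{(\sigma_a + \sigma_b)^2}{4 n}
$$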
The whole integral estimate and its variance are therefore given by:
$$
  I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
$$
When implemented, MISER is in fact a recursive method. At each step, all
the possible bisections are tested and the one which minimizes the combined
variance of the two sub-regions is selected. The variance in the sub-regions is
estimated with a fraction of the total number of available points. The remaining
sample points are allocated to the sub-regions using the formula for $n_a$ and
$n_b$, once the variances are computed.
The same procedure is then repeated recursively for each of the two half-spaces
from the best bisection. At each recursion step, the integral and the error are
estimated using a plain Monte Carlo algorithm. After a given number of calls,
the final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error.
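The recursion described above can be sketched in a few lines of Python. This is a simplified one-dimensional illustration, not GSL's implementation: in one dimension there is only one bisection to test (the midpoint), the exploration points are simply discarded, and the names `miser_1d`, `min_points` and `explore_fraction` are hypothetical.

```python
import math
import random


def mean_and_variance(ys):
    """Sample mean and unbiased sample variance of a list of values."""
    n = len(ys)
    mean = sum(ys) / n
    return mean, sum((y - mean) ** 2 for y in ys) / (n - 1)


def sample(f, a, b, n, rng):
    """Evaluate f at n points drawn uniformly in [a, b]."""
    return [f(a + (b - a) * rng.random()) for _ in range(n)]


def miser_1d(f, a, b, n, rng, min_points=64, explore_fraction=0.1):
    """Recursive stratified sampling on [a, b] with a budget of about n calls.

    Returns (integral estimate, variance of the estimate).
    """
    if n < 4 * min_points:
        # Too few points to bisect again: plain MC on this sub-region.
        mean, var = mean_and_variance(sample(f, a, b, n, rng))
        return (b - a) * mean, (b - a) ** 2 * var / n

    mid = 0.5 * (a + b)
    # Exploration: spend a fraction of the budget to estimate the standard
    # deviation of f in each half.
    n_half = max(min_points, int(explore_fraction * n)) // 2
    sd_a = math.sqrt(mean_and_variance(sample(f, a, mid, n_half, rng))[1])
    sd_b = math.sqrt(mean_and_variance(sample(f, mid, b, n_half, rng))[1])

    # Allocate the remaining points in proportion to the standard deviations.
    n_rest = n - 2 * n_half
    share = sd_a / (sd_a + sd_b) if sd_a + sd_b > 0.0 else 0.5
    n_a = min(max(int(share * n_rest), min_points), n_rest - min_points)
    n_b = n_rest - n_a

    est_a, var_a = miser_1d(f, a, mid, n_a, rng, min_points, explore_fraction)
    est_b, var_b = miser_1d(f, mid, b, n_b, rng, min_points, explore_fraction)
    # The two halves are independent: integrals and variances simply add up.
    return est_a + est_b, var_a + var_b


estimate, variance = miser_1d(math.exp, 0.0, 1.0, 100_000, random.Random(0))
print(f"I = {estimate:.8f}  sigma = {math.sqrt(variance):.8f}")
```

Returning the integral of each half rather than its mean keeps the combination step trivial: the factors $1/2$ and $1/4$ of the formulas above are absorbed in the widths of the sub-intervals.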
Results for this particular sample are shown in @tbl:MISER.
|                  | 500'000 calls | 5'000'000 calls | 50'000'000 calls |
|------------------|---------------|-----------------|------------------|
| $I^{\text{oss}}$ | 1.7182850738  | 1.7182819143    | 1.7182818221     |
| $\sigma$         | 0.0000021829  | 0.0000001024    | 0.0000000049     |
| diff             | 0.0000032453  | 0.0000000858    | 0.0000000064     |

Table: MISER results with different numbers of function calls. Be careful: while in @tbl:MC the number of function calls stands for the total number of sampled points, in this case it stands for the number of times each section is divided into subsections. {#tbl:MISER}
This time the error, although it is always of the same order of magnitude as diff, seems to seesaw around the correct value.
Importance sampling
In statistics, importance sampling is a method which samples points from the
probability distribution described by $f$ itself, so that the points cluster in
the regions that make the largest contribution to the integral.
Recall that $I = V \cdot \langle f \rangle$ and therefore only
$\langle f \rangle$ must be estimated. Then, consider a sample of $n$ points
{$x_i$} generated according to a probability distribution function $P$, which
gives the following expected value:
$$
  E [x, P] = \frac{1}{n} \sum_i x_i
$$
with variance:
$$
  \sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
  \with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
$$
where $i$ runs over the sample.
In the case of plain MC, $\langle f \rangle$ is estimated as the expected
value of points {$f(x_i)$} drawn with $P (x_i) = 1 \quad \forall i$, since they
are evenly distributed in $\Omega$. The idea is to sample points from a
different distribution to lower the variance of $E[x, P]$, which amounts to
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
$y$ and defining a new probability $P^{(y)}$ in order to satisfy:
$$
  E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
$$
which is to say:
$$
  I = \int \limits_{\Omega} dx f(x) =
  \int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
  \int \limits_{\Omega} dx \, w(x) \, g(x)
$$
where $E \, \longleftrightarrow \, I$ and:
$$
  \begin{cases}
    f(x) \, \longleftrightarrow \, x \\
    1 \, \longleftrightarrow \, P
  \end{cases}
  \et
  \begin{cases}
    w(x) \, \longleftrightarrow \, \frac{x}{y} \\
    g(x) \, \longleftrightarrow \, y = P^{(y)}
  \end{cases}
$$
where the symbol $\longleftrightarrow$ points out the connection between the
variables. This new estimate is better than the former if:
$$
  \sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P]
$$
The best variable $y$ would be:
$$
  y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
  \thus \frac{x}{y^{\star}} = E [x, P]
$$
and even a single sample under $P^{(y^{\star})}$ would be sufficient to give its
value. Obviously, it is not possible to make exactly this choice, since
$E [x, P]$ is not given a priori.
However, this gives an insight into what importance sampling does. In fact,
given that:
$$
  E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
  a P(x \in [a, a + da])
$$
the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so
that its sample frequencies are distributed directly according to their weights
in $E[x, P]$, namely:
$$
  P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
$$
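A concrete, if crude, instance of this idea for $f(x) = e^x$ on $[0, 1]$ is sketched below in Python. The sampling density $g(x) = 2 (1 + x) / 3$ is an assumption made for the example: it is chosen only because it roughly follows the shape of $e^x$ and its cumulative distribution can be inverted by hand, giving $x = \sqrt{1 + 3u} - 1$ for $u$ uniform in $[0, 1]$. The function name `importance_mc` is hypothetical.

```python
import math
import random


def importance_mc(n, seed=0):
    """Importance-sampling estimate of the integral of exp(x) over [0, 1].

    Points are drawn from g(x) = 2 (1 + x) / 3 (an illustrative choice) and
    the weights w(x) = f(x) / g(x) are averaged.
    Returns (integral estimate, standard error of the estimate).
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        u = rng.random()
        x = math.sqrt(1.0 + 3.0 * u) - 1.0        # inverse CDF of g
        g = 2.0 * (1.0 + x) / 3.0
        w = math.exp(x) / g                        # w(x) = f(x) / g(x)
        total += w
        total_sq += w * w
    mean = total / n
    var_w = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(var_w / n)


estimate, sigma = importance_mc(100_000)
print(f"I = {estimate:.6f}  sigma = {sigma:.6f}")
```

Since $w(x)$ only ranges between $1.5$ and about $2.04$ on $[0, 1]$, while $e^x$ ranges between $1$ and $e$, the spread of the averaged values is smaller than in plain MC and so is the resulting error for the same number of points.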
VEGAS \textcolor{red}{WIP}
The VEGAS algorithm is based on importance sampling. It samples points from the
probability distribution described by the function $f$, so that the points are
concentrated in the regions that make the largest contribution to the integral.
In general, if the MC integral of $f$ is sampled with points distributed
according to a probability distribution $g$, the following estimate of the
integral is obtained:
$$
  E (f|g \, , \, N) \with \sigma^2(f|g \, , \, N)
$$
If the probability distribution is chosen as $g \propto |f|$, normalized to
unit integral, it can be shown that the variance vanishes, and the error in the
estimate will therefore be zero.
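For reference, the two quantities can be written out explicitly. The expressions below are the standard importance sampling ones and are given as a sketch, assuming that $g$ is normalized to unit integral over $\Omega$ and that the points $x_i$ are drawn from $g$:
$$
  E (f|g \, , \, N) = \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{g(x_i)}
  \qquad
  \sigma^2 (f|g \, , \, N) = \frac{1}{N} \left[
  \int\limits_{\Omega} dx \, \frac{f^2(x)}{g(x)} - I^2 \right]
$$
For a non-negative $f$ and $g = f / I$, the ratio $f / g$ equals $I$ at every point, hence:
$$
  \int\limits_{\Omega} dx \, \frac{f^2(x)}{g(x)}
  = I \int\limits_{\Omega} dx \, f(x) = I^2
  \quad \Longrightarrow \quad
  \sigma^2 (f|g \, , \, N) = 0
$$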
In practice, it is impossible to sample points from the exact distribution: only
a good approximation can be achieved. In GSL, the VEGAS algorithm approximates
the distribution by histogramming the function $f$ in different subregions. Each
histogram is used to define a sampling distribution for the next pass, which
consists in doing the same thing recursively: this procedure converges
asymptotically to the desired distribution.
In order to avoid the number of histogram bins growing like $K^d$, where $K$ is
the number of bins per dimension and $d$ the number of dimensions, the
probability distribution is approximated by a separable function:
$$
  f (x_1, x_2, \ldots) = f_1(x_1) f_2(x_2) \ldots
$$
so that the number of bins required is only $Kd$. This is equivalent to locating
the peaks of the function from the projections of the integrand onto the
coordinate axes. The efficiency of VEGAS depends on the validity of this
assumption. It is most efficient when the peaks of the integrand are
well-localized. If an integrand can be rewritten in a form which is
approximately separable this will increase the efficiency of integration with
VEGAS.
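The histogram-based adaptation can be illustrated with a toy one-dimensional version in Python. This is a sketch under strong simplifying assumptions and not the GSL algorithm: every bin is sampled with the same probability, the grid is refined after each pass so that all bins contribute equally to the estimated integral, no smoothing, damping or stratification is applied, and the passes are combined with inverse-variance weights. The names `refine` and `vegas_1d` and all default parameters are hypothetical.

```python
import math
import random


def refine(edges, weights):
    """Move the bin edges so that each new bin holds an equal share of the weights."""
    n_bins = len(weights)
    share = sum(weights) / n_bins
    new_edges = [edges[0]]
    acc, j = 0.0, 0  # weight consumed so far, index of the current old bin
    for i in range(1, n_bins):
        goal = i * share
        while acc + weights[j] < goal:
            acc += weights[j]
            j += 1
        # Linear interpolation inside old bin j.
        frac = (goal - acc) / weights[j]
        new_edges.append(edges[j] + frac * (edges[j + 1] - edges[j]))
    new_edges.append(edges[-1])
    return new_edges


def vegas_1d(f, a, b, n_bins=50, n_calls=10_000, n_passes=5, seed=0):
    """Toy adaptive importance sampling; returns (estimate, standard error)."""
    rng = random.Random(seed)
    edges = [a + (b - a) * i / n_bins for i in range(n_bins + 1)]
    sum_inv_var = 0.0
    sum_weighted = 0.0
    for _ in range(n_passes):
        sums = [0.0] * n_bins
        counts = [0] * n_bins
        total = 0.0
        total_sq = 0.0
        for _ in range(n_calls):
            k = rng.randrange(n_bins)             # every bin is equally likely
            width = edges[k + 1] - edges[k]
            x = edges[k] + width * rng.random()
            w = f(x) * n_bins * width             # f(x) / g(x), g = 1 / (n_bins * width)
            total += w
            total_sq += w * w
            sums[k] += abs(w)
            counts[k] += 1
        mean = total / n_calls
        var = (total_sq / n_calls - mean * mean) / (n_calls - 1)
        sum_inv_var += 1.0 / var
        sum_weighted += mean / var
        # Average |f| / g in a bin is proportional to that bin's contribution.
        weights = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        edges = refine(edges, weights)
    return sum_weighted / sum_inv_var, math.sqrt(1.0 / sum_inv_var)


estimate, sigma = vegas_1d(math.exp, 0.0, 1.0)
print(f"I = {estimate:.8f}  sigma = {sigma:.8f}")
```

After the first pass the bins become narrower where $e^x$ is larger, so that $f / g$ is nearly constant across the grid and the later passes dominate the weighted average.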
VEGAS incorporates a number of additional features, and combines both stratified sampling and importance sampling. The integration region is divided into a number of “boxes”, with each box getting a fixed number of points (the goal is 2). Each box can then have a fractional number of bins, but if the ratio of bins-per-box is less than two, VEGAS switches to a kind of variance reduction (rather than importance sampling).
| calls      | plain MC     | MISER        | VEGAS        |
|------------|--------------|--------------|--------------|
| 500'000    | 1.7166435813 | 1.7182850738 | 1.7182818354 |
| 5'000'000  | 1.7181231109 | 1.7182819143 | 1.7182818289 |
| 50'000'000 | 1.7183387184 | 1.7182818221 | 1.7182818285 |

Table: Results of the three methods. {#tbl:results}
| calls      | plain MC     | MISER        | VEGAS        |
|------------|--------------|--------------|--------------|
| 500'000    | 0.0006955691 | 0.0000021829 | 0.0000000137 |
| 5'000'000  | 0.0002200309 | 0.0000001024 | 0.0000000004 |
| 50'000'000 | 0.0000695809 | 0.0000000049 | 0.0000000000 |

Table: $\sigma$s of the three methods. {#tbl:sigmas}