# Exercise 5

The following integral must be evaluated:

$$
  I = \int\limits_0^1 dx \, e^x
$$

\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Integral
  \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
    (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
  \draw [cyclamen] (5,0) -- (5,2.7182818);
  \node [below] at (5,0) {1};
  % Axis
  \draw [thick, <-] (0,4) -- (0,0);
  \draw [thick, ->] (-2,0) -- (7,0);
  \node [below right] at (7,0) {$x$};
  \node [above left] at (0,4) {$e^{x}$};
  % Plot
  \draw [domain=-2:7, smooth, variable=\x,
    cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{Plot of the integral to be evaluated.}
}
\end{figure}

The exact value of the integral is $I = e - 1 = 1.7182818285...$

The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides being popular, these three methods were chosen because they
are the only ones implemented in the GSL library.


## Plain Monte Carlo

When the integral $I$ of a function $f$ over an $n$-dimensional space $\Omega$
of volume $V$ must be evaluated, that is:

$$
  I = \int\limits_{\Omega} dx \, f(x)
  \with V = \int\limits_{\Omega} dx
$$

the simplest MC approach is to sample $N$ points $x_i$ uniformly distributed
in $\Omega$ and approximate $I$ as:

$$
  I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$

with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. The variance of $f$ can then be estimated by the sample variance,
from which the variance of $I_N$ follows:

$$
  \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
  \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
  \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$

Thus, the error decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, the estimate of the error is not a strict error
bound: random sampling may not uncover all the important features of the
integrand and this can result in an underestimate of the error.

In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$.

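The estimator $I_N$ and its error $\sigma_I$ defined above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the GSL routine used to produce the results in this section; the function name `plain_mc`, the seed and the number of calls are arbitrary choices.

```python
import math
import random

def plain_mc(f, a, b, n, rng):
    """Plain MC estimate of the integral of f over [a, b] using n calls."""
    volume = b - a
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n
    # Unbiased sample variance of f, then the error of the estimate I_N.
    var_f = (total_sq - n * mean * mean) / (n - 1)
    sigma = volume * math.sqrt(var_f / n)
    return volume * mean, sigma

rng = random.Random(42)  # fixed seed, chosen arbitrarily for reproducibility
estimate, sigma = plain_mc(math.exp, 0.0, 1.0, 100_000, rng)
exact = math.e - 1.0
print(f"I_N = {estimate:.6f}  sigma = {sigma:.6f}  diff = {abs(estimate - exact):.6f}")
```

Multiplying the number of calls by 100 should shrink `sigma` by roughly a factor of 10, in line with the $1/\sqrt{N}$ behaviour.
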
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
method is determined by how many points are generated, namely how many function
calls are executed when the method is implemented. In @tbl:MC, the obtained
results and errors $\sigma$ are shown. The estimated integrals
$I^{\text{oss}}$ for different numbers of calls are compared to the expected
value $I$ and the absolute difference 'diff' between them is given.
As can be seen, the MC method tends to underestimate the error when few
function calls are used: with 500'000 calls, diff is more than twice $\sigma$.
As previously stated, the higher the number of function calls,
the better the estimation of $I$. A further observation is that,
even with 50'000'000 calls, $I^{\text{oss}}$ still differs from $I$ by about
$5.7 \cdot 10^{-5}$.

-------------------------------------------------------------------------
                  500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$  1.7166435813      1.7181231109       1.7183387184

$\sigma$          0.0006955691      0.0002200309       0.0000695809

diff              0.0016382472      0.0001587176       0.0000568899
-------------------------------------------------------------------------

Table: MC results with different numbers of function calls. {#tbl:MC}


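As a cross-check of the $\sigma$ values (a worked computation added for illustration), the variance of $f(x) = e^x$ on $[0, 1]$ can be computed analytically:

$$
  \sigma_f^2 = \int\limits_0^1 dx \, e^{2x}
  - \left( \int\limits_0^1 dx \, e^x \right)^2
  = \frac{e^2 - 1}{2} - (e - 1)^2 \approx 0.2420
$$

Hence $\sigma_f \approx 0.4920$ and, with $V = 1$ and $N = 5 \cdot 10^5$:

$$
  \sigma_I = \frac{V \sigma_f}{\sqrt{N}}
  \approx \frac{0.4920}{\sqrt{5 \cdot 10^5}} \approx 0.000696
$$

which agrees with the value reported for 500'000 calls. Each further factor of 10 in $N$ divides $\sigma_I$ by $\sqrt{10} \approx 3.16$, as observed in the other two columns.
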
## Stratified sampling

In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification is the process of
dividing the primary sample into subgroups (strata) before sampling randomly
within each stratum.
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of a quantity $x$
sampled with simple random sampling in each stratum, namely:

$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
  {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$

where:

- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
- $n_i$ is the number of points sampled in it
- $\sigma_i^2$ is the variance associated with each point $x_j$ of the
  $i^{\text{th}}$ stratum

then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:

$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma^2_x}_i
  = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$

where $i$ runs over the strata, $N_i$ is the weight of the $i^{\text{th}}$
stratum and $N$ is the sum of all strata weights.

In practical terms, it can produce a weighted mean that has less variability
than the arithmetic mean of a simple random sample of the whole population. In
fact, if measurements within strata have lower standard deviation, the final
result will have a smaller error in estimation with respect to the one otherwise
obtained with simple sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population.


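The stratified mean and variance formulas above can be illustrated with a minimal Python sketch. It assumes equal-width strata on $[0, 1]$, so that every weight is $N_i / N = 1 / k$ for $k$ strata, and the same number of points in each stratum; the function name and the parameters are hypothetical choices made for this example.

```python
import math
import random

def stratified_mean(f, n_strata, n_per_stratum, rng):
    """Estimate <f> on [0, 1] with equal-width strata (N_i / N = 1 / n_strata)."""
    width = 1.0 / n_strata
    mean = 0.0
    variance = 0.0
    for i in range(n_strata):
        lo = i * width
        samples = [f(rng.uniform(lo, lo + width)) for _ in range(n_per_stratum)]
        mean_i = sum(samples) / n_per_stratum
        var_i = sum((y - mean_i) ** 2 for y in samples) / (n_per_stratum - 1)
        mean += width * mean_i                          # (N_i / N) * mean_i
        variance += width ** 2 * var_i / n_per_stratum  # (N_i / N)^2 * var_i / n_i
    return mean, math.sqrt(variance)

rng = random.Random(1)  # arbitrary fixed seed
strat_mean, strat_sigma = stratified_mean(math.exp, 10, 1_000, rng)
print(f"<f> = {strat_mean:.6f}  sigma = {strat_sigma:.6f}")
```

With the same 10'000 calls, plain MC would give an error of about $0.492 / \sqrt{10^4} \approx 0.0049$, so the stratified error is expected to be noticeably smaller, since $e^x$ varies much less within each stratum than over the whole interval.
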
### MISER

The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
sampled points, the integral $I$ can be estimated as:

$$
  I = V \cdot \langle f \rangle
$$


Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\langle f \rangle$.

Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points are uniformly sampled. Given the Monte Carlo
estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those
points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$
and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are both set to
one, then the variance $\sigma^2$ of the combined estimate $\langle f \rangle$:

$$
  \langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
  + \langle f \rangle_b \right)
$$

is given by:

$$
  \sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$

It can be shown that this variance is minimized by distributing the points such
that:

$$
  \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$

Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by:

$$
  I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
$$

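The optimal allocation can be derived in a few steps (a sketch added for illustration, assuming the total number of points $n = n_a + n_b$ is fixed). Writing $n_b = n - n_a$ and setting the derivative of $\sigma^2$ with respect to $n_a$ to zero:

$$
  \frac{d \sigma^2}{d n_a} = - \frac{\sigma_a^2}{4 n_a^2}
  + \frac{\sigma_b^2}{4 (n - n_a)^2} = 0
  \quad \Longrightarrow \quad
  \frac{\sigma_a}{n_a} = \frac{\sigma_b}{n_b}
  \quad \Longrightarrow \quad
  \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$

Substituting back, the minimum variance is:

$$
  \sigma^2_{\text{min}} = \frac{\left( \sigma_a + \sigma_b \right)^2}{4 n}
$$
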
When implemented, MISER is in fact a recursive method. At each step, all
the possible bisections are tested and the one which minimizes the combined
variance of the two sub-regions is selected. The variance in the sub-regions is
estimated with a fraction of the total number of available points. The remaining
sample points are allocated to the sub-regions using the formula for $n_a$ and
$n_b$, once the variances are computed.
The same procedure is then repeated recursively for each of the two half-spaces
from the best bisection. At each recursion step, the integral and the error are
estimated using a plain Monte Carlo algorithm. After a given number of calls,
the final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error.

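The recursion described above can be sketched in one dimension as follows. This is a simplified illustration and not the GSL implementation: in 1D there is a single possible bisection (the midpoint), the exploratory points are discarded rather than reused, and the names `miser_like`, `min_calls` and `explore_fraction` are hypothetical.

```python
import math
import random

def plain(f, lo, hi, n, rng):
    """Plain MC on [lo, hi]: (integral estimate, variance of the estimate)."""
    ys = [f(rng.uniform(lo, hi)) for _ in range(n)]
    mean = sum(ys) / n
    var_f = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return (hi - lo) * mean, (hi - lo) ** 2 * var_f / n

def sample_std(f, lo, hi, n, rng):
    """Standard deviation of f on [lo, hi], estimated from n points."""
    ys = [f(rng.uniform(lo, hi)) for _ in range(n)]
    mean = sum(ys) / n
    return math.sqrt(sum((y - mean) ** 2 for y in ys) / (n - 1))

def miser_like(f, lo, hi, n, rng, min_calls=64, explore_fraction=0.1):
    """Recursive stratified sampling on [lo, hi] with a budget of n calls."""
    if n < 4 * min_calls:
        return plain(f, lo, hi, n, rng)
    mid = 0.5 * (lo + hi)
    # Spend a fraction of the budget to estimate sigma in each half.
    n_explore = max(2, int(explore_fraction * n / 2))
    s_a = sample_std(f, lo, mid, n_explore, rng)
    s_b = sample_std(f, mid, hi, n_explore, rng)
    remaining = n - 2 * n_explore
    # Allocate the remaining points in proportion to sigma_a and sigma_b.
    frac_a = s_a / (s_a + s_b) if s_a + s_b > 0 else 0.5
    n_a = min(max(int(frac_a * remaining), min_calls), remaining - min_calls)
    n_b = remaining - n_a
    i_a, v_a = miser_like(f, lo, mid, n_a, rng, min_calls, explore_fraction)
    i_b, v_b = miser_like(f, mid, hi, n_b, rng, min_calls, explore_fraction)
    # Independent sub-regions: integrals and variances simply add up.
    return i_a + i_b, v_a + v_b

rng = random.Random(7)  # arbitrary fixed seed
miser_est, miser_var = miser_like(math.exp, 0.0, 1.0, 100_000, rng)
miser_sigma = math.sqrt(miser_var)
print(f"I = {miser_est:.8f}  sigma = {miser_sigma:.8f}")
```

Adding the integrals and the variances of the two halves is equivalent to the combined estimate $\langle f \rangle$ and variance $\sigma^2$ given above, once each half is multiplied by its own volume $V/2$.
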
Results for this particular sample are shown in @tbl:MISER.

-------------------------------------------------------------------------
                  500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$  1.7182850738      1.7182819143       1.7182818221

$\sigma$          0.0000021829      0.0000001024       0.0000000049

diff              0.0000032453      0.0000000858       0.0000000064
-------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. Be careful:
while in @tbl:MC the number of function calls stands for the total number of
sampled points, in this case it stands for the times each section
is divided into subsections. {#tbl:MISER}

This time the error, although always of the same order of magnitude as diff,
seems to seesaw around it. The estimate itself is much closer to the expected
value than the one obtained with plain MC.


## Importance sampling

In statistics, importance sampling is a method which samples points from a
probability distribution shaped like the integrand $f$ itself, so that the
points cluster in the regions that make the largest contribution to the
integral.

Recall that $I = V \cdot \langle f \rangle$ and therefore only $\langle f
\rangle$ must be estimated. Then, consider a sample of $n$ points $\{x_i\}$
generated according to a probability distribution function $P$, which
gives the following expected value:

$$
  E [x, P] = \frac{1}{n} \sum_i x_i
$$

with variance:

$$
  \sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
  \with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
$$

where $i$ runs over the sample.
In the case of plain MC, $\langle f \rangle$ is estimated as the expected
value of points $\{f(x_i)\}$ sampled with $P (x_i) = 1 \quad \forall i$, since
they are evenly distributed in $\Omega$. The idea is to sample points from a
different distribution to lower the variance of $E[x, P]$, which amounts to
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
$y$ and defining a new probability $P^{(y)}$ in order to satisfy:

$$
  E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
$$

which is to say:

$$
  I = \int \limits_{\Omega} dx \, f(x) =
  \int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
  \int \limits_{\Omega} dx \, w(x) \, g(x)
$$

where $E \, \longleftrightarrow \, I$ and:

$$
  \begin{cases}
    f(x) \, \longleftrightarrow \, x \\
    1 \, \longleftrightarrow \, P
  \end{cases}
  \et
  \begin{cases}
    w(x) \, \longleftrightarrow \, \frac{x}{y} \\
    g(x) \, \longleftrightarrow \, y = P^{(y)}
  \end{cases}
$$

where the symbol $\longleftrightarrow$ points out the connection between the
variables. This new estimate is better than the former if:

$$
  \sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P]
$$

The best variable $y$ would be:

$$
  y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
  \thus \frac{x}{y^{\star}} = E [x, P]
$$

and even a single sample under $P^{(y^{\star})}$ would be sufficient to give its
value. Obviously, it is not possible to make exactly this choice, since $E [x,
P]$ is not given a priori.
However, this gives an insight into what importance sampling does. In fact,
given that:

$$
  E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
  a P(x \in [a, a + da])
$$

the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so
that its sample frequencies are distributed directly according to their weights
in $E[x, P]$, namely:

$$
  P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
$$

In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
others, these "important" values must be emphasized by sampling them more
frequently. As a consequence, the estimator variance will be reduced.


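A minimal Python sketch of importance sampling for $f(x) = e^x$ on $[0, 1]$ is given below. The sampling density $g(x) = (1 + x) / 1.5$ is an assumption made for this example only: it is normalized on $[0, 1]$, roughly follows the growth of $e^x$ and can be sampled exactly by inverting its cumulative distribution, $x = \sqrt{1 + 3u} - 1$ with $u$ uniform in $[0, 1)$.

```python
import math
import random

def importance_mc(n, rng):
    """Estimate I = int_0^1 e^x dx by sampling x from g(x) = (1 + x) / 1.5."""
    weights = []
    for _ in range(n):
        u = rng.random()
        x = math.sqrt(1.0 + 3.0 * u) - 1.0  # inverse CDF of g
        g = (1.0 + x) / 1.5
        weights.append(math.exp(x) / g)     # w(x) = f(x) / g(x)
    mean = sum(weights) / n
    var_w = sum((w - mean) ** 2 for w in weights) / (n - 1)
    return mean, math.sqrt(var_w / n)

rng = random.Random(5)  # arbitrary fixed seed
imp_est, imp_sigma = importance_mc(100_000, rng)
print(f"I = {imp_est:.6f}  sigma = {imp_sigma:.6f}")
```

Since $w(x) = f(x) / g(x)$ varies much less than $f(x)$ over $[0, 1]$, the error is expected to be smaller than the plain MC one for the same number of calls (about $0.0016$ for $10^5$ calls).
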
### VEGAS

The VEGAS algorithm is based on importance sampling. It aims to reduce the
integration error by concentrating points in the regions that make the largest
contribution to the integral.

As stated before, in practice it is impossible to sample points from the best
distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
GSL, the VEGAS algorithm approximates the distribution by histogramming the
function $f$ in different subregions. Each histogram is used to define a
sampling distribution for the next pass, which consists in doing the same thing
recursively: this procedure converges asymptotically to the desired
distribution. It follows that a better estimation is achieved with a greater
number of function calls.
The integration uses a fixed number of function calls. The result and its
error estimate are based on a weighted average of independent samples, as for
MISER.

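The idea of refining a histogram pass after pass can be sketched in one dimension as follows. This is a strongly simplified illustration, not the GSL algorithm: the grid is rebuilt so that every bin carries the same estimated contribution $\sum |f| / g$, no damping is applied, and the names `rebin` and `vegas_like` are hypothetical. The passes are combined with a weighted average using the inverse variances as weights.

```python
import math
import random

def rebin(edges, contrib):
    """Move the bin edges so that each new bin holds an equal share of contrib."""
    n_bins = len(contrib)
    target = sum(contrib) / n_bins
    new_edges = [edges[0]]
    k = 0       # old bin currently being consumed
    used = 0.0  # part of contrib[k] already assigned to previous new bins
    for _ in range(n_bins - 1):
        need = target
        while contrib[k] - used < need and k < n_bins - 1:
            need -= contrib[k] - used
            k += 1
            used = 0.0
        used += need
        frac = min(used / contrib[k], 1.0) if contrib[k] > 0 else 1.0
        new_edges.append(edges[k] + frac * (edges[k + 1] - edges[k]))
    new_edges.append(edges[-1])
    return new_edges

def vegas_like(f, n_bins, n_calls, n_iter, rng):
    """Adaptive importance sampling of f on [0, 1] over n_iter passes."""
    edges = [i / n_bins for i in range(n_bins + 1)]  # start from a uniform grid
    sum_inv_var = 0.0
    sum_weighted = 0.0
    for _ in range(n_iter):
        contrib = [0.0] * n_bins
        total = 0.0
        total_sq = 0.0
        for _ in range(n_calls):
            k = rng.randrange(n_bins)      # every bin is equally likely
            width = edges[k + 1] - edges[k]
            x = edges[k] + width * rng.random()
            w = f(x) * n_bins * width      # f(x) / g(x), g = 1 / (n_bins * width)
            total += w
            total_sq += w * w
            contrib[k] += abs(w)
        mean = total / n_calls
        var = (total_sq - n_calls * mean * mean) / (n_calls - 1) / n_calls
        sum_inv_var += 1.0 / var
        sum_weighted += mean / var
        edges = rebin(edges, contrib)      # grid for the next pass
    return sum_weighted / sum_inv_var, math.sqrt(1.0 / sum_inv_var)

rng = random.Random(3)  # arbitrary fixed seed
vegas_est, vegas_sigma = vegas_like(math.exp, 20, 20_000, 5, rng)
print(f"I = {vegas_est:.6f}  sigma = {vegas_sigma:.6f}")
```

Narrow bins are sampled as often as wide ones, so points end up concentrated where $|f|$ is large, which is the behaviour described above.
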
For this particular sample, results are shown in @tbl:VEGAS.

-------------------------------------------------------------------------
                  500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$  1.7182818354      1.7182818289       1.7182818285

$\sigma$          0.0000000137      0.0000000004       0.0000000000

diff              0.0000000069      0.0000000004       0.0000000000
-------------------------------------------------------------------------

Table: VEGAS results with different numbers of
function calls. {#tbl:VEGAS}

This time, the error estimate is notably close to diff for each number of
function calls, meaning that the estimates of both the integral and its
error turn out to be very accurate, much more so than the ones obtained with
both the plain Monte Carlo method and stratified sampling.