ex-5: review

Giù Marcer 2020-06-03 18:51:11 +02:00 committed by rnhmjoj
parent 3f6a54c456
commit 3688bceaf2


@ -2,12 +2,15 @@
The following integral is to be evaluated comparing different Monte Carlo
techniques.
\vspace{20pt}
$$
I = \int_0^1 \!\!\! dx \, e^x
$$
\vspace{-70pt}
\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
\definecolor{cyclamen}{RGB}{146, 24, 43}
% Integral
\filldraw [cyclamen!15!white, domain=0:5, variable=\x]
  (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
@ -21,19 +24,16 @@ techniques.
% Plot
\draw [domain=-2:7, smooth, variable=\x,
  cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{Plot of the integral to be evaluated.}
}
\end{figure}
whose exact value is 1.7182818285...
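
This value follows directly from the elementary antiderivative:

$$
I = \int_0^1 \!\!\! dx \, e^x = \left[ e^x \right]_0^1 = e - 1 = 1.71828\dots
$$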
The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides being commonly used, these were also chosen because they are
implemented in the GSL library as `gsl_monte_plain`, `gsl_monte_miser` and
`gsl_monte_vegas`, respectively.
## Plain Monte Carlo
@ -44,24 +44,21 @@ $$
I = \int\limits_{\Omega} dx \, f(x)
\with V = \int\limits_{\Omega} dx
$$
the simplest MC approach is to sample $N$ points $x_i$ in $V$ and
approximate $I$ as:
$$
I \approx I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \avg{f}
$$
If the $x_i$ are uniformly distributed, $I_N \rightarrow I$ for $N \rightarrow +
\infty$ by the law of large numbers, while the variance of the integral
estimate can be computed from the sample variance of $f$:
$$
\sigma^2_f = \frac{1}{N - 1}
\sum_{i = 1}^N \left( f(x_i) - \avg{f} \right)^2
\et
\sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
\sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$
Thus, the error decreases as $1/\sqrt{N}$.
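
As a reference for how such an estimate is obtained in practice, below is a
minimal sketch using the `gsl_monte_plain` routine named above; the generator
and number of calls are illustrative choices, not necessarily those used for
the results that follow.

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_plain.h>

/* integrand f(x) = e^x over the one-dimensional region [0, 1] */
double integrand(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = {0.0}, xu[1] = {1.0};      /* integration limits */
  double result, error;
  gsl_monte_function f = {&integrand, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);
  gsl_monte_plain_integrate(&f, xl, xu, 1, 500000, r, s, &result, &error);
  gsl_monte_plain_free(s);

  /* error is the V*sigma_f/sqrt(N) estimate discussed above */
  printf("I = %.10f +- %.10f\n", result, error);

  gsl_rng_free(r);
  return 0;
}
```

Linking against GSL (`-lgsl -lgslcblas -lm`) and running it prints the
estimate together with the error discussed above.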
Unlike in deterministic methods, the error estimate is not a strict bound:
random sampling may not cover all the important features of the integrand and
@ -99,15 +96,15 @@ Table: Some MC results with three different numbers of function calls.
As can be seen, $\sigma$ is always of the same order of magnitude as diff,
except for very low numbers of function calls. Even with \num{5e7} calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal place, meaning
that this method converges very slowly.
The $\sigma$ dependence on the number $C$ of function calls was checked with a
least-squares minimization, modeling the data with the function:
$$
\sigma = \frac{a}{C^b}
$$
As can be seen in @fig:err_fit, the obtained result confirms the expected
value $b^{\text{exp}} = 0.5$, the observed value being $b \sim 0.499$.
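
This functional form is a straight line in log-log space, $\log \sigma = \log a
- b \log C$, so one simple way to perform such a fit (not necessarily the one
used here) is a linear least-squares fit with `gsl_fit_linear`. In the sketch
below the $\sigma$ values are synthetic stand-ins generated from the expected
$1/\sqrt{C}$ law, only to keep the example self-contained; the real fit would
use the measured points.

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_fit.h>

int main(void) {
  /* numbers of calls; the sigma values are synthetic stand-ins generated
     from the expected 1/sqrt(C) law, not the measured ones */
  double calls[5] = {5e3, 5e4, 5e5, 5e6, 5e7};
  double lx[5], ly[5];
  for (size_t i = 0; i < 5; i++) {
    double sigma = 0.5 / sqrt(calls[i]);
    lx[i] = log(calls[i]);          /* sigma = a/C^b  <=>              */
    ly[i] = log(sigma);             /* log(sigma) = log(a) - b*log(C)  */
  }

  double c0, c1, cov00, cov01, cov11, sumsq;
  gsl_fit_linear(lx, 1, ly, 1, 5, &c0, &c1, &cov00, &cov01, &cov11, &sumsq);
  printf("a = %.3f  b = %.3f\n", exp(c0), -c1);
  return 0;
}
```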
Given this dependence, for an error of $10^{-n}$, a number $\propto 10^{2n}$ of
function calls is needed. To compute the integral to double precision, an
impossibly large number of calls, $\sim 10^{32}$, would be needed, which makes
@ -125,11 +122,11 @@ partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling
within each stratum.
Given a sample $\{x_j\}_i$ of the $i$-th stratum, its mean $\bar{x}_i$ and
variance ${\sigma^2_x}_i$ are given by:
$$
\bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$
and
$$
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
\thus
@ -137,8 +134,8 @@ $$
$$
where:

- $j$ runs over the points $x_j$ of the sample,
- $n_i$ is the size of the sample,
- $\sigma_i^2$ is the variance associated with every point of the $i$-th
  stratum.
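
As an illustration of the principle (a hand-rolled sketch, not the MISER
algorithm of the next section), the integral can be stratified into two equal
sub-intervals of $[0, 1]$: each stratum contributes $V_i \bar{f}_i$ to the
estimate and $V_i^2 \sigma_i^2 / n_i$ to its variance.

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>

int main(void) {
  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  const double edges[3] = {0.0, 0.5, 1.0};   /* two equal strata of [0, 1] */
  const size_t n = 50000;                    /* points per stratum         */
  double I = 0, var = 0;

  for (int i = 0; i < 2; i++) {
    double width = edges[i + 1] - edges[i];
    double sum = 0, sum2 = 0;
    for (size_t j = 0; j < n; j++) {
      double x = edges[i] + width * gsl_rng_uniform(r);
      double fx = exp(x);
      sum  += fx;
      sum2 += fx * fx;
    }
    double mean   = sum / n;
    double sigma2 = (sum2 / n - mean * mean) * n / (n - 1.0);
    I   += width * mean;                     /* V_i * mean_i               */
    var += width * width * sigma2 / n;       /* V_i^2 * sigma_i^2 / n_i    */
  }

  printf("I = %.7f +- %.7f\n", I, sqrt(var));
  gsl_rng_free(r);
  return 0;
}
```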
@ -169,12 +166,11 @@ population. For examples, see [@ridder17].
The MISER technique aims at reducing the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
sampled points the estimate of the integral $I$ can be computed as:
$$
I = V \cdot \avg{f}
$$
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\avg{f}$.
@ -191,26 +187,23 @@ is given by:
$$
\sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$
It can be shown that this variance is minimized by distributing the points such
that:
$$
\frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
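
This proportion follows from a standard constrained-minimization argument
(sketched here for completeness; it is not spelled out in the report): keeping
the total number of points $N = n_a + n_b$ fixed and setting the derivative of
$\sigma^2$ with respect to $n_a$ to zero gives
$$
-\frac{\sigma_a^2}{4 n_a^2} + \frac{\sigma_b^2}{4 n_b^2} = 0
\thus
\frac{n_a}{\sigma_a} = \frac{n_b}{\sigma_b}
$$
which is exactly the allocation stated above.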
The whole integral estimate and its variance are therefore given by:
$$
I = V \cdot \avg{f} \et \sigma_I^2 = V^2 \cdot \sigma^2
$$
When implemented, MISER is in fact a recursive method. First, all the possible
bisections of $\Omega$ are tested and the one which minimizes the combined
variance of the two sub-regions is selected. In order to speed up the
algorithm, the variance in the sub-regions is estimated with a fraction of the
total number of available points (function calls), which in GSL defaults to
0.1. The remaining points are allocated to the sub-regions using the formula
for $n_a$ and $n_b$, once the variances are computed.
This procedure is then repeated recursively for each of the two half-regions
@ -219,15 +212,18 @@ from the best bisection. When the allocated calls for a region running out
The final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error [@sayah19].
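
A minimal sketch of the corresponding GSL call is shown below; the fraction of
calls used for the variance estimates can be inspected or changed through
`gsl_monte_miser_params` (the explicit 0.1 is just the GSL default, set only to
show where the knob lives, and the other choices are illustrative).

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_miser.h>

double integrand(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = {0.0}, xu[1] = {1.0};
  double result, error;
  gsl_monte_function f = {&integrand, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);

  /* fraction of calls used to estimate the sub-region variances;
     0.1 is already the GSL default, set here only for illustration */
  gsl_monte_miser_params p;
  gsl_monte_miser_params_get(s, &p);
  p.estimate_frac = 0.1;
  gsl_monte_miser_params_set(s, &p);

  gsl_monte_miser_integrate(&f, xl, xu, 1, 500000, r, s, &result, &error);
  printf("I = %.7f +- %.7f\n", result, error);

  gsl_monte_miser_free(s);
  gsl_rng_free(r);
  return 0;
}
```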
![Estimates $I^{\text{oss}}$ of the integral $I$ obtained with plain MC and
MISER for different numbers of function calls. Error bars show the estimated
uncertainties.](images/5-MC_MC_MI.pdf){#fig:miser-iter}
Results for this particular sample are shown in black in @fig:miser-iter and
compared with the plain MC results (in red). Some of them are listed in
@tbl:miser-res. Except for the smallest numbers of calls, the improvement with
respect to the plain MC technique is appreciable.
The convergence is much faster than in plain MC: at 500'000 function calls, the
estimate agrees with the exact integral to the fifth decimal place. Once again,
the standard deviation and the difference share the same magnitude.
--------------------------------------------------------------------
calls           $I^{\text{oss}}$        $\sigma$            diff
@ -242,55 +238,45 @@ calls $I^{\text{oss}}$ $\sigma$ diff
Table: MISER results with different numbers of function calls. Differences
between computed and exact values are given in diff. {#tbl:miser-res}
## Importance sampling
In Monte Carlo methods, importance sampling is a technique which samples points
from a distribution whose shape is close to the integrand $f$ itself. This way,
the points cluster in the regions that make the largest contribution to the
integral $\int f(x)dx$ and the variance is consequently reduced.
In a plain MC the points are sampled uniformly, so their probability
density is given by:
$$
g(x) = \frac{1}{V} \quad \forall x \in \Omega
$$
and the integral can be written as:
$$
I = \int_\Omega dx f(x) = V \int_\Omega f(x) \frac{1}{V}dx
\approx V \avg{f}
$$
More generally, consider a distribution $h(x)$ and proceed in the same way:
$$
I
= \int_\Omega dx f(x)
= \int_\Omega dx \, \frac{f(x)}{h(x)} \, h(x)
= \Exp \left[ \frac{f}{h}, h \right]
$$
where $\Exp[X, h]$ is the expected value of $X$ with respect to $h$. Also note
that $h$ has to vanish outside $\Omega$ for this to hold.
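
As a concrete, purely illustrative example of this identity (the density below
is a hypothetical choice made for this sketch, not one used in the report), one
can pick a normalized density roughly shaped like $e^x$ on $[0, 1]$, for
instance $h(x) = 2(1 + x)/3$, sample it by inverting its CDF and average the
weights $f/h$:

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>

/* hypothetical density h(x) = 2(1 + x)/3 on [0, 1], roughly shaped like e^x */
double density(double x) { return 2.0 * (1.0 + x) / 3.0; }
/* inverse of its CDF H(x) = (2x + x^2)/3, used to sample from it */
double sample(double u)  { return sqrt(1.0 + 3.0 * u) - 1.0; }

int main(void) {
  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  const size_t N = 100000;
  double sum = 0, sum2 = 0;
  for (size_t i = 0; i < N; i++) {
    double x = sample(gsl_rng_uniform(r));
    double w = exp(x) / density(x);              /* weight f(x)/h(x)     */
    sum  += w;
    sum2 += w * w;
  }
  double I   = sum / N;
  double var = (sum2 / N - I * I) / (N - 1.0);   /* variance of the mean */

  printf("I = %.7f +- %.7f\n", I, sqrt(var));
  gsl_rng_free(r);
  return 0;
}
```

With this $h$ the weights $f/h$ vary far less over $[0, 1]$ than $f$ itself
does, which is where the variance reduction discussed next comes from.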
To reduce the variance, as anticipated, $h$ must be close to $f$. Assuming they
are proportional, $h(x) = \alpha |f(x)|$, so that for $f$ of constant sign the
ratio $f/h = 1/\alpha$ is itself constant, it follows that:
$$
\Exp \left[ \frac{f}{h}, h \right] = \frac{1}{\alpha}
\et
\Var \left[ \frac{f}{h}, h \right] = 0
$$
For the expected value to give the original $I$, the proportionality constant
must be taken to be $I^{-1}$, meaning:
$$
h(z) = \frac{1}{I}\, |f(z)|
$$
Sampling from this $h$ would produce a perfect result with zero variance.
Of course, this is nonsense: if $I$ were known in advance, there would be no
need for a Monte Carlo integration to begin with. Nonetheless, this example serves
@ -316,7 +302,7 @@ different subregions with an iterative method, namely:
- the volume $V$ is divided into $N$ intervals of width $\Delta x_i =
  \Delta x \, \forall \, i$, where $N$ is limited by the computer storage
  space available and must be held constant from iteration to iteration
  (in GSL this defaults to $N = 50$);
- each interval is then divided into $m_i + 1$ subintervals, where:
  $$
@ -326,8 +312,8 @@ different subregions with an iterative method, namely:
  where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
  of $f$ in the interval. Hence, $m_i$ is a measure of the "importance" of
  the interval with respect to the others: the higher $\bar{f}_i$, the higher
  $m_i$. The constant $K$, called stiffness, defaults to 1.5 in GSL;
- as it is desirable to restore the number of intervals to its original value
  $N$, groups of the new intervals must be merged into larger intervals, the
@ -337,10 +323,10 @@ different subregions with an iterative method, namely:
- the function is integrated with a plain MC method in each interval
  and the sum of the integrals is taken as the $j$-th estimate of $I$. Its
  error is given by the sum of the variances in each interval;
- the new grid is further refined in subsequent iterations. By default, the
  number of iterations is 5 in GSL.
The final estimate of the integral $I$ and its error $\sigma_I$ are based on a
weighted average:
@ -359,27 +345,13 @@ $$
\chi^2_i = \sum_{j \le i}
\frac{(I_j - \avg{I})^2}{\sigma_j^2}
$$
While performing the iterations, if the value of $\chi_r^2$ exceeds 1.5, the
routine stops since it is not making progress.
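
Below is a minimal sketch of the corresponding GSL usage: a short warm-up call
adapts the grid, after which the integration is repeated until $\chi_r^2$
settles near 1 (the 0.5 tolerance follows the GSL manual example, not
necessarily what was used here). The stiffness and the number of iterations per
call can be adjusted through `gsl_monte_vegas_params`; the specific numbers of
calls are illustrative.

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_vegas.h>

double integrand(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = {0.0}, xu[1] = {1.0};
  double result, error;
  gsl_monte_function f = {&integrand, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

  /* the stiffness K (called alpha in GSL) and the number of iterations
     per call can be tuned; the values below are the GSL defaults */
  gsl_monte_vegas_params p;
  gsl_monte_vegas_params_get(s, &p);
  p.alpha = 1.5;
  p.iterations = 5;
  gsl_monte_vegas_params_set(s, &p);

  /* warm-up run to adapt the grid, then iterate until chi^2/dof ~ 1 */
  gsl_monte_vegas_integrate(&f, xl, xu, 1, 10000, r, s, &result, &error);
  do {
    gsl_monte_vegas_integrate(&f, xl, xu, 1, 500000, r, s, &result, &error);
  } while (fabs(gsl_monte_vegas_chisq(s) - 1.0) > 0.5);

  printf("I = %.10f +- %.1e  (chi^2/dof = %.3f)\n",
         result, error, gsl_monte_vegas_chisq(s));

  gsl_monte_vegas_free(s);
  gsl_rng_free(r);
  return 0;
}
```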
Clearly, a better estimate is achieved with a greater number of function
calls. For this particular sample, the most accurate results are shown in
@fig:vegas-iter and some of them are listed in @tbl:vegas-res.
![Only the most accurate results are shown in order to stress the
differences between the VEGAS (in gray) and MISER (in black)
results.](images/5-MC_MI_VE.pdf){#fig:vegas-iter}
@ -391,6 +363,19 @@ also confirmed by the very small difference shown in @tbl:vegas-res.
In fact, with \num{5e7} function calls, the difference is
smaller than \num{1e-10}.
----------------------------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff $\chi_r^2$
------------------ ------------------ ------------------ ------------------ ------------------
\num{5e5} 1.7182818281 0.0000000012 0.0000000004 1.457
\num{5e6} 1.7182818284 0.0000000000 0.0000000001 0.632
\num{5e7} 1.7182818285 0.0000000000 0.0000000000 0.884
----------------------------------------------------------------------------------------------
Table: Best VEGAS results with different numbers of
function calls. {#tbl:vegas-res}
In conclusion, among the plain Monte Carlo technique, stratified sampling and
importance sampling, the last turned out to be the most powerful means of
obtaining a good estimate of the integral.