ex-5: review

Michele Guerini Rocco 2020-05-31 17:17:00 +02:00
parent 686d2d4554
commit d0d03d521e


@@ -31,9 +31,9 @@ techniques.
whose exact value is 1.7182818285...
The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides being commonly used, these were chosen for also being
implemented in the GSL library as `gsl_monte_plain`, `gsl_monte_miser` and
`gsl_monte_vegas`, respectively.

## Plain Monte Carlo
@@ -45,39 +45,42 @@ $$
\with V = \int\limits_{\Omega} dx
$$
the simplest MC method approach is to sample $N$ points $x_i$ in $V$ and
approximate $I$ as:
$$
  I \approx I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \avg{f}
$$
If the $x_i$ are uniformly distributed, $I_N \rightarrow I$ for $N \rightarrow
+\infty$ by the law of large numbers, whereas the sample variance can be
estimated as:
$$
  \sigma^2_f = \frac{1}{N - 1}
  \sum_{i = 1}^N \left( f(x_i) - \avg{f} \right)^2
  \et
  \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
  \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$
Thus, the error decreases as $1/\sqrt{N}$.

Unlike in deterministic methods, the error estimate is not a strict bound:
random sampling may not cover all the important features of the integrand and
this can result in an underestimation of the error.

In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$.
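
As a usage illustration, the following is a minimal sketch of how such an
estimate can be computed with the GSL routine `gsl_monte_plain_integrate`
(the random number generator and the number of calls are illustrative choices,
not necessarily those used to produce the results below):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_plain.h>

/* integrand: f(x) = exp(x), one-dimensional */
double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double lower[] = {0}, upper[] = {1};      /* the region [0,1], so V = 1 */
  gsl_monte_function expx = {&f, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);

  double result, error;
  gsl_monte_plain_integrate(&expx, lower, upper, 1, 500000,
                            r, s, &result, &error);
  printf("I = %.10f +- %.10f\n", result, error);

  gsl_monte_plain_free(s);
  gsl_rng_free(r);
  return 0;
}
```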
Since the distance of $I_N$ from $I$ depends on $N$, the accuracy of the
method lies in how many points are generated, namely how many function calls
are executed when the iterative method is implemented. In @fig:plain-mc-iter
and @fig:miser-iter, results obtained with the plain MC method are shown in
red. In @tbl:plain-mc-res, some of them are listed: the estimated integrals
$I^{\text{oss}}$ are compared to the expected value $I$ and the differences
between them are given.

![Estimated values of $I$ obtained with the plain MC technique for different
  numbers of function calls; logarithmic scale; errorbars showing their
  estimated uncertainties. As can be seen, the process does a sort of seesaw
  around the correct value.](images/5-MC_MC.pdf){#fig:plain-mc-iter}

---------------------------------------------------------------------------
calls             $I^{\text{oss}}$     $\sigma$             diff
@@ -91,12 +94,12 @@ calls $I^{\text{oss}}$ $\sigma$ diff
Table: Some MC results with three different numbers of function calls.
  Differences between computed and exact values are given in
  diff. {#tbl:plain-mc-res}

As can be seen, $\sigma$ is always of the same order of magnitude as diff,
except for very low numbers of function calls. Even with \num{5e7} calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal place, meaning
that this method shows a really slow convergence. In fact, the expected
dependence of $\sigma$ on the number $C$ of function calls is confirmed:
$$
\begin{cases}
@@ -110,46 +113,48 @@ $$
\frac{\sigma_1}{\sigma_2} = 9.9965508 \sim 10 = \sqrt{\frac{C_2}{C_1}}
$$
For an error of $10^{-n}$, a number of function calls $\propto 10^{2n}$ is
needed. To compute the integral to double precision ($\sigma \sim 10^{-16}$),
an impossibly large number of calls, roughly $10^{32}$, would be required,
which makes this method impractical for high-precision applications.

## Stratified sampling
In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling
within each stratum.
Given a sample $\{x_j\}_i$ of the $i$-th stratum, its mean $\bar{x}_i$ and
variance ${\sigma^2_x}_i$ are given by:
$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$
and:
$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
  {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$
where:

- $j$ runs over the points $x_j$ of the sample;
- $n_i$ is the size of the sample;
- $\sigma_i^2$ is the variance associated with each point of the $i$-th
  stratum.

Estimates of the mean $\bar{x}$ and variance $\sigma_x^2$ of the whole
population are then given by stratified sampling as follows:
$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
             = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$
where:

- $i$ runs over the strata;
- $N_i$ is the weight of the $i$-th stratum;
- $N$ is the sum of all strata weights.
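
To make the combination step concrete, here is a small self-contained sketch
(not part of the exercise code) that applies the two formulas above to
$f(x) = e^x$ on $[0, 1]$, split into the two equal strata $[0, 0.5)$ and
$[0.5, 1)$ with weights $N_a = N_b = 0.5$:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

/* sample mean and variance of the mean for one stratum */
static void stratum_stats(const double *x, size_t n,
                          double *mean, double *var_of_mean) {
  double sum = 0, sq = 0;
  for (size_t j = 0; j < n; j++) sum += x[j];
  *mean = sum / n;
  for (size_t j = 0; j < n; j++) sq += pow(x[j] - *mean, 2);
  *var_of_mean = sq / (n - 1) / n;          /* sigma_i^2 / n_i */
}

int main(void) {
  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  const size_t n = 1000;                    /* points per stratum */
  double fa[1000], fb[1000];
  for (size_t j = 0; j < n; j++) {
    fa[j] = exp(0.5 * gsl_rng_uniform(r));        /* f sampled in [0, 0.5) */
    fb[j] = exp(0.5 + 0.5 * gsl_rng_uniform(r));  /* f sampled in [0.5, 1) */
  }

  double ma, va, mb, vb;
  stratum_stats(fa, n, &ma, &va);
  stratum_stats(fb, n, &mb, &vb);

  /* stratified mean and variance with N_a = N_b = 0.5, N = 1 */
  double mean = 0.5 * ma + 0.5 * mb;
  double var  = 0.25 * va + 0.25 * vb;
  printf("I = %.6f +- %.6f\n", mean, sqrt(var));

  gsl_rng_free(r);
  return 0;
}
```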
In practical terms, it can produce a weighted mean that has less variability
than the arithmetic mean of a simple random sample of the whole population. In
@@ -158,34 +163,31 @@ result will have a smaller error in estimation with respect to the one otherwise
obtained with simple sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population. For examples, see [@ridder17].

### MISER
The MISER technique aims at reducing the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
sampled points the integral $I$ can be estimated as:
$$
  I = V \cdot \avg{f}
$$
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\avg{f}$.
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points are respectively uniformly sampled. Given the
Monte Carlo estimates of the means $\avg{f}_a$ and $\avg{f}_b$ of those points
and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$ and
$N_b$ of $\avg{f}_a$ and $\avg{f}_b$ are chosen unitary, then the variance
$\sigma^2$ of the combined estimate $\avg{f}$:
$$
  \avg{f} = \frac{1}{2} \left( \avg{f}_a + \avg{f}_b \right)
$$
is given by:
$$
  \sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
@@ -201,182 +203,170 @@ Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by:
$$
  I = V \cdot \avg{f} \et \sigma_I^2 = V^2 \cdot \sigma^2
$$
When implemented, MISER is in fact a recursive method. First, all the possible
bisections of $\Omega$ are tested and the one which minimizes the combined
variance of the two sub-regions is selected. In order to speed up the
algorithm, the variance in the sub-regions is estimated with a fraction of the
total number of available points (function calls), which in GSL defaults to 0.1.
The remaining points are allocated to the sub-regions using the formula for
$n_a$ and $n_b$, once the variances are computed.

This procedure is then repeated recursively for each of the two half-regions
from the best bisection. When the calls allocated to a region run out
(fewer than 512 in GSL), the method falls back to plain Monte Carlo.
The final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error [@sayah19].
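
The call mirrors the plain MC one, with the MISER state type and routine in its
place; a minimal sketch (again with an illustrative number of calls) could be:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_miser.h>

/* integrand: f(x) = exp(x), one-dimensional */
double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double lower[] = {0}, upper[] = {1};
  gsl_monte_function expx = {&f, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);

  double result, error;
  /* recursive stratified sampling with the GSL default parameters */
  gsl_monte_miser_integrate(&expx, lower, upper, 1, 500000,
                            r, s, &result, &error);
  printf("I = %.10f +- %.10f\n", result, error);

  gsl_monte_miser_free(s);
  gsl_rng_free(r);
  return 0;
}
```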
![Estimations $I^{\text{oss}}$ of the integral $I$ obtained
  with the three implemented methods for different numbers of
  function calls. Errorbars showing their estimated
  uncertainties.](images/5-MC_MC_MI.pdf){#fig:miser-iter}
Results for this particular sample are shown in black in @fig:miser-iter and
some of them are listed in @tbl:miser-res. Except for the lowest numbers of
calls, the improvement with respect to the plain MC technique (in red) is
appreciable.

------------------------------------------------------------------------
calls         $I^{\text{oss}}$     $\sigma$             diff
-----------   ------------------   ------------------   ----------------
\num{5e5}     1.7182850738         0.0000021829         0.0000032453

\num{5e6}     1.7182819143         0.0000001024         0.0000000858

\num{5e7}     1.7182818221         0.0000000049         0.0000000064
------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. Differences
  between computed and exact values are given in diff. {#tbl:miser-res}

The convergence is much faster than plain MC: already with 500'000 function
calls, the estimate agrees with the exact integral to the fifth decimal place.
Once again, the standard deviation and the difference share the same order of
magnitude.

## Importance sampling
In Monte Carlo methods, importance sampling is a technique which samples points
from a distribution whose shape is close to the integrand $f$ itself. In this
way the points cluster in the regions that make the largest contribution to the
integral $\int f(x)dx$, which consequently decreases the variance.
In a plain MC the points are sampled uniformly, so their probability
density is given by:
$$
  g(x) = \frac{1}{V} \quad \forall x \in \Omega
$$
and the integral can be written as:
$$
  I = \int_\Omega dx f(x) = V \int_\Omega f(x) \frac{1}{V} dx
    \approx V \avg{f}
$$
More generally, consider a distribution $h(x)$ and rewrite the integral as:
$$
  I
    = \int_\Omega dx f(x)
    = \int_\Omega dx \, \frac{f(x)}{h(x)} \, h(x)
    = \Exp \left[ \frac{f}{h}, h \right]
$$
where $\Exp[X, h]$ is the expected value of $X$ with respect to $h$.
Also note that $h$ has to vanish outside $\Omega$ for this to hold.
As anticipated, to reduce the variance $h$ must be close to $f$.
Assuming they are proportional, $h(x) = \alpha |f(x)|$, it follows
that:
$$
  \Exp \left[ \frac{f}{h}, h \right] = \frac{1}{\alpha}
  \et
  \Var \left[ \frac{f}{h}, h \right] = 0
$$
For the expected value to give the original $I$, the proportionality constant
must be taken to be $I^{-1}$, meaning:
$$
  h(z) = \frac{1}{I}\, |f(z)|
$$
Sampling from this $h$ would produce a perfect result with zero variance.
Of course, this is nonsense: if $I$ were known in advance, there would be no
need to do a Monte Carlo integration to begin with. Nonetheless, this example
serves to show how variance reduction is achieved by sampling from an
approximation of the integrand.

In conclusion, since certain values of $x$ have more impact on $\Exp[f/h, h]$
than others, these "important" values must be emphasized by sampling them more
frequently. As a consequence, the estimator variance will be reduced.
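
To make this concrete, here is a small standalone sketch (not one of the GSL
routines, and not part of the exercise code) that applies importance sampling
to $f(x) = e^x$ on $[0, 1]$, using the normalized linear density
$h(x) = (1 + x)/1.5$ as a crude approximation of $f$ and sampling from it by
inverting its cumulative distribution:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

/* h(x) = (1 + x)/1.5 on [0,1] has CDF H(x) = (x + x^2/2)/1.5;
 * solving H(x) = u gives x = sqrt(1 + 3u) - 1. */
static double sample_h(gsl_rng *r) {
  double u = gsl_rng_uniform(r);
  return sqrt(1 + 3 * u) - 1;
}

int main(void) {
  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  const size_t n = 500000;
  double sum = 0, sumsq = 0;
  for (size_t i = 0; i < n; i++) {
    double x = sample_h(r);
    double w = exp(x) / ((1 + x) / 1.5);    /* f(x)/h(x) */
    sum += w;
    sumsq += w * w;
  }
  double mean = sum / n;                       /* estimate of I */
  double var  = (sumsq / n - mean * mean) / n; /* variance of the mean */
  printf("I = %.7f +- %.7f\n", mean, sqrt(var));

  gsl_rng_free(r);
  return 0;
}
```

Since $f/h$ varies much less than $f$ itself over $[0, 1]$, the variance of
this estimate is smaller than that of a plain MC run with the same number of
calls.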
### VEGAS
The VEGAS algorithm [@lepage78] of G. P. Lepage is based on importance
sampling. As stated before, it is in practice impossible to sample points from
the best distribution $h(x)$: only a good approximation can be achieved. The
VEGAS algorithm attempts this by building a histogram of the function $f$ in
different subregions with an iterative method, namely:

- a fixed number of points (function calls) is generated uniformly in the
  whole region;
- the volume $V$ is divided into $N$ intervals of width $\Delta x_i =
  \Delta x \, \forall \, i$, where $N$ is limited by the computer storage
  space available and must be held constant from iteration to iteration
  (in GSL this defaults to $N = 50$);
- each interval is then divided into $m_i + 1$ subintervals, where:
  $$
    m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
  $$
  where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
  of $f$ in the interval. $m_i$ is therefore a measure of the
  "importance" of the interval with respect to the others: the higher
  $\bar{f}_i$, the higher $m_i$. The constant $K$, called the stiffness,
  defaults to 1.5 in GSL;
- as it is desirable to restore the number of intervals to its original value
  $N$, groups of the new intervals must be merged into larger intervals, the
  number of subintervals in each group being constant. The net effect is
  to alter the interval sizes, while keeping the total number constant, so
  that the smallest intervals occur where $f$ is largest;
- the function is integrated with a plain MC method in each interval
  and the sum of the integrals is taken as the $j$-th estimate of $I$; its
  error is given by the sum of the variances in each interval;
- the new grid is used and further refined in subsequent iterations.
  By default, the number of iterations is 5 in GSL.

The final estimate of the integral $I$ and its error
$\sigma_I$ are obtained from a weighted average:
$$
  \avg{I} = \sigma_I^2 \sum_i \frac{I_i}{\sigma_i^2}
  \with
  \frac{1}{\sigma_I^2} = \sum_i \frac{1}{\sigma_i^2}
$$
where $I_i$ and $\sigma_i$ are the integral and standard deviation
estimated in each iteration.

The reliability of the result is assessed by a chi-squared per degree of
freedom $\chi_r^2$, which should be close to 1 for a good estimation. At a
given iteration $i$, the $\chi^2_i$ is computed as follows:
$$
  \chi^2_i = \sum_{j \le i}
  \frac{(I_j - \avg{I})^2}{\sigma_j^2}
$$
While performing the iterations, if the value of $\chi_r^2$ exceeds 1.5, the
routine stops, since it is not making progress.
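
The GSL documentation suggests a short warm-up run to adapt the grid, followed
by larger runs repeated until $\chi_r^2$ is acceptable; a usage sketch along
those lines (call counts and the acceptance window are illustrative, not the
exact criteria used here) is:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_vegas.h>

/* integrand: f(x) = exp(x), one-dimensional */
double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double lower[] = {0}, upper[] = {1};
  gsl_monte_function expx = {&f, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

  double result, error;
  /* short run to adapt the grid */
  gsl_monte_vegas_integrate(&expx, lower, upper, 1, 10000, r, s,
                            &result, &error);
  /* keep integrating until chi^2/dof is consistent with 1 */
  do {
    gsl_monte_vegas_integrate(&expx, lower, upper, 1, 500000, r, s,
                              &result, &error);
  } while (fabs(gsl_monte_vegas_chisq(s) - 1.0) > 0.5);

  printf("I = %.10f +- %.10f (chi^2/dof = %.2f)\n",
         result, error, gsl_monte_vegas_chisq(s));

  gsl_monte_vegas_free(s);
  gsl_rng_free(r);
  return 0;
}
```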
Clearly, a better estimation is achieved with a greater number of function
calls. For this particular sample, the most accurate results are shown in
@fig:vegas-iter and some of them are listed in @tbl:vegas-res.

----------------------------------------------------------------------------------------------
calls             $I^{\text{oss}}$      $\sigma$            diff               $\chi_r^2$
@@ -389,19 +379,19 @@ calls $I^{\text{oss}}$ $\sigma$ diff
----------------------------------------------------------------------------------------------

Table: Some VEGAS results with different numbers of
  function calls. {#tbl:vegas-res}

As can be appreciated in @fig:vegas-iter, the VEGAS algorithm manages to compute
the integral value more accurately than MISER. The $\chi_r^2$ turns out to
be close enough to 1 to guarantee a good estimation of $I$, which is
also confirmed by the very small differences shown in @tbl:vegas-res.
In fact, with \num{5e7} function calls, the difference is
smaller than \num{1e-10}.

![Only the most accurate results are shown in order to stress the
  differences between the results of the VEGAS (in gray) and MISER
  (in black) methods.](images/5-MC_MI_VE.pdf){#fig:vegas-iter}

In conclusion, among the plain Monte Carlo technique, stratified sampling and
importance sampling, the last turned out to be the most powerful method to
obtain a good estimate of the integral.