ex-5: review
The test integral is $I = \int_0^1 dx \, e^x$,
whose exact value is 1.7182818285...

The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides being commonly used, these were chosen for also being
implemented in the GSL libraries `gsl_monte_plain`, `gsl_monte_miser` and
`gsl_monte_vegas`, respectively.

## Plain Monte Carlo

Given an integral
$$
I = \int\limits_{\Omega} dx \, f(x)
\with V = \int\limits_{\Omega} dx
$$

the simplest MC approach is to sample $N$ points $x_i$ in $V$ and
approximate $I$ as:
$$
I \approx I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \avg{f}
$$

If the $x_i$ are uniformly distributed, $I_N \rightarrow I$ for
$N \rightarrow +\infty$ by the law of large numbers, whereas the sample
variance can be estimated as:
$$
\sigma^2_f = \frac{1}{N - 1}
             \sum_{i = 1}^N \left( f(x_i) - \avg{f} \right)^2
\et
\sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
             \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$

Thus, the error decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, the error estimate is not a strict bound:
random sampling may not cover all the important features of the integrand and
this can result in an underestimation of the error.

In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$.
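
For reference, a minimal sketch of how such an estimate can be obtained with
the GSL routine (the number of calls is arbitrary here; link with
`-lgsl -lgslcblas -lm`):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_plain.h>

/* integrand f(x) = e^x, one-dimensional */
double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = { 0.0 }, xu[1] = { 1.0 };  /* Omega = [0,1], so V = 1 */
  gsl_monte_function F = { &f, 1, NULL };
  double I, sigma;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);

  /* I_N = V <f> over 500000 uniform points, with its error estimate */
  gsl_monte_plain_integrate(&F, xl, xu, 1, 500000, r, s, &I, &sigma);
  printf("I = %.10f +- %.10f\n", I, sigma);

  gsl_monte_plain_free(s);
  gsl_rng_free(r);
  return 0;
}
```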

Since how close $I_N$ is to $I$ depends on $N$, the accuracy of the method
lies in how many points are generated, namely how many function calls are
executed when the iterative method is implemented. In @fig:plain-mc-iter
and @fig:miser-iter, results obtained with the plain MC method are shown in
red. In @tbl:plain-mc-res, some of them are listed: the estimated integrals
$I^{\text{oss}}$ are compared to the expected value $I$ and the differences
between them are given.

![Estimated values of $I$ obtained by the plain MC technique with different
numbers of function calls; logarithmic scale; error bars show the
estimated uncertainties. As can be seen, the process seesaws
around the correct value.](images/5-MC_MC.pdf){#fig:plain-mc-iter}

---------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff
Table: Some MC results with three different numbers of function calls.
Differences between computed and exact values are given in
diff. {#tbl:plain-mc-res}

As can be seen, $\sigma$ is always of the same order of magnitude as diff,
except for very low numbers of function calls. Even with \num{5e7} calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal place, meaning
that this method shows a really slow convergence. In fact, the expected
dependence of $\sigma$ on the number $C$ of function calls is confirmed:
$$
\begin{cases}
\sigma_1 \with C_1 = \num{5e5} \\
\sigma_2 \with C_2 = \num{5e7}
\end{cases}
\thus
\frac{\sigma_1}{\sigma_2} = 9.9965508 \sim 10 = \sqrt{\frac{C_2}{C_1}}
$$

For an error of $10^{-n}$, a number $\propto 10^{2n}$ of function calls is
needed. To compute the integral to double precision, an impossibly large
number of $C \sim 10^{32}$ calls would be needed, which makes this method
impractical for high-precision applications.

## Stratified sampling

In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification is the process of dividing
the population into subgroups (strata) before sampling within each stratum.
Given a sample $\{x_j\}_i$ of the $i$-th stratum, its mean $\bar{x}_i$ and
variance ${\sigma^2_x}_i$ are given by
$$
\bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

and from:
$$
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
\thus
{\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$

where:

- $j$ runs over the points $x_j$ of the sample,
- $n_i$ is the size of the sample,
- $\sigma_i^2$ is the variance associated with every point of the $i$-th
  stratum.

Estimates of the mean $\bar{x}$ and variance $\sigma_x^2$ for the whole
population are then given by stratified sampling as follows:
$$
\bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
\sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
           = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$

where:

- $i$ runs over the strata,
- $N_i$ is the weight of the $i$-th stratum,
- $N$ is the sum of all strata weights.

In practical terms, it can produce a weighted mean that has less variability
than the arithmetic mean of a simple random sample of the whole population. In
that case, the result will have a smaller error in estimation with respect to
the one otherwise obtained with simple sampling.

For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population. For examples, see [@ridder17].
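
To make the combination formulas concrete, a self-contained sketch (not one of
the GSL integration routines) applying them to the running example, with two
equal-volume strata on $[0,1]$, so that $N_i/N = 1/2$:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

int main(void) {
  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  const size_t n = 50000;          /* points per stratum (arbitrary) */
  double mean = 0.0, var = 0.0;

  for (int i = 0; i < 2; i++) {    /* strata [0, 1/2) and [1/2, 1) */
    double lo = 0.5 * i;
    double sum = 0.0, sum2 = 0.0;
    for (size_t j = 0; j < n; j++) {
      double x = lo + 0.5 * gsl_rng_uniform(r);
      double fx = exp(x);
      sum  += fx;
      sum2 += fx * fx;
    }
    double mi = sum / n;                        /* \bar{x}_i                */
    double s2 = (sum2 - n * mi * mi) / (n - 1); /* \sigma_i^2               */
    mean += 0.5 * mi;                           /* weight N_i/N = 1/2       */
    var  += 0.25 * s2 / n;                      /* (N_i/N)^2 \sigma_i^2/n_i */
  }
  printf("I ~ %.6f +- %.6f\n", mean, sqrt(var));  /* V = 1 */

  gsl_rng_free(r);
  return 0;
}
```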
### MISER

The MISER technique aims at reducing the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
sampled points, the estimate of the integral $I$ can be computed as:
$$
I = V \cdot \avg{f}
$$

Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\avg{f}$.

Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points are respectively uniformly sampled. Given the
Monte Carlo estimates of the means $\avg{f}_a$ and $\avg{f}_b$ of those points
and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$ and
$N_b$ of $\avg{f}_a$ and $\avg{f}_b$ are chosen unitary, then the variance
$\sigma^2$ of the combined estimate $\avg{f}$:
$$
\avg{f} = \frac{1}{2} \left( \avg{f}_a + \avg{f}_b \right)
$$
is given by:
$$
\sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$

Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
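
This rule follows from minimising $\sigma^2$ at a fixed total number of points
$n = n_a + n_b$:
$$
\frac{\partial}{\partial n_a} \left( \frac{\sigma_a^2}{4 n_a}
+ \frac{\sigma_b^2}{4 (n - n_a)} \right) = 0
\thus \frac{\sigma_a^2}{n_a^2} = \frac{\sigma_b^2}{n_b^2}
\thus \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$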
The whole integral estimate and its variance are therefore given by:
$$
I = V \cdot \avg{f} \et \sigma_I^2 = V^2 \cdot \sigma^2
$$

When implemented, MISER is in fact a recursive method. First, all the possible
bisections of $\Omega$ are tested and the one which minimizes the combined
variance of the two sub-regions is selected. In order to speed up the
algorithm, the variance in the sub-regions is estimated with a fraction of the
total number of available points (function calls); in GSL, this fraction
defaults to 0.1.
The remaining points are allocated to the sub-regions using the formula for
$n_a$ and $n_b$, once the variances are computed.

This procedure is then repeated recursively for each of the two half-regions
from the best bisection. When the allocated calls for a region run out
(fewer than 512 in GSL), the method falls back to plain Monte Carlo.
The final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error [@sayah19].
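
A minimal sketch of the corresponding GSL call, also reading back the two
defaults discussed above (the `gsl_monte_miser_params` accessors assume a
reasonably recent GSL):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_miser.h>

double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = { 0.0 }, xu[1] = { 1.0 };
  gsl_monte_function F = { &f, 1, NULL };
  double I, sigma;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);

  /* the defaults mentioned in the text */
  gsl_monte_miser_params p;
  gsl_monte_miser_params_get(s, &p);
  printf("estimate_frac = %g, min_calls_per_bisection = %zu\n",
         p.estimate_frac, p.min_calls_per_bisection);

  gsl_monte_miser_integrate(&F, xl, xu, 1, 500000, r, s, &I, &sigma);
  printf("I = %.10f +- %.10f\n", I, sigma);

  gsl_monte_miser_free(s);
  gsl_rng_free(r);
  return 0;
}
```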
![Estimates $I^{\text{oss}}$ of the integral $I$ obtained
with the three implemented methods for different numbers of
function calls. Error bars show the estimated
uncertainties.](images/5-MC_MC_MI.pdf){#fig:miser-iter}

Results for this particular sample are shown in black in @fig:miser-iter and
some of them are listed in @tbl:miser-res. Except for the smallest numbers of
calls, the improvement with respect to the plain MC technique (in red) is
appreciable.

--------------------------------------------------------------------
calls        $I^{\text{oss}}$   $\sigma$           diff
----------- ------------------ ------------------ ------------------
\num{5e5}    1.7182850738       0.0000021829       0.0000032453

\num{5e6}    1.7182819143       0.0000001024       0.0000000858

\num{5e7}    1.7182818221       0.0000000049       0.0000000064
--------------------------------------------------------------------

Table: MISER results with different numbers of function calls. Differences
between computed and exact values are given in diff. {#tbl:miser-res}

The convergence is much faster than plain MC: at 500'000 function calls, the
estimate agrees with the exact integral to the fifth decimal place. Once
again, the standard deviation and the difference share the same order of
magnitude.

## Importance sampling

In Monte Carlo methods, importance sampling is a technique which samples
points from a distribution whose shape is close to that of the integrand $f$.
In this way, the points cluster in the regions that make the largest
contribution to the integral $\int f(x) dx$, and the variance is consequently
reduced.

In a plain MC the points are sampled uniformly, so their probability
density is given by
$$
g(x) = \frac{1}{V} \quad \forall x \in \Omega
$$

and the integral can be written as
$$
I = \int_\Omega dx \, f(x) = V \int_\Omega dx \, f(x) \, \frac{1}{V}
\approx V \avg{f}
$$

More generally, consider a distribution $h(x)$ and proceed similarly:
$$
I = \int_\Omega dx \, f(x)
  = \int_\Omega dx \, \frac{f(x)}{h(x)} \, h(x)
  = \Exp \left[ \frac{f}{h}, h \right]
$$
where $\Exp[X, h]$ is the expected value of $X$ with respect to $h$.
Also note that $h$ has to vanish outside $\Omega$ for this to hold.

As anticipated, to reduce the variance $h$ must be close to $f$.
Assuming they are proportional, $h(x) = \alpha |f(x)|$, it follows
that:
$$
\Exp \left[ \frac{f}{h}, h \right] = \frac{1}{\alpha}
\et
\Var \left[ \frac{f}{h}, h \right] = 0
$$

For the expected value to give the original $I$, the proportionality constant
must be taken to be $I^{-1}$, meaning:
$$
h(z) = \frac{1}{I}\, |f(z)|
$$

Sampling from this $h$ would produce a perfect result with zero variance. Of
course, this is nonsense: if $I$ were known in advance, there would be no need
to do a Monte Carlo integration to begin with. Nonetheless, this example shows
how variance reduction is achieved by sampling from an approximation of the
integrand.

In conclusion, since certain values of $x$ have more impact on $\Exp[f/h, h]$
than others, these "important" values must be emphasized by sampling them more
frequently. As a consequence, the estimator variance will be reduced.
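
For the running example this can be made explicit: with $f(x) = e^x$ on
$[0,1]$, the ideal density is $h(x) = e^x / (e - 1)$, which can be sampled by
inversion, $x = \ln(1 + u(e - 1))$ with $u$ uniform in $[0,1)$. A standalone
sketch (it presumes the exact value $I = e - 1$, so it only illustrates the
mechanism):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

int main(void) {
  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  const double I = exp(1.0) - 1.0;   /* exact value, known here only
                                        because this is a toy example */
  const int n = 10;                  /* a handful of samples suffices */
  double sum = 0.0;

  for (int i = 0; i < n; i++) {
    double u = gsl_rng_uniform(r);
    double x = log1p(u * I);         /* inverse CDF of h                */
    sum += exp(x) / (exp(x) / I);    /* f(x)/h(x) = e - 1, identically  */
  }
  printf("estimate = %.10f  (exact = %.10f)\n", sum / n, I);

  gsl_rng_free(r);
  return 0;
}
```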
### VEGAS

The VEGAS algorithm of Lepage [@lepage78] is based on importance sampling. It
aims to reduce the integration error by concentrating points in the regions
that make the largest contribution to the integral.

As stated before, in practice it is impossible to sample points from the best
distribution $h(x)$: only a good approximation can be achieved. In GSL, the
VEGAS algorithm approximates the distribution by histogramming the function
$f$ in different subregions with an iterative method, namely:

- a fixed number of points (function calls) is generated uniformly in the
  whole region;

- the volume $V$ is divided into $N$ intervals of width $\Delta x_i =
  \Delta x \, \forall \, i$, where $N$ is limited by the computer storage
  space available and must be held constant from iteration to iteration
  (in GSL this defaults to $N = 50$);

- each interval is then divided into $m_i + 1$ subintervals, where:
  $$
    m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
  $$

  where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
  of $f$ in the interval. Hence, $m_i$ is a measure of the "importance" of
  the interval with respect to the others: the higher $\bar{f}_i$, the
  higher $m_i$. The constant $K$ is called stiffness; it defaults to 1.5
  in GSL;

- as it is desirable to restore the number of intervals to its original value
  $N$, groups of the new intervals must be merged into larger intervals, the
  number of subintervals in each group being constant. The net effect is
  to alter the interval sizes, while keeping the total number constant, so
  that the smallest intervals occur where $f$ is largest;

- the function is integrated with a plain MC method in each interval
  and the sum of the integrals is taken as the $j$-th estimate of $I$. Its
  error is given by the sum of the variances in each interval;

- the new grid is used and further refined in subsequent iterations.
  By default, the number of iterations is 5 in GSL.

The final estimate of the integral $I$ and its error $\sigma_I$ are obtained
from a weighted average over the iterations:
$$
\avg{I} = \sigma_I^2 \sum_i \frac{I_i}{\sigma_i^2}
\with
\frac{1}{\sigma_I^2} = \sum_i \frac{1}{\sigma_i^2}
$$
where $I_i$ and $\sigma_i$ are the integral and standard deviation estimated
in each iteration.

The reliability of the result is assessed by a chi-squared per degree of
freedom $\chi_r^2$, which should be close to 1 for a good estimation. At a
given iteration $i$, the $\chi^2_i$ is computed as follows:
$$
\chi^2_i = \sum_{j \le i} \frac{\left( I_j - \avg{I} \right)^2}{\sigma_j^2}
$$

While performing the iterations, if the value of $\chi_r^2$ exceeds 1.5, the
routine stops, since it is not making progress.
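
A minimal sketch of the corresponding GSL usage, with a short warm-up call to
adapt the grid before the full run and a check of $\chi_r^2$ at the end (the
warm-up pattern follows the GSL manual):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_vegas.h>

double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = { 0.0 }, xu[1] = { 1.0 };
  gsl_monte_function F = { &f, 1, NULL };
  double I, sigma;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

  /* short run to adapt the grid, then the full-precision run */
  gsl_monte_vegas_integrate(&F, xl, xu, 1, 10000, r, s, &I, &sigma);
  gsl_monte_vegas_integrate(&F, xl, xu, 1, 500000, r, s, &I, &sigma);

  printf("I = %.10f +- %.10f, chisq/dof = %.3f\n",
         I, sigma, gsl_monte_vegas_chisq(s));

  gsl_monte_vegas_free(s);
  gsl_rng_free(r);
  return 0;
}
```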
Clearly, a better estimation is achieved with a greater number of function
calls. For this particular sample, the most accurate results are shown in
@fig:vegas-iter and some of them are listed in @tbl:vegas-res.

----------------------------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff $\chi_r^2$
----------------------------------------------------------------------------------------------
Table: Some VEGAS results with different numbers of
function calls. {#tbl:vegas-res}

As can be appreciated in @fig:vegas-iter, the VEGAS algorithm manages to
compute the integral value more accurately than MISER. The $\chi_r^2$ turns
out to be close enough to 1 to guarantee a good estimation of $I$, which is
also confirmed by the very small differences shown in @tbl:vegas-res. In
fact, with \num{5e7} function calls, the difference is smaller than
\num{1e-10}.

![Only the most accurate results are shown, in order to stress the
differences between the VEGAS (in gray) and MISER (in black)
results.](images/5-MC_MI_VE.pdf){#fig:vegas-iter}

In conclusion, among the plain Monte Carlo technique, stratified sampling and
importance sampling, the last turned out to be the most powerful means to
obtain a good estimate of the integral.