From 3688bceaf27e1757fbc91ab808923c119ff6f31d Mon Sep 17 00:00:00 2001
From: Giù Marcer
Date: Wed, 3 Jun 2020 18:51:11 +0200
Subject: [PATCH] ex-5: review

---
 notes/sections/5.md | 131 ++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 73 deletions(-)

diff --git a/notes/sections/5.md b/notes/sections/5.md
index 8eb10c7..15c9122 100644
--- a/notes/sections/5.md
+++ b/notes/sections/5.md
@@ -2,12 +2,15 @@
The following integral is to be evaluated, comparing different Monte Carlo
techniques.
-
+\vspace{20pt}
+$$
+I = \int_0^1 \!\!\! dx \, e^x
+$$
+\vspace{-70pt}
\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
-    \definecolor{cyclamen}{RGB}{146, 24, 43}
    % Integral
    \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
      (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
@@ -21,19 +24,16 @@ techniques.
    % Plot
    \draw [domain=-2:7, smooth, variable=\x, cyclamen, ultra thick]
      plot ({\x},{exp(\x/5)});
-    % Equation
-    \node [above] at (2.5, 2.5) {$I = \int\limits_0^1 dx \, e^x$};
\end{tikzpicture}
\caption{Plot of the integral to be evaluated.}
}
\end{figure}
-
whose exact value is 1.7182818285...

Three of the most popular Monte Carlo (MC) methods were applied: plain MC,
MISER and VEGAS. Besides being commonly used, these were chosen because they
are implemented in the GSL library as `gsl_monte_plain`, `gsl_monte_miser` and
-`gsl_monte_vegas`, respectively.
+`gsl_monte_vegas`.

## Plain Monte Carlo

@@ -44,24 +44,21 @@
$$
  I = \int\limits_{\Omega} dx \, f(x)
  \with V = \int\limits_{\Omega} dx
$$
-
the simplest MC approach is to sample $N$ points $x_i$ in $V$ and
approximate $I$ as:
$$
  I \approx I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \avg{f}
$$
-
If the $x_i$ are uniformly distributed, $I_N \rightarrow I$ for $N \rightarrow +\infty$
-by the law of large numbers, whereas the sample variance can be
+by the law of large numbers, whereas the integral variance can be
estimated as:
$$
  \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \avg{f} \right)^2
-  \et
+  \et
  \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$
-
Thus, the error decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, the error estimate is not a strict bound:
random sampling may not cover all the important features of the integrand and
@@ -99,15 +96,15 @@ Table: Some MC results with three different numbers of function calls.
As can be seen, $\sigma$ is always of the same order of magnitude as diff,
except for very low numbers of function calls. Even with \num{5e7} calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal place, meaning
-that this method shows a really slow convergence.
+that this method converges very slowly.
+
The $\sigma$ dependence on the number $C$ of function calls was checked with a
least squares minimization, modeling the data with the function:
$$
  \sigma = \frac{a}{C^b}
$$
-
-As can be seen in @fig:err_fit, the obtained result confirmes the expected value
-of $b^{\text{exp}} = 0.5$, having found $b \sim 0.499$.
+The obtained result (@fig:err_fit) confirms the expected value
+$b^{\text{exp}} = 0.5$, with an observed value of $b \sim 0.499$.
Given this dependence, for an error of $10^{-n}$, a number $\propto 10^{2n}$ of
function calls is needed. To compute the integral to double precision, an
impossibly large number of calls, $\sim 10^{32}$, would be needed, which makes
@@ -125,11 +122,11 @@ partitioned into subpopulations.
Stratification, indeed, is the process of dividing the primary sample into
subgroups (strata) before sampling within each stratum.
Given a sample $\{x_j\}_i$ of the $i$-th stratum, its mean $\bar{x}_i$ and
-variance ${\sigma^2_x}_i$, are given by
+variance ${\sigma^2_x}_i$ are given by:
$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$
-and from:
+and
$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
@@ -137,8 +134,8 @@ $$
$$

where:

-  - $j$ runs over the points $x_j$ of the sample
-  - $n_i$ is the size of the sample
+  - $j$ runs over the points $x_j$ of the sample,
+  - $n_i$ is the size of the sample,
  - $\sigma_i^2$ is the variance associated with every point of the $i$-th
    stratum.

@@ -169,12 +166,11 @@ population. For example, see [@ridder17].

The MISER technique aims at reducing the integration error through the use of
recursive stratified sampling.
-As stated before, according to the law of large numbers, for a large number of
+As stated before, by the law of large numbers, for a sufficiently large number of
sampled points, the integral $I$ can be estimated as:
$$
  I = V \cdot \avg{f}
$$
-
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\avg{f}$.

@@ -191,26 +187,23 @@ is given by:
$$
  \sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$
-
It can be shown that this variance is minimized by distributing the points
such that:
$$
  \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
-
Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by:
$$
  I = V \cdot \avg{f} \et \sigma_I^2 = V^2 \cdot \sigma^2
$$
-
When implemented, MISER is in fact a recursive method. First, all the possible
bisections of $\Omega$ are tested and the one which minimizes the combined
variance of the two sub-regions is selected. In order to speed up the
algorithm, the variance in the sub-regions is estimated with a fraction of the
-total number of available points (function calls), in GSL it default to 0.1.
-The remaining points are allocated to the sub-regions using the formula for
+total number of available points (function calls); in GSL this fraction defaults
+to 0.1. The remaining points are then allocated to the sub-regions using the formula for
$n_a$ and $n_b$, once the variances are computed.

This procedure is then repeated recursively for each of the two half-regions
@@ -219,15 +212,18 @@ from the best bisection. When the allocated calls for a region running out
The final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error [@sayah19].

-![Estimations $I^{\text{oss}}$ of the integral $I$ obtained
-  for the three implemented method for different values of
-  function calls. Errorbars showing their estimated
-  uncertainties.](images/5-MC_MC_MI.pdf){#fig:miser-iter}
+![Estimates $I^{\text{oss}}$ of the integral $I$ obtained with plain MC and
+  MISER for different numbers of function calls. Error bars show the
+  estimated uncertainties.](images/5-MC_MC_MI.pdf){#fig:miser-iter}

Results for this particular sample are shown in black in @fig:miser-iter and
-some of them are listed in @tbl:miser-res. Except for the first very little
-number of calls, the improvement with respect to the Plain MC technique (in
-red) is appreciable.
+compared with the plain MC results (in red). Some of them are listed in
+@tbl:miser-res.
+Except at very low numbers of function calls, the improvement with respect to
+the plain MC technique is appreciable.
+
+The convergence is much faster than plain MC: at 500'000 function calls, the
+estimate agrees with the exact value to the fifth decimal place. Once again,
+the standard deviation and the difference are of the same order of magnitude.

--------------------------------------------------------------------
calls              $I^{\text{oss}}$   $\sigma$           diff
------------------ ------------------ ------------------ ------------------

@@ -242,55 +238,45 @@ calls $I^{\text{oss}}$ $\sigma$ diff
Table: MISER results with different numbers of function calls. Differences
between computed and exact values are given in diff. {#tbl:miser-res}

-The convergence is much faster than a plain MC: at 500'000 function calls, the
-estimate agrees with the exact integral to the fifth decimal place. Once again,
-the standard deviation and the difference share the same magnitude.
-
## Importance sampling

In Monte Carlo methods, importance sampling is a technique which samples points
-from distribution whose shape is close to the integrand $f$ itself. In this way
+from a distribution whose shape is close to that of the integrand $f$. This way,
the points cluster in the regions that make the largest contribution to the
integral $\int f(x)dx$ and consequently decrease the variance.

In a plain MC the points are sampled uniformly, so their probability
-density is given by
+density is given by:
$$
  g(x) = \frac{1}{V} \quad \forall x \in \Omega
$$
-
-and the integral can be written as
+and the integral can be written as:
$$
  I = \int_\Omega dx f(x) = V \int_\Omega f(x) \frac{1}{V}dx \approx V \avg{f}
$$
-
-More generally, consider a distribution $h(x)$ and similarly do
+More generally, consider a distribution $h(x)$ and write:
$$
  I = \int_\Omega dx f(x) = \int_\Omega dx \, \frac{f(x)}{h(x)} \, h(x)
    = \Exp \left[ \frac{f}{h}, h \right]
$$
-where $\Exp[X, h]$ is the expected value of $X$ wrt $h$.
-Also note that $h$ has to vanish outside $\Omega$ for this to hold.
-
-As anticipated, to reduce the variance $h$ must be close to $f$.
-Assuming they are proportional, $h(x) = \alpha |f(x)|$, it follows
-that:
+where $\Exp[X, h]$ is the expected value of $X$ with respect to $h$. Also note
+that $h$ has to vanish outside $\Omega$ for this to hold.
+
+To reduce the variance, as anticipated, $h$ must be close to $f$. Assuming they
+are proportional, $h(x) = \alpha |f(x)|$, it follows that:
$$
  \Exp \left[ \frac{f}{h}, h \right] = \frac{1}{\alpha} \et
  \Var \left[ \frac{f}{h}, h \right] = 0
$$
-
For the expected value to give the original $I$, the proportionality constant
must be taken to be $I^{-1}$, meaning:
$$
  h(z) = \frac{1}{I}\, |f(z)|
$$
-
Sampling from this $h$ would produce a perfect result with zero variance.
Of course, this is nonsense: if $I$ is known in advance, there would be no need
to do a Monte Carlo integration to begin with. Nonetheless, this example serves
@@ -316,7 +302,7 @@ different subregions with an iterative method, namely:

  - the volume $V$ is divided into $N$ intervals of width
    $\Delta x_i = \Delta x \, \forall \, i$, where $N$ is limited by the
    computer storage space available and must be held constant from iteration
    to iteration.
-    (In GSL this default to $N = 50$);
+    (In GSL this defaults to $N = 50$);

  - each interval is then divided into $m_i + 1$ subintervals, where:
    $$
      m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
    $$
@@ -326,8 +312,8 @@ different subregions with an iterative method, namely:
    where $j$ runs over all the intervals and $\bar{f}_i$ is the average
    value of $f$ in the interval.
    Hence, $m_i$ is a measure of the "importance" of the interval with
    respect to the others: the higher
-    $\bar{f}_i$, the higher $m_i$. The constant $K$ is called stiffness.
-    It is defaults 1.5 in GSL;
+    $\bar{f}_i$, the higher $m_i$. The constant $K$, called stiffness, defaults
+    to 1.5 in GSL;

  - as it is desirable to restore the number of intervals to its original value
    $N$, groups of the new intervals must be merged into larger intervals, the
@@ -337,10 +323,10 @@ different subregions with an iterative method, namely:

  - the function is integrated with a plain MC method in each interval and
    the sum of the integrals is taken as the $j$-th estimate of $I$. Its
-    error is given the sum of the variances in each interval.
+    error is given by the sum of the variances in each interval;

-  - the new grid is used and further refined in subsequent iterations.
-    By default, the number of iterations 5 in GSL.
+  - the new grid is further refined in subsequent iterations. By default, the
+    number of iterations is 5 in GSL.

The final estimate of the integral $I$ and its error $\sigma_I$ are computed as
a weighted average:
@@ -359,27 +345,13 @@
$$
  \chi^2_i = \sum_{j \le i} \frac{(I_j - \avg{I})^2}{\sigma_j^2}
$$
-
-While performing the iterations, if the value of $\chi_r^2$ exceed 1.5, the
-routine stops since is not making progress.
+While performing the iterations, if the value of $\chi_r^2$ exceeds 1.5, the
+routine stops since it is not making progress.

Clearly, a better estimate is achieved with a greater number of function
calls. For this particular sample, the most accurate results are shown in
@fig:vegas-iter and some of them are listed in @tbl:vegas-res.

------------------------------------------------------------------------------------------------
-calls              $I^{\text{oss}}$   $\sigma$           diff               $\chi_r^2$
------------------- ------------------ ------------------ ------------------ ------------------
-\num{5e5}          1.7182818281       0.0000000012       0.0000000004       1.457
-
-\num{5e6}          1.7182818284       0.0000000000       0.0000000001       0.632
-
-\num{5e7}          1.7182818285       0.0000000000       0.0000000000       0.884
------------------------------------------------------------------------------------------------
-
-Table: Some VEGAS results with different numbers of
-       function calls. {#tbl:vegas-res}
-
![Only the most accurate results are shown in order to stress the differences
  between the VEGAS (in gray) and MISER (in black)
  results.](images/5-MC_MI_VE.pdf){#fig:vegas-iter}

@@ -391,6 +363,19 @@
also confirmed by the very small difference shown in @tbl:vegas-res. In fact,
with a number of \num{5e7} function calls, the difference is smaller than
\num{1e-10}.

+----------------------------------------------------------------------------------------------
+calls              $I^{\text{oss}}$   $\sigma$           diff               $\chi_r^2$
+------------------ ------------------ ------------------ ------------------ ------------------
+\num{5e5}          1.7182818281       0.0000000012       0.0000000004       1.457
+
+\num{5e6}          1.7182818284       0.0000000000       0.0000000001       0.632
+
+\num{5e7}          1.7182818285       0.0000000000       0.0000000000       0.884
+----------------------------------------------------------------------------------------------
+
+Table: Best VEGAS results with different numbers of
+       function calls. {#tbl:vegas-res}
+
In conclusion, among the plain Monte Carlo technique, stratified sampling and
importance sampling, the last turned out to be the most powerful means of
obtaining a good estimate of the integral.
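
For reference, the sketch below shows how the three GSL routines named in these
notes (`gsl_monte_plain`, `gsl_monte_miser`, `gsl_monte_vegas`) can be invoked
on this integrand. It is only a minimal illustration of the API under default
settings, not the program used to produce the tables above: the number of calls,
the choice of random number generator and the output format are arbitrary.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_plain.h>
#include <gsl/gsl_monte_miser.h>
#include <gsl/gsl_monte_vegas.h>

/* Integrand f(x) = e^x on the unit interval. */
static double f(double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main(void) {
  double xl[1] = {0.0}, xu[1] = {1.0};   /* integration region [0, 1] */
  const size_t calls = 500000;           /* arbitrary number of function calls */
  double res, err;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function F = { &f, 1, NULL };

  /* Plain Monte Carlo */
  gsl_monte_plain_state *sp = gsl_monte_plain_alloc(1);
  gsl_monte_plain_integrate(&F, xl, xu, 1, calls, r, sp, &res, &err);
  gsl_monte_plain_free(sp);
  printf("plain: %.10f +- %.10f\n", res, err);

  /* MISER (recursive stratified sampling) */
  gsl_monte_miser_state *sm = gsl_monte_miser_alloc(1);
  gsl_monte_miser_integrate(&F, xl, xu, 1, calls, r, sm, &res, &err);
  gsl_monte_miser_free(sm);
  printf("miser: %.10f +- %.10f\n", res, err);

  /* VEGAS (importance sampling); chisq/dof of the weighted iterations */
  gsl_monte_vegas_state *sv = gsl_monte_vegas_alloc(1);
  gsl_monte_vegas_integrate(&F, xl, xu, 1, calls, r, sv, &res, &err);
  printf("vegas: %.10f +- %.10f (chisq/dof = %.3f)\n",
         res, err, gsl_monte_vegas_chisq(sv));
  gsl_monte_vegas_free(sv);

  gsl_rng_free(r);
  return 0;
}
```

Compiled with `gcc example.c -lgsl -lgslcblas -lm` (the file name is arbitrary),
varying `calls` should reproduce the qualitative behaviour discussed above: the
plain MC error shrinks as $1/\sqrt{N}$, while MISER and VEGAS reach comparable
accuracy with far fewer calls.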