# Exercise 5

**Numerically compute an integral value via Monte Carlo approaches**

The integral to be evaluated is the following:

$$
  I = \int\limits_0^1 dx \, e^x
$$

\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Integral
  \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
            (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
  \draw [cyclamen] (5,0) -- (5,2.7182818);
  \node [below] at (5,0) {1};
  % Axis
  \draw [thick, <-] (0,4) -- (0,0);
  \draw [thick, ->] (-2,0) -- (7,0);
  \node [below right] at (7,0) {$x$};
  \node [above left] at (0,4) {$e^{x}$};
  % Plot
  \draw [domain=-2:7, smooth, variable=\x,
         cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{Plot of the integral to be evaluated.}
}
\end{figure}

whose exact value is 1.7182818285...

The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides their popularity, these three methods were chosen because
they are the only ones implemented in the GSL library.


## Plain Monte Carlo

When the integral $I$ of a function $f$ over an $n$-dimensional space $\Omega$
of volume $V$ must be evaluated, that is:

$$
  I = \int\limits_{\Omega} dx \, f(x)
  \with V = \int\limits_{\Omega} dx
$$

the simplest MC approach is to sample $N$ points $x_i$ uniformly distributed
in $\Omega$ and approximate $I$ as:

$$
  I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$

with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. The variance of $f$ can be estimated by the sample variance
$\sigma^2_f$, from which the variance $\sigma^2_I$ of the estimate follows:

$$
  \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
  \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
  \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$

Thus, the error decreases as $1/\sqrt{N}$.  
Unlike in deterministic methods, the estimate of the error is not a strict error
bound: random sampling may not uncover all the important features of the
integrand and this can result in an underestimate of the error.

In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$.
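
The estimator and its error can be illustrated with a minimal sketch in plain
Python. This is not the GSL routine used for the results reported here: the
function name `plain_mc`, the seed and the number of points are assumptions
chosen only for illustration.

```python
import math
import random


def plain_mc(f, a, b, n, rng):
    """Plain MC estimate of the integral of f over [a, b] and its variance."""
    volume = b - a
    total = 0.0     # running sum of f(x_i)
    total_sq = 0.0  # running sum of f(x_i)^2
    for _ in range(n):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n
    # Unbiased sample variance of f, then variance of the estimate I_N.
    var_f = (total_sq - n * mean * mean) / (n - 1)
    return volume * mean, volume**2 * var_f / n


rng = random.Random(42)  # arbitrary fixed seed, for reproducibility only
estimate, variance = plain_mc(math.exp, 0.0, 1.0, 100_000, rng)
sigma = math.sqrt(variance)
print(f"I_N = {estimate:.6f} +- {sigma:.6f}, exact = {math.e - 1:.6f}")
```

Running it with ten times as many points should shrink the printed error by
roughly $\sqrt{10}$, in line with the $1/\sqrt{N}$ behaviour.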

Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
method is determined by how many points are generated, namely how many function
calls are executed when the method is implemented. In @tbl:MC, the obtained
results and errors $\sigma$ are shown. The estimated integrals
$I^{\text{oss}}$ for different numbers of calls are compared to the expected
value $I$ and the difference 'diff' between them is given.  
As can be seen, the MC method tends to underestimate the error when few
function calls are used: with $500'000$ calls, diff is more than twice
$\sigma$. As previously stated, the higher the number of function calls, the
better the estimation of $I$. Note also that, even with $50'000'000$ calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal digit.

-------------------------------------------------------------------------
                   500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$     1.7166435813      1.7181231109       1.7183387184

$\sigma$             0.0006955691      0.0002200309       0.0000695809

diff                 0.0016382472      0.0001587176       0.0000568899
-------------------------------------------------------------------------

Table: MC results with different numbers of function calls. {#tbl:MC}
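
As a quick consistency check on the tabulated errors (a sketch that uses only
the values in the table and the analytic variance of $e^x$ on $[0,1]$), each
tenfold increase in the number of calls should reduce $\sigma$ by a factor
$\sqrt{10} \approx 3.162$:

$$
  \frac{0.0006955691}{0.0002200309} \approx 3.161
  \et
  \frac{0.0002200309}{0.0000695809} \approx 3.162
$$

Moreover, the variance of the integrand is known exactly:

$$
  \sigma_f^2 = \int\limits_0^1 dx \, e^{2x} - I^2
             = \frac{e^2 - 1}{2} - (e - 1)^2 \approx 0.2420
  \thus
  \frac{\sigma_f}{\sqrt{500'000}} \approx 0.000696
$$

which agrees with the first entry of the $\sigma$ row.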


## Stratified sampling

In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification is the process of dividing the
primary sample into subgroups (strata) before sampling randomly within each
stratum.  
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
drawn with simple random sampling in each stratum, such as:

$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
  {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$

where:

  - $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
  - $n_i$ is the number of points sampled in it
  - $\sigma_i^2$ is the variance associated with each point of the
    $i^{\text{th}}$ stratum

then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:

$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma^2_x}_i
             = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$

where $i$ runs over the strata, $N_i$ is the weight of the $i^{\text{th}}$
stratum and $N$ is the sum of all strata weights.

In practical terms, stratified sampling can produce a weighted mean that has
less variability than the arithmetic mean of a simple random sample of the
whole population. In fact, if measurements within strata have a lower standard
deviation, the final result will have a smaller estimation error than the one
otherwise obtained with simple sampling.  
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population.
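
The variance reduction can be seen on the integral of this exercise with a
minimal sketch. It assumes equal-width strata on $[0,1]$ (so that every weight
$N_i / N$ equals the stratum width) and the same number of points in each
stratum; the function name `stratified_mc` and all parameter values are
illustrative choices, not part of the original analysis.

```python
import math
import random
import statistics


def stratified_mc(f, a, b, n_strata, n_per_stratum, rng):
    """Stratified estimate of the integral of f over [a, b], equal-width strata."""
    width = (b - a) / n_strata
    estimate = 0.0
    variance = 0.0
    for i in range(n_strata):
        lo = a + i * width
        ys = [f(rng.uniform(lo, lo + width)) for _ in range(n_per_stratum)]
        # Each stratum contributes its own plain MC estimate and variance.
        estimate += width * statistics.fmean(ys)
        variance += width**2 * statistics.variance(ys) / n_per_stratum
    return estimate, variance


rng = random.Random(42)  # arbitrary fixed seed
estimate, variance = stratified_mc(math.exp, 0.0, 1.0, 100, 1_000, rng)
sigma = math.sqrt(variance)
print(f"I = {estimate:.8f} +- {sigma:.8f}, exact = {math.e - 1:.8f}")
```

With the same total of $100'000$ points, the printed error should be much
smaller than the plain MC one, because $e^x$ varies very little inside each
narrow stratum.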

**MISER**

The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.  
Consider two disjoint regions $a$ and $b$ with volumes $V_a$ and $V_b$ and Monte
Carlo estimates $I_a = V_a \cdot \langle f \rangle_a$ and $I_b = V_b \cdot
\langle f \rangle_b$ of the integrals, where $\langle f \rangle_a$ and $\langle
f \rangle_b$ are the means of $f$ over the points sampled in those regions, and
$\sigma_a^2$ and $\sigma_b^2$ are the variances of those points. If $N_a$ and
$N_b$ points are sampled in the two regions and $I_a$ and $I_b$ are given equal
weights, then the variance $\sigma_I^2$ of the combined estimate $I$:

\textcolor{red}{HERE}

$$
  I = \frac{1}{2} (I_a + I_b)
$$

is given by:

$$
  \sigma_I^2 = \frac{\sigma_a^2}{4 N_a} + \frac{\sigma_b^2}{4 N_b}
$$

It can be shown that this variance is minimized by distributing the points such
that:

$$
  \frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$

Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region, where
the two sub-regions together cover the whole domain: $a \cup b = \Omega$.
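
The allocation rule can be recovered with a short calculation, sketched here
under the assumptions that the total number of points $N = N_a + N_b$ is fixed
and that $N_a$ can be treated as a continuous variable. A constant overall
factor in $\sigma_I^2$ does not move the minimum, so it is dropped:

$$
  \frac{d}{d N_a} \left( \frac{\sigma_a^2}{N_a}
  + \frac{\sigma_b^2}{N - N_a} \right)
  = - \frac{\sigma_a^2}{N_a^2} + \frac{\sigma_b^2}{N_b^2} = 0
  \thus
  \frac{N_a}{N_b} = \frac{\sigma_a}{\sigma_b}
  \thus
  \frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$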

When implemented, MISER is in fact a recursive method. At each step, all
the possible bisections are tested and the one which minimizes the combined
variance of the two sub-regions is selected. The same procedure is then repeated
recursively for each of the two half-spaces from the best bisection. At each
recursion step, the integral and the error are estimated using a plain Monte
Carlo algorithm.  
After a given number of calls, the final individual values and their error
estimates are then combined upwards to give an overall result and an estimate of
its error.
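
The recursion can be sketched in one dimension, where the only possible
bisection is at the midpoint of the interval. The code below is a simplified
illustration and not the GSL implementation: the function names, the fraction
of calls spent on exploration (`explore`) and the minimum number of calls per
region (`min_calls`) are all assumptions. Each half returns the integral over
its own sub-interval, so the two results and their variances are summed.

```python
import math
import random
import statistics


def plain_mc(f, a, b, n, rng):
    """Plain MC estimate and variance of the integral of f over [a, b]."""
    ys = [f(rng.uniform(a, b)) for _ in range(n)]
    volume = b - a
    return volume * statistics.fmean(ys), volume**2 * statistics.variance(ys) / n


def miser_like(f, a, b, n, rng, min_calls=64, explore=0.1):
    """Recursive stratified sampling on [a, b] using about n function calls."""
    if n < 4 * min_calls:
        return plain_mc(f, a, b, n, rng)  # too few points left to bisect
    mid = 0.5 * (a + b)
    # Exploration: spend a fraction of the calls estimating sigma in each half.
    # These points are discarded afterwards, which keeps the sketch simple.
    n_half = max(2, int(explore * n) // 2)
    sigma_l = statistics.stdev([f(rng.uniform(a, mid)) for _ in range(n_half)])
    sigma_r = statistics.stdev([f(rng.uniform(mid, b)) for _ in range(n_half)])
    # Allocation: share the remaining calls in proportion to sigma.
    n_rest = n - 2 * n_half
    share = 0.5 if sigma_l + sigma_r == 0 else sigma_l / (sigma_l + sigma_r)
    n_l = min(max(int(share * n_rest), min_calls), n_rest - min_calls)
    est_l, var_l = miser_like(f, a, mid, n_l, rng, min_calls, explore)
    est_r, var_r = miser_like(f, mid, b, n_rest - n_l, rng, min_calls, explore)
    # The halves are disjoint and sampled independently: both quantities add.
    return est_l + est_r, var_l + var_r


rng = random.Random(42)  # arbitrary fixed seed
estimate, variance = miser_like(math.exp, 0.0, 1.0, 20_000, rng)
sigma = math.sqrt(variance)
print(f"I = {estimate:.7f} +- {sigma:.7f}, exact = {math.e - 1:.7f}")
```

Discarding the exploration points wastes some calls but keeps the final
estimate independent of the points used to choose the allocation.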

Results for this particular sample are shown in @tbl:MISER.

-------------------------------------------------------------------------
                   500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$     1.7182850738      1.7182819143       1.7182818221

$\sigma$             0.0000021829      0.0000001024       0.0000000049

diff                 0.0000032453      0.0000000858       0.0000000064
-------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. {#tbl:MISER}

The error, although always of the same order of magnitude as diff, seems to
seesaw around it as $N$ varies: it is smaller than diff with $500'000$ and
$50'000'000$ calls and larger with $5'000'000$.


## VEGAS \textcolor{red}{WIP}

The VEGAS algorithm is based on importance sampling. It samples points from the
probability distribution described by the function $|f|$, so that the points are
concentrated in the regions that make the largest contribution to the integral.

In general, if the MC integral of $f$ is sampled with points distributed
according to a probability distribution $g$, the following estimate of the integral
is obtained:

$$
  E (f|g \, , \, N) \with \sigma^2(f|g \, , \, N)
$$

If the probability distribution is chosen as $g = |f| / I(|f|)$, where $I(|f|)$
is the integral of $|f|$ over the domain, it can be shown that the variance
vanishes, and the error in the estimate will therefore be zero.  
In practice, it is impossible to sample points from the exact distribution: only
a good approximation can be achieved. In GSL, the VEGAS algorithm approximates
the distribution by histogramming the function $f$ in different subregions. Each
histogram is used to define a sampling distribution for the next pass, which
repeats the same procedure: the process converges asymptotically to the desired
distribution.
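
The effect of the choice of $g$ can be illustrated with a fixed sampling
density, before any adaptation. In the sketch below the integral is estimated
as the mean of $f(x)/g(x)$ over points drawn from $g$; if $g$ were exactly
proportional to $f$, this ratio would be constant and its variance zero. The
density $g(x) = (1 + x) / 1.5$, which only roughly follows $e^x$ on $[0,1]$,
and all the names are assumptions made for this example.

```python
import math
import random
import statistics


def importance_mc(f, g, sample_g, n, rng):
    """Estimate the integral of f from n points drawn from the density g."""
    ratios = []
    for _ in range(n):
        x = sample_g(rng)
        ratios.append(f(x) / g(x))
    return statistics.fmean(ratios), statistics.variance(ratios) / n


def g(x):
    # Normalised on [0, 1]: the integral of (1 + x) over [0, 1] is 3/2.
    return (1.0 + x) / 1.5


def sample_g(rng):
    # Inverse of the cumulative distribution G(x) = (x + x^2 / 2) / 1.5.
    return math.sqrt(1.0 + 3.0 * rng.random()) - 1.0


rng = random.Random(42)  # arbitrary fixed seed
estimate, variance = importance_mc(math.exp, g, sample_g, 100_000, rng)
sigma = math.sqrt(variance)
print(f"I = {estimate:.6f} +- {sigma:.6f}, exact = {math.e - 1:.6f}")
```

Even this crude linear density should give an error noticeably smaller than
uniform sampling with the same number of points.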

In order to avoid the number of histogram bins growing like $K^d$, where $K$ is
the number of bins per axis and $d$ the number of dimensions, the probability
distribution is approximated by a separable function:

$$
  g (x_1, x_2, \ldots) = g_1(x_1) g_2(x_2) \ldots
$$

so that the number of bins required is only $Kd$. This is equivalent to locating
the peaks of the function from the projections of the integrand onto the
coordinate axes. The efficiency of VEGAS depends on the validity of this
assumption. It is most efficient when the peaks of the integrand are
well-localized. If an integrand can be rewritten in a form which is
approximately separable this will increase the efficiency of integration with
VEGAS.
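
A one-dimensional toy version of the histogram refinement can make the idea
concrete. In the sketch below every bin is sampled with the same probability,
and after each pass the bin edges are moved so that each bin holds an equal
share of the accumulated $|f(x)/g(x)|$: the bins become narrow where the
integrand is large. This is only an illustration under simplifying assumptions
and not the GSL algorithm, which differs in several respects (for instance, it
damps the grid update); the names `refine_grid` and `vegas_like` and all
parameter values are made up for the example.

```python
import math
import random


def refine_grid(edges, weights):
    """Move the bin edges so that every new bin holds an equal share of weight."""
    k = len(weights)
    cumulative = [0.0]
    for w in weights:
        cumulative.append(cumulative[-1] + w)
    new_edges = [edges[0]]
    j = 0
    for m in range(1, k):
        target = cumulative[-1] * m / k
        while cumulative[j + 1] < target:
            j += 1
        # Linear interpolation inside old bin j (weight assumed uniform there).
        frac = (target - cumulative[j]) / (cumulative[j + 1] - cumulative[j])
        new_edges.append(edges[j] + frac * (edges[j + 1] - edges[j]))
    new_edges.append(edges[-1])
    return new_edges


def vegas_like(f, a, b, n_per_pass, passes, k, rng):
    """Adaptive importance sampling on [a, b] with a k-bin histogram grid."""
    edges = [a + (b - a) * i / k for i in range(k + 1)]
    for _ in range(passes):
        weights = [0.0] * k
        total = total_sq = 0.0
        for _ in range(n_per_pass):
            i = rng.randrange(k)             # every bin is equally likely
            width = edges[i + 1] - edges[i]
            x = edges[i] + width * rng.random()
            y = f(x) * k * width             # f(x) / g(x), g = 1 / (k * width)
            total += y
            total_sq += y * y
            weights[i] += abs(y)
        mean = total / n_per_pass
        variance = max(total_sq / n_per_pass - mean * mean, 0.0) / (n_per_pass - 1)
        edges = refine_grid(edges, weights)  # narrow bins where |f| is large
    # Estimate and variance come from the last pass only.
    return mean, variance, edges


rng = random.Random(42)  # arbitrary fixed seed
estimate, variance, edges = vegas_like(math.exp, 0.0, 1.0, 50_000, 5, 20, rng)
print(f"I = {estimate:.6f} +- {math.sqrt(variance):.6f}")
print(f"first bin width = {edges[1] - edges[0]:.4f}, "
      f"last bin width = {edges[-1] - edges[-2]:.4f}")
```

For $e^x$ on $[0,1]$ the last bin should end up narrower than the first one,
since the integrand is largest near $x = 1$.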

VEGAS incorporates a number of additional features, and combines both stratified
sampling and importance sampling. The integration region is divided into a number
of “boxes”, with each box getting a fixed number of points (the goal is 2). Each
box can then have a fractional number of bins, but if the ratio of bins-per-box is
less than two, VEGAS switches to a kind of variance reduction (rather than
importance sampling).


---

---------------------------------------------------------
       calls     plain MC        Miser          Vegas    
------------ -------------- -------------- --------------
    500'000   1.7166435813   1.7182850738   1.7182818354 

  5'000'000   1.7181231109   1.7182819143   1.7182818289 

 50'000'000   1.7183387184   1.7182818221   1.7182818285 
---------------------------------------------------------

Table: Results of the three methods. {#tbl:results}

---------------------------------------------------------
       calls     plain MC        Miser          Vegas    
------------ -------------- -------------- --------------
    500'000   0.0006955691   0.0000021829   0.0000000137

  5'000'000   0.0002200309   0.0000001024   0.0000000004

 50'000'000   0.0000695809   0.0000000049   0.0000000000
---------------------------------------------------------

Table: $\sigma$s of the three methods. {#tbl:sigmas}