# Exercise 5

The following integral must be evaluated:

$$
I = \int\limits_0^1 dx \, e^x
$$

\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Integral
  \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
    (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
  \draw [cyclamen] (5,0) -- (5,2.7182818);
  \node [below] at (5,0) {1};
  % Axis
  \draw [thick, <-] (0,4) -- (0,0);
  \draw [thick, ->] (-2,0) -- (7,0);
  \node [below right] at (7,0) {$x$};
  \node [above left] at (0,4) {$e^{x}$};
  % Plot
  \draw [domain=-2:7, smooth, variable=\x,
         cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{Plot of the integrand $e^x$; the shaded area is the integral to be
evaluated.}
}
\end{figure}

whose exact value is 1.7182818285...

The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides their popularity, these three methods were chosen because
they are the only ones implemented in the GSL library.

## Plain Monte Carlo

When an integral $I$ of a function $f$ over an $n$-dimensional space $\Omega$
of volume $V$ must be evaluated, that is:

$$
I = \int\limits_{\Omega} dx \, f(x)
\with V = \int\limits_{\Omega} dx
$$

the simplest MC approach is to sample $N$ points $x_i$ evenly distributed in
$V$ and approximate $I$ as:

$$
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$

with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. The uncertainty on $I_N$ can then be estimated from the sample
variance of $f$:

$$
\sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
\rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
\sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$

Thus, the error decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, this estimate is not a strict error bound:
random sampling may not uncover all the important features of the integrand,
which can result in an underestimate of the error.

In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$.
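
For reference, a minimal sketch of how this estimate can be obtained with the
GSL plain Monte Carlo routine is shown below; the random number generator and
the number of calls are illustrative choices, not necessarily the settings
used to produce @tbl:MC.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_plain.h>

/* Integrand f(x) = exp(x) in the gsl_monte_function format. */
double integrand (double *x, size_t dim, void *params)
{
  (void) dim; (void) params;
  return exp(x[0]);
}

int main (void)
{
  double lower[1] = {0.0}, upper[1] = {1.0};  /* Omega = [0, 1]    */
  size_t calls = 500000;                      /* number of samples */
  double result, error;

  gsl_monte_function f = { &integrand, 1, NULL };
  gsl_rng *r = gsl_rng_alloc(gsl_rng_taus2);

  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);
  gsl_monte_plain_integrate(&f, lower, upper, 1, calls, r, s,
                            &result, &error);
  printf("I = %.10f +- %.10f\n", result, error);

  gsl_monte_plain_free(s);
  gsl_rng_free(r);
  return 0;
}
```

The program links against the GSL libraries (`-lgsl -lgslcblas -lm`).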

Since the proximity of $I_N$ to $I$ depends on $N$, the accuracy of the method
is determined by how many points are generated, namely how many function calls
are executed when the method is implemented. In @tbl:MC, the obtained results
and errors $\sigma$ are shown. The estimated integrals for different numbers
of calls are compared to the expected value $I$ and the difference 'diff'
between them is given.

As can be seen, the MC method tends to underestimate the error when the number
of function calls is small. As previously stated, the higher the number of
function calls, the better the estimate of $I$. A further observation is that,
even with $50'000'000$ calls, $I^{\text{oss}}$ still differs from $I$ at the
fifth decimal digit.

-------------------------------------------------------------------------
                   500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$   1.7166435813      1.7181231109       1.7183387184

$\sigma$           0.0006955691      0.0002200309       0.0000695809

diff               0.0016382472      0.0001587176       0.0000568899
-------------------------------------------------------------------------

Table: MC results with different numbers of function calls. {#tbl:MC}

## Stratified sampling

In statistics, stratified sampling is a method for sampling from a population
partitioned into subpopulations. Stratification is the process of dividing the
primary sample into subgroups (strata) before sampling randomly within each
stratum.
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of a quantity $x$
sampled with simple random sampling in each stratum, namely:

$$
\bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

$$
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
\thus
{\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$

where:

- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
- $n_i$ is the number of points sampled in it
- $\sigma_i^2$ is the sample variance within the $i^{\text{th}}$ stratum

then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:

$$
\bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
\sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
= \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$

where $i$ runs over the strata, $N_i$ is the weight of the $i^{\text{th}}$
stratum and $N$ is the sum of all strata weights.

In practical terms, stratified sampling can produce a weighted mean with less
variability than the arithmetic mean of a simple random sample of the whole
population: if measurements within the strata have a lower standard deviation,
the final result has a smaller estimation error than the one obtained with
simple random sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are employed to estimate population statistics from a known
population.
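
As a bare-bones illustration of the idea (separate from the GSL implementation
discussed below), the integral above can be estimated with two equal-volume
strata on $[0, 0.5]$ and $[0.5, 1]$; the generator and the number of points
per stratum are arbitrary choices made only for this sketch.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

/* Stratified estimate of I = int_0^1 exp(x) dx with two equal strata. */
int main (void)
{
  gsl_rng *r = gsl_rng_alloc(gsl_rng_taus2);
  const size_t n = 500000;                  /* points per stratum (n_i) */
  const double edges[3] = {0.0, 0.5, 1.0};  /* strata boundaries        */
  double I = 0, var = 0;

  for (int i = 0; i < 2; i++) {
    double width = edges[i + 1] - edges[i]; /* stratum volume = N_i/N   */
    double sum = 0, sum2 = 0;

    /* Simple random sampling within the i-th stratum. */
    for (size_t j = 0; j < n; j++) {
      double x = edges[i] + width * gsl_rng_uniform(r);
      double f = exp(x);
      sum  += f;
      sum2 += f * f;
    }

    double mean_i = sum / n;
    double var_i  = (sum2 - n * mean_i * mean_i) / (n - 1);

    I   += width * mean_i;             /* sum_i (N_i/N) <f>_i            */
    var += width * width * var_i / n;  /* sum_i (N_i/N)^2 sigma_i^2/n_i  */
  }

  printf("I = %.10f +- %.10f\n", I, sqrt(var));
  gsl_rng_free(r);
  return 0;
}
```

Since $e^x$ varies less within each half-interval than over the whole of
$[0, 1]$, the per-stratum variances are smaller and so is the combined error.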

### MISER

The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
sampled points the estimate of the integral $I$ can be computed as:

$$
I = V \cdot \langle f \rangle
$$

Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\langle f \rangle$.

Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points are uniformly sampled. Given the Monte Carlo
estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those
points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$
and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are chosen to be
unitary, then the variance $\sigma^2$ of the combined estimate
$\langle f \rangle$:

$$
\langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
+ \langle f \rangle_b \right)
$$

is given by:

$$
\sigma^2 = \frac{\sigma_a^2}{4 n_a} + \frac{\sigma_b^2}{4 n_b}
$$

It can be shown that this variance is minimized by distributing the points
such that:

$$
\frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
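
A short sketch of why this is the optimal allocation (an added step, keeping
the total number of points $n = n_a + n_b$ fixed): setting to zero the
derivative of $\sigma^2$ with respect to $n_a$ gives

$$
\frac{\partial \sigma^2}{\partial n_a}
= - \frac{\sigma_a^2}{4 n_a^2} + \frac{\sigma_b^2}{4 (n - n_a)^2} = 0
\thus \frac{\sigma_a}{n_a} = \frac{\sigma_b}{n - n_a}
\thus \frac{n_a}{n} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$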

Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by:

$$
I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
$$

When implemented, MISER is in fact a recursive method. At each step, all the
possible bisections of the region are tested and the one which minimizes the
combined variance of the two sub-regions is selected. The variance in the
sub-regions is estimated with a fraction of the total number of available
points; the remaining sample points are allocated to the sub-regions using the
formula for $n_a$ and $n_b$, once the variances have been computed.
The same procedure is then repeated recursively for each of the two half-spaces
from the best bisection. At each recursion step, the integral and the error are
estimated using a plain Monte Carlo algorithm. After a given number of calls,
the final individual values and their error estimates are combined upwards to
give an overall result and an estimate of its error.
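
For reference, a minimal sketch of the corresponding GSL call is shown below;
it mirrors the plain MC example, swapping in the MISER routine (the parameters
are again illustrative).

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_miser.h>

/* Integrand f(x) = exp(x) in the gsl_monte_function format. */
double integrand (double *x, size_t dim, void *params)
{
  (void) dim; (void) params;
  return exp(x[0]);
}

int main (void)
{
  double lower[1] = {0.0}, upper[1] = {1.0};
  size_t calls = 500000;
  double result, error;

  gsl_monte_function f = { &integrand, 1, NULL };
  gsl_rng *r = gsl_rng_alloc(gsl_rng_taus2);

  /* Recursive stratified sampling over [0, 1]. */
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);
  gsl_monte_miser_integrate(&f, lower, upper, 1, calls, r, s,
                            &result, &error);
  printf("I = %.10f +- %.10f\n", result, error);

  gsl_monte_miser_free(s);
  gsl_rng_free(r);
  return 0;
}
```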

Results for this particular sample are shown in @tbl:MISER.

-------------------------------------------------------------------------
                   500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$   1.7182850738      1.7182819143       1.7182818221

$\sigma$           0.0000021829      0.0000001024       0.0000000049

diff               0.0000032453      0.0000000858       0.0000000064
-------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. Be careful:
while in @tbl:MC the number of function calls stands for the total number of
sampled points, in this case it stands for the number of times each section
is divided into subsections. {#tbl:MISER}

This time the error, although it always lies in the same order of magnitude
as diff, seems to seesaw around it, while the result is much closer to the
expected value.

## Importance sampling

In statistics, importance sampling is a method which samples points from a
probability distribution that follows the integrand $f$ itself, so that the
points cluster in the regions that make the largest contribution to the
integral.

Recall that $I = V \cdot \langle f \rangle$, hence only $\langle f \rangle$
must be estimated. Consider a sample of $n$ points $\{x_i\}$ generated
according to a probability distribution function $P$, which thereby gives the
following expected value:

$$
E [x, P] = \frac{1}{n} \sum_i x_i
$$

with variance:

$$
\sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
\with \sigma^2 [x, P] = \frac{1}{n - 1} \sum_i \left( x_i - E [x, P] \right)^2
$$

where $i$ runs over the sample.

In the case of plain MC, $\langle f \rangle$ is estimated as the expected
value of the points $\{f(x_i)\}$ sampled with $P (x_i) = 1 \quad \forall i$,
since they are evenly distributed in $\Omega$. The idea is to sample the
points from a different distribution in order to lower the variance of
$E[x, P]$, which amounts to lowering $\sigma^2 [x, P]$. This is accomplished
by choosing a random variable $y$ and defining a new probability $P^{(y)}$
which satisfies:

$$
E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
$$

which is to say:

$$
I = \int \limits_{\Omega} dx \, f(x) =
\int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x) =
\int \limits_{\Omega} dx \, w(x) \, g(x)
$$

where $E \, \longleftrightarrow \, I$ and:

$$
\begin{cases}
f(x) \, \longleftrightarrow \, x \\
1 \, \longleftrightarrow \, P
\end{cases}
\et
\begin{cases}
w(x) \, \longleftrightarrow \, \frac{x}{y} \\
g(x) \, \longleftrightarrow \, y = P^{(y)}
\end{cases}
$$

Here the symbol $\longleftrightarrow$ indicates the correspondence between the
variables. This new estimate is better than the former if:

$$
\sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P]
$$

The best variable $y$ would be:

$$
y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
\thus \frac{x}{y^{\star}} = E [x, P]
$$

and even a single sample under $P^{(y^{\star})}$ would be sufficient to give
its value. Obviously, this choice cannot be made in practice, since $E [x, P]$
is not given a priori.
However, it gives an insight into what importance sampling does. In fact,
given that:

$$
E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
a P(x \in [a, a + da])
$$

the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so
that its sample frequencies are arranged directly according to their weights
in $E[x, P]$, namely:

$$
P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
$$

In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
others, these "important" values must be emphasized by sampling them more
frequently. As a consequence, the estimator variance will be reduced.
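
As an illustration of the idea (independent of the GSL implementation
described below), the integral above can be importance-sampled with the
hand-picked density $g(x) = (1 + x)/1.5$ on $[0, 1]$, which roughly follows
$e^x$ and can be drawn by inverse transform; both the density and the sample
size are assumptions made only for this sketch.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

/* Importance-sampling sketch for I = int_0^1 exp(x) dx.
 * Points are drawn from g(x) = (1 + x)/1.5 and the estimator averages
 * the weights w(x) = f(x)/g(x). */
int main (void)
{
  gsl_rng *r = gsl_rng_alloc(gsl_rng_taus2);
  const size_t n = 500000;   /* arbitrary sample size */
  double sum = 0, sum2 = 0;

  for (size_t i = 0; i < n; i++) {
    /* Inverse transform: solve G(x) = (x + x^2/2)/1.5 = u for x. */
    double u = gsl_rng_uniform(r);
    double x = sqrt(1.0 + 3.0 * u) - 1.0;

    double w = exp(x) / ((1.0 + x) / 1.5);  /* f(x)/g(x) */
    sum  += w;
    sum2 += w * w;
  }

  double mean = sum / n;                             /* estimate of I    */
  double var  = (sum2 / n - mean * mean) / (n - 1);  /* variance of mean */

  printf("I = %.10f +- %.10f\n", mean, sqrt(var));
  gsl_rng_free(r);
  return 0;
}
```

Because the weight $w(x)$ varies far less over $[0, 1]$ than $e^x$ itself,
the variance of the estimator is reduced with respect to plain MC.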

### VEGAS

The VEGAS algorithm is based on importance sampling. It aims to reduce the
integration error by concentrating points in the regions that make the largest
contribution to the integral.

As stated before, in practice it is impossible to sample points from the best
distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
GSL, the VEGAS algorithm approximates the distribution by histogramming the
function $f$ in different subregions. Each histogram is used to define a
sampling distribution for the next pass, which repeats the same procedure
recursively: this iteration converges asymptotically to the desired
distribution. It follows that a better estimate is achieved with a greater
number of function calls.
The integration uses a fixed number of function calls. The result and its
error estimate are based on a weighted average of independent samples, as for
MISER.
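
Again for reference, a minimal sketch of the GSL call is shown below; the
warm-up stage and the parameters are illustrative assumptions, not the exact
settings used for @tbl:VEGAS.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_vegas.h>

/* Integrand f(x) = exp(x) in the gsl_monte_function format. */
double integrand (double *x, size_t dim, void *params)
{
  (void) dim; (void) params;
  return exp(x[0]);
}

int main (void)
{
  double lower[1] = {0.0}, upper[1] = {1.0};
  size_t calls = 500000;
  double result, error;

  gsl_monte_function f = { &integrand, 1, NULL };
  gsl_rng *r = gsl_rng_alloc(gsl_rng_taus2);

  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

  /* Short warm-up run to let VEGAS build its sampling grid. */
  gsl_monte_vegas_integrate(&f, lower, upper, 1, calls / 10, r, s,
                            &result, &error);

  /* Main run: the grid refined above is reused. */
  gsl_monte_vegas_integrate(&f, lower, upper, 1, calls, r, s,
                            &result, &error);
  printf("I = %.10f +- %.10f (chisq/dof = %.2f)\n",
         result, error, gsl_monte_vegas_chisq(s));

  gsl_monte_vegas_free(s);
  gsl_rng_free(r);
  return 0;
}
```

A $\chi^2$ per degree of freedom far from 1 would signal inconsistent
iterations and an unreliable error estimate.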

For this particular sample, results are shown in @tbl:VEGAS.

-------------------------------------------------------------------------
                   500'000 calls     5'000'000 calls    50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$   1.7182818354      1.7182818289       1.7182818285

$\sigma$           0.0000000137      0.0000000004       0.0000000000

diff               0.0000000069      0.0000000004       0.0000000000
-------------------------------------------------------------------------

Table: VEGAS results with different numbers of function calls. {#tbl:VEGAS}

This time, the error estimate is remarkably close to diff for every number of
function calls, meaning that both the integral and its error are estimated
very accurately, much more so than with the plain Monte Carlo method and with
stratified sampling.