ex-5: revised and typo-fixed

Some images and an article have also been added to the folder.
The todo and the readme have been updated as well.
This commit is contained in:
Giù Marcer 2020-05-19 16:01:52 +02:00 committed by rnhmjoj
parent 9303f4f379
commit bdc12117a7
10 changed files with 276 additions and 158 deletions

View File

@ -42,15 +42,15 @@ See this [guide](https://nixos.org/nix/manual/#chap-quick-start) if you have
never used Nix before. Running `nix-shell` in the top-level will drop you into
the development shell.
Once ready, invoke `make` with the program you wish to build. For example
Once ready, invoke `make` with the program you wish to build. For example:
$ make ex-1/bin/main
or, to build every program of an exercise
or, to build every program of an exercise:
$ make ex-1
To clean up the build results run
To clean up the build results run:
$ make clean
@ -82,9 +82,9 @@ The output can be redirected to `ex-1/pdf-plot.py` to generate a plot.
Every program in `ex-2` computes the best available approximation (with a given
method) to the Euler-Mascheroni γ constant and prints[1]:
1. the leading decimal digits of the approximate value found
1. the leading decimal digits of the approximate value found;
2. the exact decimal digits of γ
2. the exact decimal digits of γ;
3. the absolute difference between 1. and 2.
@ -104,7 +104,7 @@ cases the best fit and the parameter covariance matrix are printed.
The program then performs a t-test to assert the compatibility of the data with
two hypotheses and prints the results in a table.
To plot a 2D histogram of the generated sample do
To plot a 2D histogram of the generated sample do:
$ ex-3/bin/main -i | ex-3/plot.py
@ -123,7 +123,7 @@ horizontal component. It is possible to set the maximum momentum with the
option `-p`. A χ² fit and a t-test compatibility are performed with respect
to the expected distribution and results are printed.
To plot a histogram of the generated sample do
To plot a histogram of the generated sample do:
$ ex-4/bin/main -o | ex-4/plot.py
@ -133,11 +133,30 @@ and `-b`.
### Exercise 5
`ex-5/main` compute the integral of exp(1) between 0 and 1 with the methods
plain MC, MISER and VEGAS. Being reiterative routines, it takes the number of
iterations as its only argument.
It prints out the obatined value and its estimated error for each method.
`ex-5/main` computes estimates of the integral of exp(x) between 0 and 1
with the plain MC, MISER and VEGAS methods for different numbers of points.
It takes no arguments.
It prints out eight columns:
1. the number of sampled points;
2. the integral estimation with plain MC;
3. its uncertainty;
4. the integral estimation with MISER;
5. its uncertainty;
6. the integral estimation with VEGAS;
7. its uncertainty;
8. its χ².
To plot the results do:
$ ex-5/main | ex-5/plot.py
### Exercise 6
@ -167,8 +186,8 @@ classified data in this order: signal then noise.
To plot the result of the linear classification pipe the output to
`ex-7/plot.py`. The program generates two figures:
- a scatter plot showing the Fisher projection line and the cut line
- two histograms of the projected data and the cut line
- a scatter plot showing the Fisher projection line and the cut line;
- two histograms of the projected data and the cut line.
`ex-7/bin/test` takes a model trained in `ex-7/bin/main` and tests it against
newly generated datasets (`-i` to set the number of test iterations). The

View File

@ -15,14 +15,18 @@ double function (double * x, size_t dim, void * params)
// Function which prints out the results when called.
//
void results (size_t calls, double result, double error)
void results (size_t calls, double result, double error, double chi)
{
if (calls != 0)
{
printf ("%ld\t", calls);
printf ("%.0e\t", (double)calls);
}
printf ("%.10f\t", result);
printf ("%.10f\t", error);
if (chi != 0)
{
printf ("%.3f\t", chi);
}
}
@ -33,7 +37,7 @@ int main(int argc, char** argv)
size_t dims = 1; // Integral dimension
double lower[1] = {0}; // Integration lower limit
double upper[1] = {1}; // Integration upper limit
size_t calls = 5000; // Initial number of function calls
size_t calls = 50; // Initial number of function calls
double integral, error;
// Initialize an RNG.
@ -65,26 +69,37 @@ int main(int argc, char** argv)
gsl_monte_plain_integrate (&expo, lower, upper, dims, calls,
r, sMC, &integral, &error);
gsl_monte_plain_free(sMC);
results(calls, integral, error);
results(calls, integral, error, 0);
// MISER
gsl_monte_miser_state *sMI = gsl_monte_miser_alloc (dims);
gsl_monte_miser_integrate (&expo, lower, upper, dims, calls,
r, sMI, &integral, &error);
gsl_monte_miser_free(sMI);
results(0, integral, error);
results(0, integral, error, 0);
// VEGAS
gsl_monte_vegas_state *sVE = gsl_monte_vegas_alloc (dims);
gsl_monte_vegas_integrate (&expo, lower, upper, dims, calls,
r, sVE, &integral, &error);
// In order to see the parameters of the VEGAS integration, uncomment this
// block:
//
// gsl_monte_vegas_params params;
// gsl_monte_vegas_params_get(sVE, &params);
// params.verbose = 1;
// params.ostream = stderr;
// gsl_monte_vegas_params_set(sVE, &params);
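// Repeat the integration until the chi-squared of the weighted-average
// estimate is consistent with 1, i.e. |chi-squared - 1| <= 0.5.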
do
{
gsl_monte_vegas_integrate (&expo, lower, upper, dims,
calls/5, r, sVE, &integral, &error);
calls, r, sVE, &integral, &error);
} while (fabs (gsl_monte_vegas_chisq (sVE) - 1.0) > 0.5);
gsl_monte_vegas_free(sVE);
results(0, integral, error);
double chi = sVE->chisq;
results(0, integral, error, chi);
// Update function calls
printf ("\n");

ex-5/plot.py Normal file → Executable file
View File

@ -1,8 +1,10 @@
#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
import sys
calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE = np.loadtxt(sys.stdin, unpack=True)
calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE, chi = np.loadtxt(sys.stdin, unpack=True)
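# Columns: number of function calls, estimate and error for plain MC,
# estimate and error for MISER, estimate, error and χ² for VEGAS.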
exact = 1.7182818285
plt.rcParams['font.size'] = 17

View File

@ -106,3 +106,35 @@
year={2005},
journal={Matrix}
}
@article{sayah19,
title={Adaptive stratified Monte Carlo algorithm for numerical computation
of integrals},
author={Sayah, Toni},
journal={Mathematics and Computers in Simulation},
volume={157},
pages={143--158},
year={2019},
publisher={Elsevier}
}
@article{ridder17,
title={Variance reduction},
author={Ridder, AAN and Botev, ZI},
journal={Wiley StatsRef: Statistics Reference Online},
pages={1--6},
year={2017},
publisher={Wiley}
}
@article{lepage78,
title={A new algorithm for adaptive multidimensional integration},
author={Lepage, G Peter},
journal={Journal of Computational Physics},
volume={27},
number={2},
pages={192--203},
year={1978},
publisher={Elsevier}
}

View File

@ -209,7 +209,8 @@ $$
P^{\text{oss}}_{\text{max}} = 10.005 \pm 0.018 \with \chi_r^2 = 0.071
$$
where $\chi_r^2$ is the reduced $\chi^2$, proving a good convergence.
where $\chi_r^2$ is the $\chi^2$ per degree of freedom, proving a good
convergence.
In order to compare $P^{\text{oss}}_{\text{max}}$ with the expected value
$P_{\text{max}} = 10$, the following compatibility t-test was applied:

View File

@ -1,10 +1,7 @@
# Exercize 5
# Exercise 5
The following integral must be evaluated:
$$
I = \int\limits_0^1 dx \, e^x
$$
The following integral must be evaluated comparing different Monte Carlo
techniques.
\begin{figure}
\hypertarget{fig:exp}{%
@ -24,6 +21,8 @@ $$
% Plot
\draw [domain=-2:7, smooth, variable=\x,
cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
% Equation
\node [above] at (2.5, 2.5) {$I = \int\limits_0^1 dx \, e^x$};
\end{tikzpicture}
\caption{Plot of the integral to be evaluated.}
}
@ -33,14 +32,14 @@ whose exact value is 1.7182818285...
The three most popular Monte Carlo (MC) methods were applied: plain MC, Miser
and Vegas. Besides their popularity, these three methods were chosen for
being the only ones implemented in the GSL library.
being implemented in the GSL library as `gsl_monte_plain`, `gsl_monte_miser`
and `gsl_monte_vegas` respectively.
## Plain Monte Carlo
When the integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a
When an integral $I$ over an $n$-dimensional space $\Omega$ of volume $V$ of a
function $f$ must be evaluated, that is:
$$
I = \int\limits_{\Omega} dx \, f(x)
\with V = \int\limits_{\Omega} dx
@ -48,14 +47,12 @@ $$
the simplest MC approach is to sample $N$ points $x_i$ evenly distributed
in $V$ and approximate $I$ as:
$$
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ for the law of large
numbers. Hence, the sample variance can be extimated by the sample variance:
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ because of the law of
large numbers, while the uncertainty can be estimated from the sample variance:
$$
\sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
\rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
@ -67,31 +64,55 @@ Unlike in deterministic methods, the estimate of the error is not a strict error
bound: random sampling may not uncover all the important features of the
integrand and this can result in an underestimate of the error.
In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$.
In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$.
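For concreteness, the estimator above can be sketched in a few lines of plain C
(this is only an illustration with `rand()`, not the GSL-based program
`ex-5/main.c` actually used for the results; the sample size `N` is an
arbitrary choice):

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main (void)
{
  const size_t N = 50000;      // number of sampled points (arbitrary)
  double sum = 0, sum2 = 0;    // accumulators for <f> and <f*f>

  for (size_t i = 0; i < N; i++)
  {
    double x = (double)rand() / RAND_MAX;   // uniform point in [0, 1]
    double f = exp(x);
    sum  += f;
    sum2 += f * f;
  }

  double mean  = sum / N;                                  // <f>
  double var_f = (sum2 / N - mean * mean) * N / (N - 1);   // sample variance of f
  double sigma = sqrt(var_f / N);                          // sigma_I = V*sigma_f/sqrt(N), with V = 1

  printf("I = %.7f\tsigma = %.7f\n", mean, sigma);
  return 0;
}
```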
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
method is determined by how many points are generated, namely how many function
calls are exectuted when the method is implemented. In @tbl:MC, the obtained
results and errors $\sigma$ are shown. The estimated integrals for different
numbers of calls are compared to the expected value $I$ and the difference
'diff' between them is given.
As can be seen, the MC method tends to underestimate the error for scarse
function calls. As previously stated, the higher the number of function calls,
the better the estimation of $I$. A further observation regards the fact that,
even with $50'000'000$ calls, the $I^{\text{oss}}$ still differs from $I$ at
the fifth decimal digit.
method lies in how many points are generated, namely how many function calls
are executed when the reiterative method is implemented. In @fig:MC and
@fig:MI, results obtained with the plain MC method are shown in red. In
@tbl:MC, some of them are listed: the estimated integrals $I^{\text{oss}}$ are
compared to the expected value $I$ and the differences diff between them are
given.
![Estimated values of $I$ obtained with the plain MC technique for different
numbers of function calls; logarithmic scale; error bars show the
estimated uncertainties. As can be seen, the process oscillates
around the correct value.](images/MC_MC.pdf){#fig:MC}

---------------------------------------------------------------------------
calls                $I^{\text{oss}}$     $\sigma$             diff
------------------   ------------------   ------------------   -------------
\SI{5e5}{}           1.7166435813         0.0006955691         0.0016382472

\SI{5e6}{}           1.7181231109         0.0002200309         0.0001587176

\SI{5e7}{}           1.7183387184         0.0000695809         0.0000568899
---------------------------------------------------------------------------

Table: Some MC results with three different numbers of function calls.
Differences between computed and exact values are given in the diff
column. {#tbl:MC}
As can be seen, $\sigma$ is always of the same order of magnitude as diff,
except for very low numbers of function calls. Even with \SI{5e7}{} calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal digit, meaning
that this method converges very slowly. In fact, the expected dependence of
$\sigma$ on the number $C$ of function calls is confirmed:
$$
\begin{cases}
\sigma_1 = \SI{6.95569e-4}{} \longleftrightarrow C_1 = \SI{5e5}{} \\
\sigma_2 = \SI{6.95809e-5}{} \longleftrightarrow C_2 = \SI{5e7}{}
\end{cases}
$$
$$
\thus
\frac{\sigma_1}{\sigma_2} = 9.9965508 \sim 10 = \sqrt{\frac{C_2}{C_1}}
$$
if an error of $\sim 10^{-n}$ is required, a number of function calls
$\propto 10^{2n}$ must be executed, meaning that an error $\sigma \sim 10^{-10}$
would require $C = \SI{1e20}{}$ calls, which would be impractical.
## Stratified sampling
@ -101,12 +122,12 @@ partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling random
within each stratum.
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
sorted with simple random sampling in each strata, such as:
sampled with simple random sampling in the $i^{\text{th}}$ stratum, that is:
$$
\bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$
and:
$$
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
\thus
@ -115,13 +136,12 @@ $$
where:
- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
- $n_i$ is the number of points sorted in it
- $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point
- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum,
- $n_i$ is the number of points sampled in it,
- $\sigma_i^2$ is the sample variance of the $i^{\text{th}}$ stratum.
then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
then, the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:
$$
\bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
\sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
@ -138,7 +158,7 @@ result will have a smaller error in estimation with respect to the one otherwise
obtained with simple sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population.
population [@ridder17].
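As a toy sketch of this idea (hypothetical code, not part of the exercise
programs), the integral of $e^x$ over $[0, 1]$ can be estimated with two fixed
strata; the split point and the per-stratum sample sizes are arbitrary choices:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// Sample n points uniformly in [a, b]; return mean and sample variance of f = exp(x).
void stratum_stats (double a, double b, size_t n, double *mean, double *var)
{
  double sum = 0, sum2 = 0;
  for (size_t i = 0; i < n; i++)
  {
    double x = a + (b - a) * rand() / RAND_MAX;
    double f = exp(x);
    sum  += f;
    sum2 += f * f;
  }
  *mean = sum / n;
  *var  = (sum2 / n - (*mean) * (*mean)) * n / (n - 1);
}

int main (void)
{
  size_t n1 = 25000, n2 = 25000;           // points per stratum (arbitrary split)
  double m1, v1, m2, v2;
  stratum_stats(0.0, 0.5, n1, &m1, &v1);   // stratum [0, 0.5], weight N_1/N = 0.5
  stratum_stats(0.5, 1.0, n2, &m2, &v2);   // stratum [0.5, 1], weight N_2/N = 0.5

  // Combine the strata: mean = sum_i (N_i/N) mean_i,
  // variance of the estimate = sum_i (N_i/N)^2 var_i / n_i.
  double mean  = 0.5 * m1 + 0.5 * m2;
  double sigma = sqrt(0.25 * v1 / n1 + 0.25 * v2 / n2);
  printf("I = %.7f\tsigma = %.7f\n", mean, sigma);   // V = 1, so I = <f>
  return 0;
}
```

Because $e^x$ varies less within each half of the interval than over the whole
of it, the combined estimate comes out with a smaller uncertainty than a plain
MC run with the same total number of points.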
### MISER
@ -147,36 +167,32 @@ The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
extracted points, the estimation of the integral $I$ can be computed as:
$$
I= V \cdot \langle f \rangle
$$
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\langle f \rangle$.
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points were uniformely sampled. Given the Monte Carlo
estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those
points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$
and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are chosen unitary,
then the variance $\sigma^2$ of the combined estimate $\langle f \rangle$:
which $n_a$ and $n_b$ points are respectively uniformly sampled. Given the
Monte Carlo estimates of the means $\langle f \rangle_a$ and $\langle f
\rangle_b$ of those points and their variances $\sigma_a^2$ and $\sigma_b^2$, if
the weights $N_a$ and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$
are chosen unitary, then the variance $\sigma^2$ of the combined estimate
$\langle f \rangle$:
$$
\langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
+ \langle f \rangle_b \right)
$$
is given by:
$$
\sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$
It can be shown that this variance is minimized by distributing the points such
that:
$$
\frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
@ -184,43 +200,48 @@ $$
Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by:
$$
I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
$$
When implemented, MISER is in fact a recursive method. With a given step, all
the possible bisections are tested and the one which minimizes the combined
variance of the two sub-regions is selected. The variance in the sub-regions is
estimated with a fraction of the total number of available points. The remaining
sample points are allocated to the sub-regions using the formula for $n_a$ and
$n_b$, once the variances are computed.
When implemented, MISER is in fact a recursive method. First, all the possible
bisections of $\Omega$ are tested and the one which minimizes the combined
variance of the two sub-regions is selected. In order to speed up the
algorithm, the variance in the sub-regions is estimated with a fraction of the
total number of available points (function calls), set to 0.1 by default.
The remaining points are allocated to the sub-regions using the formula for
$n_a$ and $n_b$, once the variances are computed.
The same procedure is then repeated recursively for each of the two half-spaces
from the best bisection. At each recursion step, the integral and the error are
estimated using a plain Monte Carlo algorithm. After a given number of calls,
the final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error.
from the best bisection. When fewer than $32 \cdot 16$ function calls (the
default option) are available in a single section, the integral and the error
in this section are estimated using the plain Monte Carlo algorithm.
The final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error [@sayah19].
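If I read the GSL API correctly, the fraction used for the variance estimates
and the bisection threshold mentioned above correspond to the `estimate_frac`
and `min_calls_per_bisection` fields of `gsl_monte_miser_params`; the
following is only a tentative, self-contained sketch of how they could be
inspected or tuned (the call count is arbitrary):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_miser.h>

// Integrand: f(x) = e^x on [0, 1].
double expo (double *x, size_t dim, void *params)
{
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void)
{
  double lower[1] = {0}, upper[1] = {1};
  double integral, error;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  gsl_monte_function f = {&expo, 1, NULL};
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);

  // Inspect and (redundantly) re-set the default parameters discussed above.
  gsl_monte_miser_params params;
  gsl_monte_miser_params_get(s, &params);
  params.estimate_frac = 0.1;                // fraction of calls used to estimate variances
  params.min_calls_per_bisection = 32 * 16;  // below this, plain MC is used
  gsl_monte_miser_params_set(s, &params);

  gsl_monte_miser_integrate(&f, lower, upper, 1, 500000,
                            r, s, &integral, &error);
  printf("I = %.10f\tsigma = %.10f\n", integral, error);

  gsl_monte_miser_free(s);
  gsl_rng_free(r);
  return 0;
}
```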
![Estimates $I^{\text{oss}}$ of the integral $I$ obtained with the
implemented methods for different numbers of function calls. Error bars
show the estimated uncertainties.](images/MC_MC_MI.pdf){#fig:MI}

Results for this particular sample are shown in black in @fig:MI and some of
them are listed in @tbl:MI. Except for the very smallest numbers of calls,
the improvement with respect to the plain MC technique (in red) is
appreciable.

---------------------------------------------------------------------------
calls                $I^{\text{oss}}$     $\sigma$             diff
------------------   ------------------   ------------------   -------------
\SI{5e5}{}           1.7182850738         0.0000021829         0.0000032453

\SI{5e6}{}           1.7182819143         0.0000001024         0.0000000858

\SI{5e7}{}           1.7182818221         0.0000000049         0.0000000064
---------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. Differences
between computed and exact values are given in the diff column. {#tbl:MI}

This time the convergence is much faster: already with \SI{5e5}{} points, the
computed result differs from the exact one only at the sixth decimal digit.
Once again, the error is always of the same order of magnitude as diff.
## Importance sampling
@ -230,16 +251,14 @@ probability distribution $f$ itself, so that the points cluster in the regions
that make the largest contribution to the integral.
Remind that $I = V \cdot \langle f \rangle$ and therefore only $\langle f
\rangle$ must be estimated. Then, consider a sample of $n$ points {$x_i$}
generated according to a probability distribition function $P$ which gives
thereby the following expected value:
\rangle$ must be estimated. Consider a sample of $n$ points {$x_i$} generated
according to a probability distribution function $P$, which thereby gives the
following expected value:
$$
E [x, P] = \frac{1}{n} \sum_i x_i
$$
with variance:
$$
\sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
\with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
@ -252,13 +271,11 @@ are evenly distributed in $\Omega$. The idea is to sample points from a
different distribution to lower the variance of $E[x, P]$, which results in
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
$y$ and defining a new probability $P^{(y)}$ in order to satisfy:
$$
E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
$$
which is to say:
$$
I = \int \limits_{\Omega} dx f(x) =
\int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
@ -266,7 +283,6 @@ $$
$$
where $E \, \longleftrightarrow \, I$ and:
$$
\begin{cases}
f(x) \, \longleftrightarrow \, x \\
@ -274,42 +290,32 @@ $$
\end{cases}
\et
\begin{cases}
w(x) \, \longleftrightarrow \, \frac{x}{y} \\
g(x) \, \longleftrightarrow \, y = P^{(y)}
g(x) \, \longleftrightarrow \, y = P^{(y)} \\
w(x) = \frac{f(x)}{g(x)} \, \longleftrightarrow \, \frac{x}{y}
\end{cases}
$$
Where the symbol $\longleftrightarrow$ points out the connection between the
variables. The best variable $y$ would be:
$$
y = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
\thus \frac{x}{y} = E [x, P]
$$
and even a single sample under $P^{(y)}$ would be sufficient to give its value:
it can be shown that under this probability distribution the variance vanishes.
Obviously, it is not possible to take exactly this choice, since $E [x, P]$ is
not given a priori. However, this gives an insight into what importance sampling
does. In fact, given that:
$$
E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
da \, a P(x \in [a, a + da])
$$
the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so
that its samples frequencies are sorted directly according to their weights in
$E[x, P]$, namely:
the best probability $P^{(y)}$ redistributes the law of $x$ so that its sample
frequencies are distributed according to their weights in $E[x, P]$, namely:
$$
P^{(y)}(x \in [a, a + da]) = \frac{1}{E [x, P]} \, a P (x \in [a, a + da])
$$
In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
@ -320,37 +326,82 @@ frequently. As a consequence, the estimator variance will be reduced.
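As a toy illustration (hypothetical code, not used for the exercise results),
the same integral can be estimated by sampling from the normalized density
$g(x) = 2(1+x)/3$ on $[0, 1]$, which roughly follows $e^x$, and averaging the
weights $w(x) = f(x)/g(x)$; the choice of $g$ and the sample size are
arbitrary:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main (void)
{
  const size_t N = 50000;   // number of sampled points (arbitrary)
  double sum = 0, sum2 = 0;

  for (size_t i = 0; i < N; i++)
  {
    double u = (double)rand() / RAND_MAX;
    double x = -1.0 + sqrt(1.0 + 3.0 * u);          // x distributed as g(x) = 2(1+x)/3
    double w = exp(x) / (2.0 * (1.0 + x) / 3.0);    // weight f(x)/g(x)
    sum  += w;
    sum2 += w * w;
  }

  double mean = sum / N;                                  // estimate of I
  double var  = (sum2 / N - mean * mean) * N / (N - 1);   // sample variance of the weights
  printf("I = %.7f\tsigma = %.7f\n", mean, sqrt(var / N));
  return 0;
}
```

Since $w(x)$ varies noticeably less over $[0, 1]$ than $f(x)$ itself, the
resulting uncertainty is smaller than the plain MC one for the same number of
points.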
### VEGAS
The VEGAS algorithm is based on importance sampling. It aims to reduce the
integration error by concentrating points in the regions that make the largest
contribution to the integral.
The VEGAS algorithm of Lepage is based on importance sampling. It aims to
reduce the integration error by concentrating points in the regions that make
the largest contribution to the integral.
As stated before, in practice it is impossible to sample points from the best
distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
GSL, the VEGAS algorithm approximates the distribution by histogramming the
function $f$ in different subregions. Each histogram is used to define a
sampling distribution for the next pass, which consists in doing the same thing
recorsively: this procedure converges asymptotically to the desired
distribution. It follows that a better estimation is achieved with a greater
number of function calls.
The integration uses a fixed number of function calls. The result and its
error estimate are based on a weighted average of independent samples, as for
MISER.
For this particular sample, results are shown in @tbl:VEGAS.
distribution $P^{(y)}$: only a good approximation can be achieved. In GSL, the
VEGAS algorithm approximates the distribution by histogramming the function $f$
in different subregions with an iterative method [@lepage78], namely:
- a fixed number of points (function calls) is evenly generated in the whole
region;
- the volume $V$ is divided into $N$ intervals with width $\Delta x_i =
\Delta x \, \forall \, i$, where $N$ is limited by the computer storage
space available and must be held constant from iteration to iteration.
Default $N = 50$;
- each interval is then divided into $m_i + 1$ subintervals, where:
$$
m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
$$
where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
of $f$ in the $i^{\text{th}}$ interval. Hence, $m_i$ is a measure of the
"importance" of the interval with respect to the others: the higher
$\bar{f}_i$, the higher $m_i$. The stiffness $K$ is set to 1.5 by default;
- as it is desirable to restore the number of intervals to its original value
$N$, groups of the new intervals must be amalgamated into larger intervals,
the number of subintervals in each group being constant. The net effect is
to alter the interval sizes while keeping the total number constant, so
that the smallest intervals occur where $f$ is largest;
- the new grid is used and further refined in subsequent iterations until the
optimal grid has been obtained. By default, the number of iterations is set
to 5.
At the end, cumulative estimates of the integral $I$ and its error $\sigma_I$
are computed as a weighted average:
$$
I = \sigma_I^2 \sum_i \frac{I_i}{\sigma_i^2}
\with
\sigma_I^2 = \left( \sum_i \frac{1}{\sigma_i^2} \right)^{-1}
$$
where $I_i$ and $\sigma_i$ are the integral and standard deviation
estimated in each interval $\Delta x_i$ of the last iteration using the plain
MC technique.
For the results to be reliable, the chi-squared per degree of freedom
$\chi_r^2$ must be consistent with 1. While iterating, if the value of
$\chi_r^2$ differs from 1 by more than 0.5, the cycle is repeated.
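For reference, a self-contained sketch (separate from `ex-5/main.c`, and
assuming GSL's `gsl_monte_vegas_params` interface) of how the stiffness, the
number of iterations and the $\chi_r^2$ check could be handled:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_vegas.h>

// Integrand: f(x) = e^x on [0, 1].
double expo (double *x, size_t dim, void *params)
{
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void)
{
  double lower[1] = {0}, upper[1] = {1};
  double integral, error;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  gsl_monte_function f = {&expo, 1, NULL};
  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

  // The grid stiffness (alpha, playing the role of K above) and the number
  // of grid refinement iterations are exposed through gsl_monte_vegas_params.
  gsl_monte_vegas_params params;
  gsl_monte_vegas_params_get(s, &params);
  params.alpha = 1.5;       // default stiffness
  params.iterations = 5;    // default number of refinement iterations
  gsl_monte_vegas_params_set(s, &params);

  // Repeat until the chi-squared per dof of the weighted average is consistent with 1.
  do {
    gsl_monte_vegas_integrate(&f, lower, upper, 1, 500000,
                              r, s, &integral, &error);
  } while (fabs(gsl_monte_vegas_chisq(s) - 1.0) > 0.5);

  printf("I = %.10f\tsigma = %.10f\tchi2 = %.3f\n",
         integral, error, gsl_monte_vegas_chisq(s));

  gsl_monte_vegas_free(s);
  gsl_rng_free(r);
  return 0;
}
```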
Clearly, once again a better estimation is achieved with a greater number of
function calls. For this particular sample, the most accurate results are shown
in @fig:MI_VE and some of them are listed in @tbl:VEGAS.
----------------------------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff $\chi_r^2$
------------------ ------------------ ------------------ ------------------ ------------------
\SI{5e5}{} 1.7182818281 0.0000000012 0.0000000004 1.457
\SI{5e6}{} 1.7182818284 0.0000000000 0.0000000001 0.632
\SI{5e7}{} 1.7182818285 0.0000000000 0.0000000000 0.884
----------------------------------------------------------------------------------------------
Table: Some VEGAS results with different numbers of
function calls. {#tbl:VEGAS}
This time, the error estimate is notably close to diff for every number of
function calls, meaning that the estimates of both the integral and its
error turn out to be very accurate, much more so than those obtained with
the plain Monte Carlo method and with stratified sampling.
As can be appreciated in @fig:MI_VE, the VEGAS algorithm computes the integral
more accurately than MISER. The $\chi_r^2$ turns out to be close enough to 1
to guarantee a good estimate of $I$, a goodness which is also confirmed by the
very small difference between the estimate and the exact value, as shown in
@tbl:VEGAS: with \SI{5e7}{} function calls, the difference is smaller than
\SI{1e-10}{}.
![Only the most accurate results are shown in order to stress the
differences between the VEGAS (in gray) and MISER (in black)
results.](images/MC_MI_VE.pdf){#fig:MI_VE}
In conclusion, among the plain Monte Carlo technique, stratified sampling and
importance sampling, the last one turned out to be the most powerful means of
obtaining a good estimate of the integral.

View File

@ -1,5 +1,3 @@
- change the convolution symbol
- add citations and references
- redo the plots without the small border
- read Lucy's article
- rewrite the readme for exercise 5