ex-5: revised and typo-fixed

Some images and an article have also been added to the folder.
The todo has been updated too as well as the readme.
This commit is contained in:
Giù Marcer 2020-05-19 16:01:52 +02:00 committed by rnhmjoj
parent 9303f4f379
commit bdc12117a7
10 changed files with 276 additions and 158 deletions

View File

@ -42,15 +42,15 @@ See this [guide](https://nixos.org/nix/manual/#chap-quick-start) if you have
never used Nix before. Running `nix-shell` in the top-level will drop you into never used Nix before. Running `nix-shell` in the top-level will drop you into
the development shell. the development shell.
Once ready, invoke `make` with the program you wish to build. For example Once ready, invoke `make` with the program you wish to build. For example:
$ make ex-1/bin/main $ make ex-1/bin/main
or, to build every program of an exercise or, to build every program of an exercise:
$ make ex-1 $ make ex-1
To clean up the build results run To clean up the build results run:
$ make clean $ make clean
@ -82,9 +82,9 @@ The output can be redirected to `ex-1/pdf-plot.py` to generate a plot.
Every program in `ex-2` computes the best available approximation (with a given Every program in `ex-2` computes the best available approximation (with a given
method) to the Euler-Mascheroni γ constant and prints[1]: method) to the Euler-Mascheroni γ constant and prints[1]:
1. the leading decimal digits of the approximate value found 1. the leading decimal digits of the approximate value found;
2. the exact decimal digits of γ 2. the exact decimal digits of γ;
3. the absolute difference between the 1. and 2. 3. the absolute difference between the 1. and 2.
@ -104,7 +104,7 @@ cases the best fit and the parameter covariance matrix are printed.
The program then performs a t-test to assert the compatibility of the data with The program then performs a t-test to assert the compatibility of the data with
two hypothesis and print the results in a table. two hypothesis and print the results in a table.
To plot a 2D histogram of the generated sample do To plot a 2D histogram of the generated sample do:
$ ex-3/bin/main -i | ex-3/plot.py $ ex-3/bin/main -i | ex-3/plot.py
@ -123,7 +123,7 @@ horizontal component. It is possible to set the maximum momentum with the
option `-p`. A χ² fit and a t-test compatibility are performed with respect option `-p`. A χ² fit and a t-test compatibility are performed with respect
to the expected distribution and results are printed. to the expected distribution and results are printed.
To plot a histogram of the generated sample do To plot a histogram of the generated sample do:
$ ex-4/bin/main -o | ex-4/plot.py $ ex-4/bin/main -o | ex-4/plot.py
@ -133,11 +133,30 @@ and `-b`.
### Exercise 5 ### Exercise 5
`ex-5/main` compute the integral of exp(1) between 0 and 1 with the methods `ex-5/main` compute estimations of the integral of exp(1) between 0 and 1
plain MC, MISER and VEGAS. Being reiterative routines, it takes the number of with the methods plain MC, MISER and VEGAS with different number of points.
iterations as its only argument. It takes no arguments in input.
It prints out the obatined value and its estimated error for each method. It prints out eight columns:
1. the number of sorted points;
2. the integral estimation with plain MC;
3. its uncertainty;
4. the integral estimation with MISER;
5. its uncertainty;
6. the integral estimation with VEGAS;
7. its uncertainty;
8. its χ².
To plot the results do:
$ ex-5/main | ex-5/plot.py
### Exercise 6 ### Exercise 6
@ -167,8 +186,8 @@ classified data in this order: signal then noise.
To plot the result of the linear classification pipe the output to To plot the result of the linear classification pipe the output to
`ex-7/plot.py`. The program generates two figures: `ex-7/plot.py`. The program generates two figures:
- a scatter plot showing the Fisher projection line and the cut line - a scatter plot showing the Fisher projection line and the cut line;
- two histograms of the projected data and the cut line - two histograms of the projected data and the cut line.
`ex-7/bin/test` takes a model trained in `ex-7/bin/main` and test it against `ex-7/bin/test` takes a model trained in `ex-7/bin/main` and test it against
newly generated datasets (`-i` to set the number of test iterations). The newly generated datasets (`-i` to set the number of test iterations). The

View File

@ -15,14 +15,18 @@ double function (double * x, size_t dim, void * params)
// Function which prints out the results when called. // Function which prints out the results when called.
// //
void results (size_t calls, double result, double error) void results (size_t calls, double result, double error, double chi)
{ {
if (calls != 0) if (calls != 0)
{ {
printf ("%ld\t", calls); printf ("%.0e\t", (double)calls);
} }
printf ("%.10f\t", result); printf ("%.10f\t", result);
printf ("%.10f\t", error); printf ("%.10f\t", error);
if (chi != 0)
{
printf ("%.3f\t", chi);
}
} }
@ -33,7 +37,7 @@ int main(int argc, char** argv)
size_t dims = 1; // Integral dimension size_t dims = 1; // Integral dimension
double lower[1] = {0}; // Integration lower limit double lower[1] = {0}; // Integration lower limit
double upper[1] = {1}; // Integration upper limit double upper[1] = {1}; // Integration upper limit
size_t calls = 5000; // Initial number of function calls size_t calls = 50; // Initial number of function calls
double integral, error; double integral, error;
// Initialize an RNG. // Initialize an RNG.
@ -65,26 +69,37 @@ int main(int argc, char** argv)
gsl_monte_plain_integrate (&expo, lower, upper, dims, calls, gsl_monte_plain_integrate (&expo, lower, upper, dims, calls,
r, sMC, &integral, &error); r, sMC, &integral, &error);
gsl_monte_plain_free(sMC); gsl_monte_plain_free(sMC);
results(calls, integral, error); results(calls, integral, error, 0);
// MISER // MISER
gsl_monte_miser_state *sMI = gsl_monte_miser_alloc (dims); gsl_monte_miser_state *sMI = gsl_monte_miser_alloc (dims);
gsl_monte_miser_integrate (&expo, lower, upper, dims, calls, gsl_monte_miser_integrate (&expo, lower, upper, dims, calls,
r, sMI, &integral, &error); r, sMI, &integral, &error);
gsl_monte_miser_free(sMI); gsl_monte_miser_free(sMI);
results(0, integral, error); results(0, integral, error, 0);
// VEGAS // VEGAS
gsl_monte_vegas_state *sVE = gsl_monte_vegas_alloc (dims); gsl_monte_vegas_state *sVE = gsl_monte_vegas_alloc (dims);
gsl_monte_vegas_integrate (&expo, lower, upper, dims, calls, gsl_monte_vegas_integrate (&expo, lower, upper, dims, calls,
r, sVE, &integral, &error); r, sVE, &integral, &error);
// In order to see the parameters of the VEGAS integration, decomment this
// paragraph:
//
// gsl_monte_vegas_params params;
// gsl_monte_vegas_params_get(sVE, &params);
// params.verbose = 1;
// params.ostream = stderr;
// gsl_monte_vegas_params_set(sVE, &params);
do do
{ {
gsl_monte_vegas_integrate (&expo, lower, upper, dims, gsl_monte_vegas_integrate (&expo, lower, upper, dims,
calls/5, r, sVE, &integral, &error); calls, r, sVE, &integral, &error);
} while (fabs (gsl_monte_vegas_chisq (sVE) - 1.0) > 0.5); } while (fabs (gsl_monte_vegas_chisq (sVE) - 1.0) > 0.5);
gsl_monte_vegas_free(sVE); gsl_monte_vegas_free(sVE);
results(0, integral, error); double chi = sVE->chisq;
results(0, integral, error, chi);
// Update function calls // Update function calls
printf ("\n"); printf ("\n");

4
ex-5/plot.py Normal file → Executable file
View File

@ -1,8 +1,10 @@
#!/usr/bin/env python
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
import sys import sys
calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE = np.loadtxt(sys.stdin, unpack=True) calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE, chi = np.loadtxt(sys.stdin, unpack=True)
exact = 1.7182818285 exact = 1.7182818285
plt.rcParams['font.size'] = 17 plt.rcParams['font.size'] = 17

View File

@ -106,3 +106,35 @@
year={2005}, year={2005},
journal={Matrix} journal={Matrix}
} }
@article{sayah19,
title={Adaptive stratified Monte Carlo algorithm for numerical computation
of integrals},
author={Sayah, Toni},
journal={Mathematics and Computers in Simulation},
volume={157},
pages={143--158},
year={2019},
publisher={Elsevier}
}
@article{ridder17,
title={Variance reduction},
author={Ridder, AAN and Botev, ZI},
journal={Wiley StatsRef: Statistics Reference Online},
pages={1--6},
year={2017},
publisher={Wiley}
}
@article{lepage78,
title={A new algorithm for adaptive multidimensional integration},
author={Lepage, G Peter},
journal={Journal of Computational Physics},
volume={27},
number={2},
pages={192--203},
year={1978},
publisher={Elsevier}
}

View File

@ -209,7 +209,8 @@ $$
P^{\text{oss}}_{\text{max}} = 10.005 \pm 0.018 \with \chi_r^2 = 0.071 P^{\text{oss}}_{\text{max}} = 10.005 \pm 0.018 \with \chi_r^2 = 0.071
$$ $$
where $\chi_r^2$ is the reduced $\chi^2$, proving a good convergence. where $\chi_r^2$ is the $\chi^2$ per degree of freedom, proving a good
convergence.
In order to compare $P^{\text{oss}}_{\text{max}}$ with the expected value In order to compare $P^{\text{oss}}_{\text{max}}$ with the expected value
$P_{\text{max}} = 10$, the following compatibility t-test was applied: $P_{\text{max}} = 10$, the following compatibility t-test was applied:

View File

@ -1,10 +1,7 @@
# Exercize 5 # Exercise 5
The following integral must be evaluated: The following integral must be evaluated comparing different Monte Carlo
techniques.
$$
I = \int\limits_0^1 dx \, e^x
$$
\begin{figure} \begin{figure}
\hypertarget{fig:exp}{% \hypertarget{fig:exp}{%
@ -24,6 +21,8 @@ $$
% Plot % Plot
\draw [domain=-2:7, smooth, variable=\x, \draw [domain=-2:7, smooth, variable=\x,
cyclamen, ultra thick] plot ({\x},{exp(\x/5)}); cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
% Equation
\node [above] at (2.5, 2.5) {$I = \int\limits_0^1 dx \, e^x$};
\end{tikzpicture} \end{tikzpicture}
\caption{Plot of the integral to be evaluated.} \caption{Plot of the integral to be evaluated.}
} }
@ -33,14 +32,14 @@ whose exact value is 1.7182818285...
The three most popular Monte Carlo (MC) methods where applied: plain MC, Miser The three most popular Monte Carlo (MC) methods where applied: plain MC, Miser
and Vegas. Besides this popularity fact, these three method were chosen for and Vegas. Besides this popularity fact, these three method were chosen for
being the only ones implemented in the GSL library. being implemented in the GSL libraries `gsl_monte_plain`, `gsl_monte_miser` and
`gsl_monte_vegas` respectively.
## Plain Monte Carlo ## Plain Monte Carlo
When the integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a When an integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a
function $f$ must be evaluated, that is: function $f$ must be evaluated, that is:
$$ $$
I = \int\limits_{\Omega} dx \, f(x) I = \int\limits_{\Omega} dx \, f(x)
\with V = \int\limits_{\Omega} dx \with V = \int\limits_{\Omega} dx
@ -48,14 +47,12 @@ $$
the simplest MC method approach is to sample $N$ points $x_i$ evenly distributed the simplest MC method approach is to sample $N$ points $x_i$ evenly distributed
in $V$ and approx $I$ as: in $V$ and approx $I$ as:
$$ $$
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$ $$
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ for the law of large with $I_N \rightarrow I$ for $N \rightarrow + \infty$ because of the law of
numbers. Hence, the sample variance can be extimated by the sample variance: large numbers, whereas the sample variance can be estimated as:
$$ $$
\sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
\rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
@ -67,31 +64,55 @@ Unlike in deterministic methods, the estimate of the error is not a strict error
bound: random sampling may not uncover all the important features of the bound: random sampling may not uncover all the important features of the
integrand and this can result in an underestimate of the error. integrand and this can result in an underestimate of the error.
In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$. In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$.
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
method is determined by how many points are generated, namely how many function method lies in how many points are generated, namely how many function calls
calls are exectuted when the method is implemented. In @tbl:MC, the obtained are executed when the reiterative method is implemented. In @fig:MC and
results and errors $\sigma$ are shown. The estimated integrals for different @fig:MI, results obtained with the plain MC method are shown in red. In
numbers of calls are compared to the expected value $I$ and the difference @tbl:MC, some of them are listed: the estimated integrals $I^{\text{oss}}$ are
'diff' between them is given. compared to the expected value $I$ and the differences diff between them are
As can be seen, the MC method tends to underestimate the error for scarse given.
function calls. As previously stated, the higher the number of function calls,
the better the estimation of $I$. A further observation regards the fact that,
even with $50'000'000$ calls, the $I^{\text{oss}}$ still differs from $I$ at
the fifth decimal digit.
------------------------------------------------------------------------- ![Estimated values of $I$ obatined by Plain MC technique with different
500'000 calls 5'000'000 calls 50'000'000 calls number of function calls; logarithmic scale; errorbars showing their
----------------- ----------------- ------------------ ------------------ estimated uncertainties. As can be seen, the process does a sort o seesaw
$I^{\text{oss}}$ 1.7166435813 1.7181231109 1.7183387184 around the correct value.](images/MC_MC.pdf){#fig:MC}
$\sigma$ 0.0006955691 0.0002200309 0.0000695809 ---------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff
------------------ ------------------ ------------------ ------------------
\SI{5e5}{} 1.7166435813 0.0006955691 0.0016382472
diff 0.0016382472 0.0001587176 0.0000568899 \SI{5e6}{} 1.7181231109 0.0002200309 0.0001587176
-------------------------------------------------------------------------
Table: MC results with different numbers of function calls. {#tbl:MC} \SI{5e7}{} 1.7183387184 0.0000695809 0.0000568899
---------------------------------------------------------------------------
Table: Some MC results with three different numbers of function calls.
Differences between computed and exact values are given in
diff. {#tbl:MC}
As can be seen, $\sigma$ is always of the same order of magnitude of diff,
except for very low numbers of function calls. Even with \SI{5e7}{} calls,
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal digit, meaning that
this method shows a really slow convergence. In fact, since the $\sigma$s
dependence on the number $C$ of function calls is confirmed:
$$
\begin{cases}
\sigma_1 = \SI{6.95569e-4}{} \longleftrightarrow C_1 = \SI{5e6}{} \\
\sigma_2 = \SI{6.95809e-5}{} \longleftrightarrow C_1 = \SI{5e8}{}
\end{cases}
$$
$$
\thus
\frac{\sigma_1}{\sigma_2} = 9.9965508 \sim 10 = \sqrt{\frac{C_2}{C_1}}
$$
if an error of $\sim 1^{-n}$ is required, a number $\propto 10^{2n}$ of
function calls must be executed, meaning that for $\sigma \sim 1^{-10}
\rightarrow C = \SI{1e20}{}$, which would be impractical.
## Stratified sampling ## Stratified sampling
@ -101,12 +122,12 @@ partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling random dividing the primary sample into subgroups (strata) before sampling random
within each stratum. within each stratum.
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$ Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
sorted with simple random sampling in each strata, such as: sorted with simple random sampling in the $i^{\text{th}}$ strata, such as:
$$ $$
\bar{x}_i = \frac{1}{n_i} \sum_j x_j \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$ $$
and:
$$ $$
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2 \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
\thus \thus
@ -115,13 +136,12 @@ $$
where: where:
- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum - $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum,
- $n_i$ is the number of points sorted in it - $n_i$ is the number of points sorted in it,
- $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point - $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point.
then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified then, the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are: sampling for the whole population are:
$$ $$
\bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
\sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
@ -138,7 +158,7 @@ result will have a smaller error in estimation with respect to the one otherwise
obtained with simple sampling. obtained with simple sampling.
For this reason, stratified sampling is used as a method of variance reduction For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known when MC methods are used to estimate population statistics from a known
population. population [@ridder17].
### MISER ### MISER
@ -147,36 +167,32 @@ The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling. recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of As stated before, according to the law of large numbers, for a large number of
extracted points, the estimation of the integral $I$ can be computed as: extracted points, the estimation of the integral $I$ can be computed as:
$$ $$
I= V \cdot \langle f \rangle I= V \cdot \langle f \rangle
$$ $$
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\langle f \rangle$. $\langle f \rangle$.
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
which $n_a$ and $n_b$ points were uniformely sampled. Given the Monte Carlo which $n_a$ and $n_b$ points are respectively uniformly sampled. Given the
estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those Monte Carlo estimates of the means $\langle f \rangle_a$ and $\langle f
points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$ \rangle_b$ of those points and their variances $\sigma_a^2$ and $\sigma_b^2$, if
and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are chosen unitary, the weights $N_a$ and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$
then the variance $\sigma^2$ of the combined estimate $\langle f \rangle$: are chosen unitary, then the variance $\sigma^2$ of the combined estimate
$\langle f \rangle$:
$$ $$
\langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a \langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
+ \langle f \rangle_b \right) + \langle f \rangle_b \right)
$$ $$
is given by: is given by:
$$ $$
\sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b} \sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$ $$
It can be shown that this variance is minimized by distributing the points such It can be shown that this variance is minimized by distributing the points such
that: that:
$$ $$
\frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b} \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$ $$
@ -184,43 +200,48 @@ $$
Hence, the smallest error estimate is obtained by allocating sample points in Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region. proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by: The whole integral estimate and its variance are therefore given by:
$$ $$
I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2 I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
$$ $$
When implemented, MISER is in fact a recursive method. With a given step, all When implemented, MISER is in fact a recursive method. First, all the possible
the possible bisections are tested and the one which minimizes the combined bisections of $\Omega$ are tested and the one which minimizes the combined
variance of the two sub-regions is selected. The variance in the sub-regions is variance of the two sub-regions is selected. In order to speed up the
estimated with a fraction of the total number of available points. The remaining algorithm, the variance in the sub-regions is estimated with a fraction of the
sample points are allocated to the sub-regions using the formula for $n_a$ and total number of available points (function calls) which is default set to 0.1.
$n_b$, once the variances are computed. The remaining points are allocated to the sub-regions using the formula for
$n_a$ and $n_b$, once the variances are computed.
The same procedure is then repeated recursively for each of the two half-spaces The same procedure is then repeated recursively for each of the two half-spaces
from the best bisection. At each recursion step, the integral and the error are from the best bisection. When less than 32*16 (default option) function calls
estimated using a plain Monte Carlo algorithm. After a given number of calls, are available in a single section, the integral and the error in this section
the final individual values and their error estimates are then combined upwards are estimated using the plain Monte Carlo algorithm.
to give an overall result and an estimate of its error. The final individual values and their error estimates are then combined upwards
to give an overall result and an estimate of its error [@sayah19].
Results for this particular sample are shown in @tbl:MISER. ![Estimations $I^{\text{oss}}$ of the integral $I$ obtained for the three
implemented method for different values of function calls. Errorbars
showing their estimated uncertainties.](images/MC_MC_MI.pdf){#fig:MI}
------------------------------------------------------------------------- Results for this particular sample are shown in black in @fig:MI and some of
500'000 calls 5'000'000 calls 50'000'000 calls them are listed in @tbl:MI. Except for the first very little number of calls,
----------------- ----------------- ------------------ ------------------ the improvement with respect to the Plain MC technique (in red) is appreciable.
$I^{\text{oss}}$ 1.7182850738 1.7182819143 1.7182818221
$\sigma$ 0.0000021829 0.0000001024 0.0000000049 ---------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff
------------------ ------------------ ------------------ ------------------
\SI{5e5}{} 1.7182850738 0.0000021829 0.0000032453
diff 0.0000032453 0.0000000858 000000000064 \SI{5e6}{} 1.7182819143 0.0000001024 0.0000000858
-------------------------------------------------------------------------
Table: MISER results with different numbers of function calls. Be careful: \SI{5e7}{} 1.7182818221 0.0000000049 0.0000000064
while in @tbl:MC the number of function calls stands for the number of ---------------------------------------------------------------------------
total sampled poins, in this case it stands for the times each section
is divided into subsections. {#tbl:MISER}
This time the error, altough it lies always in the same order of magnitude of Table: MISER results with different numbers of function calls. Differences
diff, seems to seesaw around the correct value, which is much more closer to between computed and exact values are given in diff. {#tbl:MI}
the expected one.
This time the convergence is much faster: already with 500'000 number of points,
the correct results differs from the computed one at the fifth decimal digit.
Once again, the error lies always in the same order of magnitude of diff.
## Importance sampling ## Importance sampling
@ -230,16 +251,14 @@ probability distribution $f$ itself, so that the points cluster in the regions
that make the largest contribution to the integral. that make the largest contribution to the integral.
Remind that $I = V \cdot \langle f \rangle$ and therefore only $\langle f Remind that $I = V \cdot \langle f \rangle$ and therefore only $\langle f
\rangle$ must be estimated. Then, consider a sample of $n$ points {$x_i$} \rangle$ must be estimated. Consider a sample of $n$ points {$x_i$} generated
generated according to a probability distribition function $P$ which gives according to a probability distribution function $P$ which gives thereby the
thereby the following expected value: following expected value:
$$ $$
E [x, P] = \frac{1}{n} \sum_i x_i E [x, P] = \frac{1}{n} \sum_i x_i
$$ $$
with variance: with variance:
$$ $$
\sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n} \sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
\with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2 \with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
@ -252,13 +271,11 @@ are evenly distributed in $\Omega$. The idea is to sample points from a
different distribution to lower the variance of $E[x, P]$, which results in different distribution to lower the variance of $E[x, P]$, which results in
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
$y$ and defining a new probability $P^{(y)}$ in order to satisfy: $y$ and defining a new probability $P^{(y)}$ in order to satisfy:
$$ $$
E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right] E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
$$ $$
which is to say: which is to say:
$$ $$
I = \int \limits_{\Omega} dx f(x) = I = \int \limits_{\Omega} dx f(x) =
\int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)= \int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
@ -266,7 +283,6 @@ $$
$$ $$
where $E \, \longleftrightarrow \, I$ and: where $E \, \longleftrightarrow \, I$ and:
$$ $$
\begin{cases} \begin{cases}
f(x) \, \longleftrightarrow \, x \\ f(x) \, \longleftrightarrow \, x \\
@ -274,42 +290,32 @@ $$
\end{cases} \end{cases}
\et \et
\begin{cases} \begin{cases}
w(x) \, \longleftrightarrow \, \frac{x}{y} \\ g(x) \, \longleftrightarrow \, y = P^{(y)} \\
g(x) \, \longleftrightarrow \, y = P^{(y)} w(x) = \frac{f(x)}{g(x)} \, \longleftrightarrow \, \frac{x}{y}
\end{cases} \end{cases}
$$ $$
Where the symbol $\longleftrightarrow$ points out the connection between the Where the symbol $\longleftrightarrow$ points out the connection between the
variables. This new estimate is better than the former if: variables. The best variable $y$ would be:
$$ $$
\sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P] y = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
\thus \frac{x}{y} = E [x, P]
$$ $$
The best variable $y$ would be: and even a single sample under $P^{(y)}$ would be sufficient to give its value:
it can be shown that under this probability distribution the variance vanishes.
$$ Obviously, it is not possible to take exactly this choice, since $E [x, P]$ is
y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I} not given a priori. However, this gives an insight into what importance sampling
\thus \frac{x}{y^{\star}} = E [x, P] does. In fact, given that:
$$
and even a single sample under $P^{(y^{\star})}$ would be sufficient to give its
value. Obviously, it is not possible to take exactly this choice, since $E [x,
P]$ is not given a priori.
However, this gives an insight into what importance sampling does. In fact,
given that:
$$ $$
E [x, P] = \int \limits_{a = - \infty}^{a = + \infty} E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
a P(x \in [a, a + da]) da \, a P(x \in [a, a + da])
$$ $$
the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so the best probability $P^{(y)}$ redistributes the law of $x$ so that its samples
that its samples frequencies are sorted directly according to their weights in frequencies are sorted directly according to their weights in $E[x, P]$, namely:
$E[x, P]$, namely:
$$ $$
P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da]) P^{(y}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
$$ $$
In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
@ -320,37 +326,82 @@ frequently. As a consequence, the estimator variance will be reduced.
### VEGAS ### VEGAS
The VEGAS algorithm is based on importance sampling. It aims to reduce the The VEGAS algorithm of Lepage is based on importance sampling. It aims to
integration error by concentrating points in the regions that make the largest reduce the integration error by concentrating points in the regions that make
contribution to the integral. the largest contribution to the integral.
As stated before, in practice it is impossible to sample points from the best As stated before, in practice it is impossible to sample points from the best
distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In distribution $P^{(y)}$: only a good approximation can be achieved. In GSL, the
GSL, the VEGAS algorithm approximates the distribution by histogramming the VEGAS algorithm approximates the distribution by histogramming the function $f$
function $f$ in different subregions. Each histogram is used to define a in different subregions with a reiterative method [@lepage78], namely:
sampling distribution for the next pass, which consists in doing the same thing
recorsively: this procedure converges asymptotically to the desired
distribution. It follows that a better estimation is achieved with a greater
number of function calls.
The integration uses a fixed number of function calls. The result and its
error estimate are based on a weighted average of independent samples, as for
MISER.
For this particular sample, results are shown in @tbl:VEGAS.
------------------------------------------------------------------------- - a fixed number of points (function calls) is evenly generated in the whole
500'000 calls 5'000'000 calls 50'000'000 calls region;
----------------- ----------------- ------------------ ------------------ - the volume $V$ is divided into $N$ intervals with width $\Delta x_i =
$I^{\text{oss}}$ 1.7182818354 1.7182818289 1.7182818285 \Delta x \, \forall \, i$, where $N$ is limited by the computer storage
space available and must be held constant from iteration to iteration.
Default $N = 50$;
- each interval is then divided into $m_i + 1$ subintervals, where:
$$
m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
$$
$\sigma$ 0.0000000137 0.0000000004 0.0000000000 where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
of $f$ in the interval. Hence, $m_i$ is therefore a measure of the
"importance" of the interval with respect to the others: the higher
$\bar{f}_i$, the higher $m_i$. The stiffness $K$ is default set to 1.5;
- as it is desirable to restore the number of intervals to its original value
$N$, groups of the new intervals must be amalgamated into larger intervals,
the number of subintervals in each group being constant. The net effect is
to alter the intervals sizes, while keeping the total number constant, so
that the smallest intervals occur where $f$ is largest;
- the new grid is used and further refined in subsequent iterations until the
optimal grid has been obtained. By default, the number of iterations is set
to 5.
diff 0.0000000069 0.0000000004 0.0000000000 At the end, a cumulative estimate of the integral $I$ and its error $\sigma_I$
------------------------------------------------------------------------- are made based on weighted average:
$$
I = \sigma_I^2 \sum_i \frac{I_i}{\sigma_i^2}
\with
\sigma_I^2 = \left( \sum_i \frac{1}{\sigma_i^2} \right)^{-1}
$$
Table: VEGAS results with different numbers of where $I_i$ and $\sigma_i$ are are the integral and standard deviation
estimated in each interval $\Delta x_i$ of the last iteration using the plain
MC technique.
For the results to be reliable, the chi-squared per degree of freedom
$\chi_r^2$ must be consistent with 1. While performing the reiterations, if
the $\chi_r^2$ value exceed 1.5, the cycle breaks.
Clearly, once again a better estimation is achieved with a greater number of
function calls. For this particular sample, the most accurate results are shown
in @fig:MI_VE and some of them are listed in @tbl:VEGAS.
----------------------------------------------------------------------------------------------
calls $I^{\text{oss}}$ $\sigma$ diff $\chi_r^2$
------------------ ------------------ ------------------ ------------------ ------------------
\SI{5e5}{} 1.7182818281 0.0000000012 0.0000000004 1.457
\SI{5e6}{} 1.7182818284 0.0000000000 0.0000000001 0.632
\SI{5e7}{} 1.7182818285 0.0000000000 0.0000000000 0.884
----------------------------------------------------------------------------------------------
Table: Some VEGAS results with different numbers of
function calls. {#tbl:VEGAS} function calls. {#tbl:VEGAS}
This time, the error estimation is notably close to diff for each number of As can be appreciated in @fig:MI_VE, the VEGAS algorithm manages to compute
function calls, meaning that the estimation of both the integral and its the integral value in a most accurate way with respect to MISER. The $\chi_r^2$
error turn out to be very accurate, much more than the ones obtained with turns out to be close enough to 1 to guarantee a good estimation of $I$,
both plain Monte Carlo method and stratified sampling. goodness which is also confirmed by the very small difference between estimation
and exact value, as shown in @tbl:VEGAS: with a number of \SI{5e7}{} of function
calls, the difference is smaller than \SI{1e-10}{}.
![Only the most accurate results are shown in order to stress the
differences between VEGAS (in gray) and MISER (in black) methods
results.](images/MC_MI_VE.pdf){#fig:MI_VE}
In conclusion, between a plain Monte Carlo technique, stratified sampling and
importance sampling, the last one turned out to be the most powerful mean to
obtain a good estimation of the integrand.

View File

@ -1,5 +1,3 @@
- cambiare simbolo convoluzione - cambiare simbolo convoluzione
- aggiungere citazioni e referenze - aggiungere citazioni e referenze
- rifare grafici senza bordino
- leggere l'articolo di Lucy - leggere l'articolo di Lucy
- riscrivere il readme del 5