ex-5: revised and typo-fixed
Some images and an article have also been added to the folder. The todo has been updated too as well as the readme.
This commit is contained in:
parent
9303f4f379
commit
bdc12117a7
45
README.md
45
README.md
@ -42,15 +42,15 @@ See this [guide](https://nixos.org/nix/manual/#chap-quick-start) if you have
|
|||||||
never used Nix before. Running `nix-shell` in the top-level will drop you into
|
never used Nix before. Running `nix-shell` in the top-level will drop you into
|
||||||
the development shell.
|
the development shell.
|
||||||
|
|
||||||
Once ready, invoke `make` with the program you wish to build. For example
|
Once ready, invoke `make` with the program you wish to build. For example:
|
||||||
|
|
||||||
$ make ex-1/bin/main
|
$ make ex-1/bin/main
|
||||||
|
|
||||||
or, to build every program of an exercise
|
or, to build every program of an exercise:
|
||||||
|
|
||||||
$ make ex-1
|
$ make ex-1
|
||||||
|
|
||||||
To clean up the build results run
|
To clean up the build results run:
|
||||||
|
|
||||||
$ make clean
|
$ make clean
|
||||||
|
|
||||||
@ -82,9 +82,9 @@ The output can be redirected to `ex-1/pdf-plot.py` to generate a plot.
|
|||||||
Every program in `ex-2` computes the best available approximation (with a given
|
Every program in `ex-2` computes the best available approximation (with a given
|
||||||
method) to the Euler-Mascheroni γ constant and prints[1]:
|
method) to the Euler-Mascheroni γ constant and prints[1]:
|
||||||
|
|
||||||
1. the leading decimal digits of the approximate value found
|
1. the leading decimal digits of the approximate value found;
|
||||||
|
|
||||||
2. the exact decimal digits of γ
|
2. the exact decimal digits of γ;
|
||||||
|
|
||||||
3. the absolute difference between the 1. and 2.
|
3. the absolute difference between the 1. and 2.
|
||||||
|
|
||||||
@ -104,7 +104,7 @@ cases the best fit and the parameter covariance matrix are printed.
|
|||||||
The program then performs a t-test to assert the compatibility of the data with
|
The program then performs a t-test to assert the compatibility of the data with
|
||||||
two hypothesis and print the results in a table.
|
two hypothesis and print the results in a table.
|
||||||
|
|
||||||
To plot a 2D histogram of the generated sample do
|
To plot a 2D histogram of the generated sample do:
|
||||||
|
|
||||||
$ ex-3/bin/main -i | ex-3/plot.py
|
$ ex-3/bin/main -i | ex-3/plot.py
|
||||||
|
|
||||||
@ -123,7 +123,7 @@ horizontal component. It is possible to set the maximum momentum with the
|
|||||||
option `-p`. A χ² fit and a t-test compatibility are performed with respect
|
option `-p`. A χ² fit and a t-test compatibility are performed with respect
|
||||||
to the expected distribution and results are printed.
|
to the expected distribution and results are printed.
|
||||||
|
|
||||||
To plot a histogram of the generated sample do
|
To plot a histogram of the generated sample do:
|
||||||
|
|
||||||
$ ex-4/bin/main -o | ex-4/plot.py
|
$ ex-4/bin/main -o | ex-4/plot.py
|
||||||
|
|
||||||
@ -133,11 +133,30 @@ and `-b`.
|
|||||||
|
|
||||||
### Exercise 5
|
### Exercise 5
|
||||||
|
|
||||||
`ex-5/main` compute the integral of exp(1) between 0 and 1 with the methods
|
`ex-5/main` compute estimations of the integral of exp(1) between 0 and 1
|
||||||
plain MC, MISER and VEGAS. Being reiterative routines, it takes the number of
|
with the methods plain MC, MISER and VEGAS with different number of points.
|
||||||
iterations as its only argument.
|
It takes no arguments in input.
|
||||||
It prints out the obatined value and its estimated error for each method.
|
It prints out eight columns:
|
||||||
|
|
||||||
|
1. the number of sorted points;
|
||||||
|
|
||||||
|
2. the integral estimation with plain MC;
|
||||||
|
|
||||||
|
3. its uncertainty;
|
||||||
|
|
||||||
|
4. the integral estimation with MISER;
|
||||||
|
|
||||||
|
5. its uncertainty;
|
||||||
|
|
||||||
|
6. the integral estimation with VEGAS;
|
||||||
|
|
||||||
|
7. its uncertainty;
|
||||||
|
|
||||||
|
8. its χ².
|
||||||
|
|
||||||
|
To plot the results do:
|
||||||
|
|
||||||
|
$ ex-5/main | ex-5/plot.py
|
||||||
|
|
||||||
### Exercise 6
|
### Exercise 6
|
||||||
|
|
||||||
@ -167,8 +186,8 @@ classified data in this order: signal then noise.
|
|||||||
|
|
||||||
To plot the result of the linear classification pipe the output to
|
To plot the result of the linear classification pipe the output to
|
||||||
`ex-7/plot.py`. The program generates two figures:
|
`ex-7/plot.py`. The program generates two figures:
|
||||||
- a scatter plot showing the Fisher projection line and the cut line
|
- a scatter plot showing the Fisher projection line and the cut line;
|
||||||
- two histograms of the projected data and the cut line
|
- two histograms of the projected data and the cut line.
|
||||||
|
|
||||||
`ex-7/bin/test` takes a model trained in `ex-7/bin/main` and test it against
|
`ex-7/bin/test` takes a model trained in `ex-7/bin/main` and test it against
|
||||||
newly generated datasets (`-i` to set the number of test iterations). The
|
newly generated datasets (`-i` to set the number of test iterations). The
|
||||||
|
29
ex-5/main.c
29
ex-5/main.c
@ -15,14 +15,18 @@ double function (double * x, size_t dim, void * params)
|
|||||||
|
|
||||||
// Function which prints out the results when called.
|
// Function which prints out the results when called.
|
||||||
//
|
//
|
||||||
void results (size_t calls, double result, double error)
|
void results (size_t calls, double result, double error, double chi)
|
||||||
{
|
{
|
||||||
if (calls != 0)
|
if (calls != 0)
|
||||||
{
|
{
|
||||||
printf ("%ld\t", calls);
|
printf ("%.0e\t", (double)calls);
|
||||||
}
|
}
|
||||||
printf ("%.10f\t", result);
|
printf ("%.10f\t", result);
|
||||||
printf ("%.10f\t", error);
|
printf ("%.10f\t", error);
|
||||||
|
if (chi != 0)
|
||||||
|
{
|
||||||
|
printf ("%.3f\t", chi);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@ -33,7 +37,7 @@ int main(int argc, char** argv)
|
|||||||
size_t dims = 1; // Integral dimension
|
size_t dims = 1; // Integral dimension
|
||||||
double lower[1] = {0}; // Integration lower limit
|
double lower[1] = {0}; // Integration lower limit
|
||||||
double upper[1] = {1}; // Integration upper limit
|
double upper[1] = {1}; // Integration upper limit
|
||||||
size_t calls = 5000; // Initial number of function calls
|
size_t calls = 50; // Initial number of function calls
|
||||||
double integral, error;
|
double integral, error;
|
||||||
|
|
||||||
// Initialize an RNG.
|
// Initialize an RNG.
|
||||||
@ -65,26 +69,37 @@ int main(int argc, char** argv)
|
|||||||
gsl_monte_plain_integrate (&expo, lower, upper, dims, calls,
|
gsl_monte_plain_integrate (&expo, lower, upper, dims, calls,
|
||||||
r, sMC, &integral, &error);
|
r, sMC, &integral, &error);
|
||||||
gsl_monte_plain_free(sMC);
|
gsl_monte_plain_free(sMC);
|
||||||
results(calls, integral, error);
|
results(calls, integral, error, 0);
|
||||||
|
|
||||||
// MISER
|
// MISER
|
||||||
gsl_monte_miser_state *sMI = gsl_monte_miser_alloc (dims);
|
gsl_monte_miser_state *sMI = gsl_monte_miser_alloc (dims);
|
||||||
gsl_monte_miser_integrate (&expo, lower, upper, dims, calls,
|
gsl_monte_miser_integrate (&expo, lower, upper, dims, calls,
|
||||||
r, sMI, &integral, &error);
|
r, sMI, &integral, &error);
|
||||||
gsl_monte_miser_free(sMI);
|
gsl_monte_miser_free(sMI);
|
||||||
results(0, integral, error);
|
results(0, integral, error, 0);
|
||||||
|
|
||||||
// VEGAS
|
// VEGAS
|
||||||
gsl_monte_vegas_state *sVE = gsl_monte_vegas_alloc (dims);
|
gsl_monte_vegas_state *sVE = gsl_monte_vegas_alloc (dims);
|
||||||
gsl_monte_vegas_integrate (&expo, lower, upper, dims, calls,
|
gsl_monte_vegas_integrate (&expo, lower, upper, dims, calls,
|
||||||
r, sVE, &integral, &error);
|
r, sVE, &integral, &error);
|
||||||
|
|
||||||
|
// In order to see the parameters of the VEGAS integration, decomment this
|
||||||
|
// paragraph:
|
||||||
|
//
|
||||||
|
// gsl_monte_vegas_params params;
|
||||||
|
// gsl_monte_vegas_params_get(sVE, ¶ms);
|
||||||
|
// params.verbose = 1;
|
||||||
|
// params.ostream = stderr;
|
||||||
|
// gsl_monte_vegas_params_set(sVE, ¶ms);
|
||||||
do
|
do
|
||||||
{
|
{
|
||||||
gsl_monte_vegas_integrate (&expo, lower, upper, dims,
|
gsl_monte_vegas_integrate (&expo, lower, upper, dims,
|
||||||
calls/5, r, sVE, &integral, &error);
|
calls, r, sVE, &integral, &error);
|
||||||
} while (fabs (gsl_monte_vegas_chisq (sVE) - 1.0) > 0.5);
|
} while (fabs (gsl_monte_vegas_chisq (sVE) - 1.0) > 0.5);
|
||||||
gsl_monte_vegas_free(sVE);
|
gsl_monte_vegas_free(sVE);
|
||||||
results(0, integral, error);
|
double chi = sVE->chisq;
|
||||||
|
results(0, integral, error, chi);
|
||||||
|
|
||||||
|
|
||||||
// Update function calls
|
// Update function calls
|
||||||
printf ("\n");
|
printf ("\n");
|
||||||
|
4
ex-5/plot.py
Normal file → Executable file
4
ex-5/plot.py
Normal file → Executable file
@ -1,8 +1,10 @@
|
|||||||
|
#!/usr/bin/env python
|
||||||
|
|
||||||
import matplotlib.pyplot as plt
|
import matplotlib.pyplot as plt
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import sys
|
import sys
|
||||||
|
|
||||||
calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE = np.loadtxt(sys.stdin, unpack=True)
|
calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE, chi = np.loadtxt(sys.stdin, unpack=True)
|
||||||
exact = 1.7182818285
|
exact = 1.7182818285
|
||||||
|
|
||||||
plt.rcParams['font.size'] = 17
|
plt.rcParams['font.size'] = 17
|
||||||
|
@ -106,3 +106,35 @@
|
|||||||
year={2005},
|
year={2005},
|
||||||
journal={Matrix}
|
journal={Matrix}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@article{sayah19,
|
||||||
|
title={Adaptive stratified Monte Carlo algorithm for numerical computation
|
||||||
|
of integrals},
|
||||||
|
author={Sayah, Toni},
|
||||||
|
journal={Mathematics and Computers in Simulation},
|
||||||
|
volume={157},
|
||||||
|
pages={143--158},
|
||||||
|
year={2019},
|
||||||
|
publisher={Elsevier}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@article{ridder17,
|
||||||
|
title={Variance reduction},
|
||||||
|
author={Ridder, AAN and Botev, ZI},
|
||||||
|
journal={Wiley StatsRef: Statistics Reference Online},
|
||||||
|
pages={1--6},
|
||||||
|
year={2017},
|
||||||
|
publisher={Wiley}
|
||||||
|
}
|
||||||
|
|
||||||
|
@article{lepage78,
|
||||||
|
title={A new algorithm for adaptive multidimensional integration},
|
||||||
|
author={Lepage, G Peter},
|
||||||
|
journal={Journal of Computational Physics},
|
||||||
|
volume={27},
|
||||||
|
number={2},
|
||||||
|
pages={192--203},
|
||||||
|
year={1978},
|
||||||
|
publisher={Elsevier}
|
||||||
|
}
|
||||||
|
Binary file not shown.
Binary file not shown.
@ -209,7 +209,8 @@ $$
|
|||||||
P^{\text{oss}}_{\text{max}} = 10.005 \pm 0.018 \with \chi_r^2 = 0.071
|
P^{\text{oss}}_{\text{max}} = 10.005 \pm 0.018 \with \chi_r^2 = 0.071
|
||||||
$$
|
$$
|
||||||
|
|
||||||
where $\chi_r^2$ is the reduced $\chi^2$, proving a good convergence.
|
where $\chi_r^2$ is the $\chi^2$ per degree of freedom, proving a good
|
||||||
|
convergence.
|
||||||
In order to compare $P^{\text{oss}}_{\text{max}}$ with the expected value
|
In order to compare $P^{\text{oss}}_{\text{max}}$ with the expected value
|
||||||
$P_{\text{max}} = 10$, the following compatibility t-test was applied:
|
$P_{\text{max}} = 10$, the following compatibility t-test was applied:
|
||||||
|
|
||||||
|
@ -1,10 +1,7 @@
|
|||||||
# Exercize 5
|
# Exercise 5
|
||||||
|
|
||||||
The following integral must be evaluated:
|
The following integral must be evaluated comparing different Monte Carlo
|
||||||
|
techniques.
|
||||||
$$
|
|
||||||
I = \int\limits_0^1 dx \, e^x
|
|
||||||
$$
|
|
||||||
|
|
||||||
\begin{figure}
|
\begin{figure}
|
||||||
\hypertarget{fig:exp}{%
|
\hypertarget{fig:exp}{%
|
||||||
@ -24,6 +21,8 @@ $$
|
|||||||
% Plot
|
% Plot
|
||||||
\draw [domain=-2:7, smooth, variable=\x,
|
\draw [domain=-2:7, smooth, variable=\x,
|
||||||
cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
|
cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
|
||||||
|
% Equation
|
||||||
|
\node [above] at (2.5, 2.5) {$I = \int\limits_0^1 dx \, e^x$};
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
\caption{Plot of the integral to be evaluated.}
|
\caption{Plot of the integral to be evaluated.}
|
||||||
}
|
}
|
||||||
@ -33,14 +32,14 @@ whose exact value is 1.7182818285...
|
|||||||
|
|
||||||
The three most popular Monte Carlo (MC) methods where applied: plain MC, Miser
|
The three most popular Monte Carlo (MC) methods where applied: plain MC, Miser
|
||||||
and Vegas. Besides this popularity fact, these three method were chosen for
|
and Vegas. Besides this popularity fact, these three method were chosen for
|
||||||
being the only ones implemented in the GSL library.
|
being implemented in the GSL libraries `gsl_monte_plain`, `gsl_monte_miser` and
|
||||||
|
`gsl_monte_vegas` respectively.
|
||||||
|
|
||||||
|
|
||||||
## Plain Monte Carlo
|
## Plain Monte Carlo
|
||||||
|
|
||||||
When the integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a
|
When an integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a
|
||||||
function $f$ must be evaluated, that is:
|
function $f$ must be evaluated, that is:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
I = \int\limits_{\Omega} dx \, f(x)
|
I = \int\limits_{\Omega} dx \, f(x)
|
||||||
\with V = \int\limits_{\Omega} dx
|
\with V = \int\limits_{\Omega} dx
|
||||||
@ -48,14 +47,12 @@ $$
|
|||||||
|
|
||||||
the simplest MC method approach is to sample $N$ points $x_i$ evenly distributed
|
the simplest MC method approach is to sample $N$ points $x_i$ evenly distributed
|
||||||
in $V$ and approx $I$ as:
|
in $V$ and approx $I$ as:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
|
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
|
||||||
$$
|
$$
|
||||||
|
|
||||||
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ for the law of large
|
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ because of the law of
|
||||||
numbers. Hence, the sample variance can be extimated by the sample variance:
|
large numbers, whereas the sample variance can be estimated as:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
|
\sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
|
||||||
\rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
|
\rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
|
||||||
@ -67,31 +64,55 @@ Unlike in deterministic methods, the estimate of the error is not a strict error
|
|||||||
bound: random sampling may not uncover all the important features of the
|
bound: random sampling may not uncover all the important features of the
|
||||||
integrand and this can result in an underestimate of the error.
|
integrand and this can result in an underestimate of the error.
|
||||||
|
|
||||||
In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$.
|
In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$.
|
||||||
|
|
||||||
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
|
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
|
||||||
method is determined by how many points are generated, namely how many function
|
method lies in how many points are generated, namely how many function calls
|
||||||
calls are exectuted when the method is implemented. In @tbl:MC, the obtained
|
are executed when the reiterative method is implemented. In @fig:MC and
|
||||||
results and errors $\sigma$ are shown. The estimated integrals for different
|
@fig:MI, results obtained with the plain MC method are shown in red. In
|
||||||
numbers of calls are compared to the expected value $I$ and the difference
|
@tbl:MC, some of them are listed: the estimated integrals $I^{\text{oss}}$ are
|
||||||
'diff' between them is given.
|
compared to the expected value $I$ and the differences diff between them are
|
||||||
As can be seen, the MC method tends to underestimate the error for scarse
|
given.
|
||||||
function calls. As previously stated, the higher the number of function calls,
|
|
||||||
the better the estimation of $I$. A further observation regards the fact that,
|
|
||||||
even with $50'000'000$ calls, the $I^{\text{oss}}$ still differs from $I$ at
|
|
||||||
the fifth decimal digit.
|
|
||||||
|
|
||||||
-------------------------------------------------------------------------
|
![Estimated values of $I$ obatined by Plain MC technique with different
|
||||||
500'000 calls 5'000'000 calls 50'000'000 calls
|
number of function calls; logarithmic scale; errorbars showing their
|
||||||
----------------- ----------------- ------------------ ------------------
|
estimated uncertainties. As can be seen, the process does a sort o seesaw
|
||||||
$I^{\text{oss}}$ 1.7166435813 1.7181231109 1.7183387184
|
around the correct value.](images/MC_MC.pdf){#fig:MC}
|
||||||
|
|
||||||
$\sigma$ 0.0006955691 0.0002200309 0.0000695809
|
---------------------------------------------------------------------------
|
||||||
|
calls $I^{\text{oss}}$ $\sigma$ diff
|
||||||
|
------------------ ------------------ ------------------ ------------------
|
||||||
|
\SI{5e5}{} 1.7166435813 0.0006955691 0.0016382472
|
||||||
|
|
||||||
diff 0.0016382472 0.0001587176 0.0000568899
|
\SI{5e6}{} 1.7181231109 0.0002200309 0.0001587176
|
||||||
-------------------------------------------------------------------------
|
|
||||||
|
|
||||||
Table: MC results with different numbers of function calls. {#tbl:MC}
|
\SI{5e7}{} 1.7183387184 0.0000695809 0.0000568899
|
||||||
|
---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Table: Some MC results with three different numbers of function calls.
|
||||||
|
Differences between computed and exact values are given in
|
||||||
|
diff. {#tbl:MC}
|
||||||
|
|
||||||
|
As can be seen, $\sigma$ is always of the same order of magnitude of diff,
|
||||||
|
except for very low numbers of function calls. Even with \SI{5e7}{} calls,
|
||||||
|
$I^{\text{oss}}$ still differs from $I$ at the fifth decimal digit, meaning that
|
||||||
|
this method shows a really slow convergence. In fact, since the $\sigma$s
|
||||||
|
dependence on the number $C$ of function calls is confirmed:
|
||||||
|
$$
|
||||||
|
\begin{cases}
|
||||||
|
\sigma_1 = \SI{6.95569e-4}{} \longleftrightarrow C_1 = \SI{5e6}{} \\
|
||||||
|
\sigma_2 = \SI{6.95809e-5}{} \longleftrightarrow C_1 = \SI{5e8}{}
|
||||||
|
\end{cases}
|
||||||
|
$$
|
||||||
|
|
||||||
|
$$
|
||||||
|
\thus
|
||||||
|
\frac{\sigma_1}{\sigma_2} = 9.9965508 \sim 10 = \sqrt{\frac{C_2}{C_1}}
|
||||||
|
$$
|
||||||
|
|
||||||
|
if an error of $\sim 1^{-n}$ is required, a number $\propto 10^{2n}$ of
|
||||||
|
function calls must be executed, meaning that for $\sigma \sim 1^{-10}
|
||||||
|
\rightarrow C = \SI{1e20}{}$, which would be impractical.
|
||||||
|
|
||||||
|
|
||||||
## Stratified sampling
|
## Stratified sampling
|
||||||
@ -101,12 +122,12 @@ partitioned into subpopulations. Stratification, indeed, is the process of
|
|||||||
dividing the primary sample into subgroups (strata) before sampling random
|
dividing the primary sample into subgroups (strata) before sampling random
|
||||||
within each stratum.
|
within each stratum.
|
||||||
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
|
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
|
||||||
sorted with simple random sampling in each strata, such as:
|
sorted with simple random sampling in the $i^{\text{th}}$ strata, such as:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\bar{x}_i = \frac{1}{n_i} \sum_j x_j
|
\bar{x}_i = \frac{1}{n_i} \sum_j x_j
|
||||||
$$
|
$$
|
||||||
|
|
||||||
|
and:
|
||||||
$$
|
$$
|
||||||
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
|
\sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
|
||||||
\thus
|
\thus
|
||||||
@ -115,13 +136,12 @@ $$
|
|||||||
|
|
||||||
where:
|
where:
|
||||||
|
|
||||||
- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
|
- $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum,
|
||||||
- $n_i$ is the number of points sorted in it
|
- $n_i$ is the number of points sorted in it,
|
||||||
- $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point
|
- $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point.
|
||||||
|
|
||||||
then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
|
then, the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
|
||||||
sampling for the whole population are:
|
sampling for the whole population are:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
|
\bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
|
||||||
\sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
|
\sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
|
||||||
@ -138,7 +158,7 @@ result will have a smaller error in estimation with respect to the one otherwise
|
|||||||
obtained with simple sampling.
|
obtained with simple sampling.
|
||||||
For this reason, stratified sampling is used as a method of variance reduction
|
For this reason, stratified sampling is used as a method of variance reduction
|
||||||
when MC methods are used to estimate population statistics from a known
|
when MC methods are used to estimate population statistics from a known
|
||||||
population.
|
population [@ridder17].
|
||||||
|
|
||||||
|
|
||||||
### MISER
|
### MISER
|
||||||
@ -147,36 +167,32 @@ The MISER technique aims to reduce the integration error through the use of
|
|||||||
recursive stratified sampling.
|
recursive stratified sampling.
|
||||||
As stated before, according to the law of large numbers, for a large number of
|
As stated before, according to the law of large numbers, for a large number of
|
||||||
extracted points, the estimation of the integral $I$ can be computed as:
|
extracted points, the estimation of the integral $I$ can be computed as:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
I= V \cdot \langle f \rangle
|
I= V \cdot \langle f \rangle
|
||||||
$$
|
$$
|
||||||
|
|
||||||
|
|
||||||
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
|
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
|
||||||
$\langle f \rangle$.
|
$\langle f \rangle$.
|
||||||
|
|
||||||
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
|
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
|
||||||
which $n_a$ and $n_b$ points were uniformely sampled. Given the Monte Carlo
|
which $n_a$ and $n_b$ points are respectively uniformly sampled. Given the
|
||||||
estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those
|
Monte Carlo estimates of the means $\langle f \rangle_a$ and $\langle f
|
||||||
points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$
|
\rangle_b$ of those points and their variances $\sigma_a^2$ and $\sigma_b^2$, if
|
||||||
and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are chosen unitary,
|
the weights $N_a$ and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$
|
||||||
then the variance $\sigma^2$ of the combined estimate $\langle f \rangle$:
|
are chosen unitary, then the variance $\sigma^2$ of the combined estimate
|
||||||
|
$\langle f \rangle$:
|
||||||
$$
|
$$
|
||||||
\langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
|
\langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
|
||||||
+ \langle f \rangle_b \right)
|
+ \langle f \rangle_b \right)
|
||||||
$$
|
$$
|
||||||
|
|
||||||
is given by:
|
is given by:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
|
\sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
It can be shown that this variance is minimized by distributing the points such
|
It can be shown that this variance is minimized by distributing the points such
|
||||||
that:
|
that:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
|
\frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
|
||||||
$$
|
$$
|
||||||
@ -184,43 +200,48 @@ $$
|
|||||||
Hence, the smallest error estimate is obtained by allocating sample points in
|
Hence, the smallest error estimate is obtained by allocating sample points in
|
||||||
proportion to the standard deviation of the function in each sub-region.
|
proportion to the standard deviation of the function in each sub-region.
|
||||||
The whole integral estimate and its variance are therefore given by:
|
The whole integral estimate and its variance are therefore given by:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
|
I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
|
||||||
$$
|
$$
|
||||||
|
|
||||||
When implemented, MISER is in fact a recursive method. With a given step, all
|
When implemented, MISER is in fact a recursive method. First, all the possible
|
||||||
the possible bisections are tested and the one which minimizes the combined
|
bisections of $\Omega$ are tested and the one which minimizes the combined
|
||||||
variance of the two sub-regions is selected. The variance in the sub-regions is
|
variance of the two sub-regions is selected. In order to speed up the
|
||||||
estimated with a fraction of the total number of available points. The remaining
|
algorithm, the variance in the sub-regions is estimated with a fraction of the
|
||||||
sample points are allocated to the sub-regions using the formula for $n_a$ and
|
total number of available points (function calls) which is default set to 0.1.
|
||||||
$n_b$, once the variances are computed.
|
The remaining points are allocated to the sub-regions using the formula for
|
||||||
|
$n_a$ and $n_b$, once the variances are computed.
|
||||||
The same procedure is then repeated recursively for each of the two half-spaces
|
The same procedure is then repeated recursively for each of the two half-spaces
|
||||||
from the best bisection. At each recursion step, the integral and the error are
|
from the best bisection. When less than 32*16 (default option) function calls
|
||||||
estimated using a plain Monte Carlo algorithm. After a given number of calls,
|
are available in a single section, the integral and the error in this section
|
||||||
the final individual values and their error estimates are then combined upwards
|
are estimated using the plain Monte Carlo algorithm.
|
||||||
to give an overall result and an estimate of its error.
|
The final individual values and their error estimates are then combined upwards
|
||||||
|
to give an overall result and an estimate of its error [@sayah19].
|
||||||
|
|
||||||
Results for this particular sample are shown in @tbl:MISER.
|
![Estimations $I^{\text{oss}}$ of the integral $I$ obtained for the three
|
||||||
|
implemented method for different values of function calls. Errorbars
|
||||||
|
showing their estimated uncertainties.](images/MC_MC_MI.pdf){#fig:MI}
|
||||||
|
|
||||||
-------------------------------------------------------------------------
|
Results for this particular sample are shown in black in @fig:MI and some of
|
||||||
500'000 calls 5'000'000 calls 50'000'000 calls
|
them are listed in @tbl:MI. Except for the first very little number of calls,
|
||||||
----------------- ----------------- ------------------ ------------------
|
the improvement with respect to the Plain MC technique (in red) is appreciable.
|
||||||
$I^{\text{oss}}$ 1.7182850738 1.7182819143 1.7182818221
|
|
||||||
|
|
||||||
$\sigma$ 0.0000021829 0.0000001024 0.0000000049
|
---------------------------------------------------------------------------
|
||||||
|
calls $I^{\text{oss}}$ $\sigma$ diff
|
||||||
|
------------------ ------------------ ------------------ ------------------
|
||||||
|
\SI{5e5}{} 1.7182850738 0.0000021829 0.0000032453
|
||||||
|
|
||||||
diff 0.0000032453 0.0000000858 000000000064
|
\SI{5e6}{} 1.7182819143 0.0000001024 0.0000000858
|
||||||
-------------------------------------------------------------------------
|
|
||||||
|
|
||||||
Table: MISER results with different numbers of function calls. Be careful:
|
\SI{5e7}{} 1.7182818221 0.0000000049 0.0000000064
|
||||||
while in @tbl:MC the number of function calls stands for the number of
|
---------------------------------------------------------------------------
|
||||||
total sampled poins, in this case it stands for the times each section
|
|
||||||
is divided into subsections. {#tbl:MISER}
|
|
||||||
|
|
||||||
This time the error, altough it lies always in the same order of magnitude of
|
Table: MISER results with different numbers of function calls. Differences
|
||||||
diff, seems to seesaw around the correct value, which is much more closer to
|
between computed and exact values are given in diff. {#tbl:MI}
|
||||||
the expected one.
|
|
||||||
|
This time the convergence is much faster: already with 500'000 number of points,
|
||||||
|
the correct results differs from the computed one at the fifth decimal digit.
|
||||||
|
Once again, the error lies always in the same order of magnitude of diff.
|
||||||
|
|
||||||
|
|
||||||
## Importance sampling
|
## Importance sampling
|
||||||
@ -230,16 +251,14 @@ probability distribution $f$ itself, so that the points cluster in the regions
|
|||||||
that make the largest contribution to the integral.
|
that make the largest contribution to the integral.
|
||||||
|
|
||||||
Remind that $I = V \cdot \langle f \rangle$ and therefore only $\langle f
|
Remind that $I = V \cdot \langle f \rangle$ and therefore only $\langle f
|
||||||
\rangle$ must be estimated. Then, consider a sample of $n$ points {$x_i$}
|
\rangle$ must be estimated. Consider a sample of $n$ points {$x_i$} generated
|
||||||
generated according to a probability distribition function $P$ which gives
|
according to a probability distribution function $P$ which gives thereby the
|
||||||
thereby the following expected value:
|
following expected value:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
E [x, P] = \frac{1}{n} \sum_i x_i
|
E [x, P] = \frac{1}{n} \sum_i x_i
|
||||||
$$
|
$$
|
||||||
|
|
||||||
with variance:
|
with variance:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
|
\sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
|
||||||
\with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
|
\with \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
|
||||||
@ -252,13 +271,11 @@ are evenly distributed in $\Omega$. The idea is to sample points from a
|
|||||||
different distribution to lower the variance of $E[x, P]$, which results in
|
different distribution to lower the variance of $E[x, P]$, which results in
|
||||||
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
|
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
|
||||||
$y$ and defining a new probability $P^{(y)}$ in order to satisfy:
|
$y$ and defining a new probability $P^{(y)}$ in order to satisfy:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
|
E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
|
||||||
$$
|
$$
|
||||||
|
|
||||||
which is to say:
|
which is to say:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
I = \int \limits_{\Omega} dx f(x) =
|
I = \int \limits_{\Omega} dx f(x) =
|
||||||
\int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
|
\int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
|
||||||
@ -266,7 +283,6 @@ $$
|
|||||||
$$
|
$$
|
||||||
|
|
||||||
where $E \, \longleftrightarrow \, I$ and:
|
where $E \, \longleftrightarrow \, I$ and:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\begin{cases}
|
\begin{cases}
|
||||||
f(x) \, \longleftrightarrow \, x \\
|
f(x) \, \longleftrightarrow \, x \\
|
||||||
@ -274,42 +290,32 @@ $$
|
|||||||
\end{cases}
|
\end{cases}
|
||||||
\et
|
\et
|
||||||
\begin{cases}
|
\begin{cases}
|
||||||
w(x) \, \longleftrightarrow \, \frac{x}{y} \\
|
g(x) \, \longleftrightarrow \, y = P^{(y)} \\
|
||||||
g(x) \, \longleftrightarrow \, y = P^{(y)}
|
w(x) = \frac{f(x)}{g(x)} \, \longleftrightarrow \, \frac{x}{y}
|
||||||
\end{cases}
|
\end{cases}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
Where the symbol $\longleftrightarrow$ points out the connection between the
|
Where the symbol $\longleftrightarrow$ points out the connection between the
|
||||||
variables. This new estimate is better than the former if:
|
variables. The best variable $y$ would be:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P]
|
y = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
|
||||||
|
\thus \frac{x}{y} = E [x, P]
|
||||||
$$
|
$$
|
||||||
|
|
||||||
The best variable $y$ would be:
|
and even a single sample under $P^{(y)}$ would be sufficient to give its value:
|
||||||
|
it can be shown that under this probability distribution the variance vanishes.
|
||||||
$$
|
Obviously, it is not possible to take exactly this choice, since $E [x, P]$ is
|
||||||
y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
|
not given a priori. However, this gives an insight into what importance sampling
|
||||||
\thus \frac{x}{y^{\star}} = E [x, P]
|
does. In fact, given that:
|
||||||
$$
|
|
||||||
|
|
||||||
and even a single sample under $P^{(y^{\star})}$ would be sufficient to give its
|
|
||||||
value. Obviously, it is not possible to take exactly this choice, since $E [x,
|
|
||||||
P]$ is not given a priori.
|
|
||||||
However, this gives an insight into what importance sampling does. In fact,
|
|
||||||
given that:
|
|
||||||
|
|
||||||
$$
|
$$
|
||||||
E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
|
E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
|
||||||
a P(x \in [a, a + da])
|
da \, a P(x \in [a, a + da])
|
||||||
$$
|
$$
|
||||||
|
|
||||||
the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so
|
the best probability $P^{(y)}$ redistributes the law of $x$ so that its samples
|
||||||
that its samples frequencies are sorted directly according to their weights in
|
frequencies are sorted directly according to their weights in $E[x, P]$, namely:
|
||||||
$E[x, P]$, namely:
|
|
||||||
|
|
||||||
$$
|
$$
|
||||||
P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
|
P^{(y}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
|
||||||
$$
|
$$
|
||||||
|
|
||||||
In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
|
In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
|
||||||
@ -320,37 +326,82 @@ frequently. As a consequence, the estimator variance will be reduced.
|
|||||||
### VEGAS
|
### VEGAS
|
||||||
|
|
||||||
|
|
||||||
The VEGAS algorithm is based on importance sampling. It aims to reduce the
|
The VEGAS algorithm of Lepage is based on importance sampling. It aims to
|
||||||
integration error by concentrating points in the regions that make the largest
|
reduce the integration error by concentrating points in the regions that make
|
||||||
contribution to the integral.
|
the largest contribution to the integral.
|
||||||
|
|
||||||
As stated before, in practice it is impossible to sample points from the best
|
As stated before, in practice it is impossible to sample points from the best
|
||||||
distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
|
distribution $P^{(y)}$: only a good approximation can be achieved. In GSL, the
|
||||||
GSL, the VEGAS algorithm approximates the distribution by histogramming the
|
VEGAS algorithm approximates the distribution by histogramming the function $f$
|
||||||
function $f$ in different subregions. Each histogram is used to define a
|
in different subregions with a reiterative method [@lepage78], namely:
|
||||||
sampling distribution for the next pass, which consists in doing the same thing
|
|
||||||
recorsively: this procedure converges asymptotically to the desired
|
|
||||||
distribution. It follows that a better estimation is achieved with a greater
|
|
||||||
number of function calls.
|
|
||||||
The integration uses a fixed number of function calls. The result and its
|
|
||||||
error estimate are based on a weighted average of independent samples, as for
|
|
||||||
MISER.
|
|
||||||
For this particular sample, results are shown in @tbl:VEGAS.
|
|
||||||
|
|
||||||
-------------------------------------------------------------------------
|
- a fixed number of points (function calls) is evenly generated in the whole
|
||||||
500'000 calls 5'000'000 calls 50'000'000 calls
|
region;
|
||||||
----------------- ----------------- ------------------ ------------------
|
- the volume $V$ is divided into $N$ intervals with width $\Delta x_i =
|
||||||
$I^{\text{oss}}$ 1.7182818354 1.7182818289 1.7182818285
|
\Delta x \, \forall \, i$, where $N$ is limited by the computer storage
|
||||||
|
space available and must be held constant from iteration to iteration.
|
||||||
|
Default $N = 50$;
|
||||||
|
- each interval is then divided into $m_i + 1$ subintervals, where:
|
||||||
|
$$
|
||||||
|
m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
|
||||||
|
$$
|
||||||
|
|
||||||
$\sigma$ 0.0000000137 0.0000000004 0.0000000000
|
where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
|
||||||
|
of $f$ in the interval. Hence, $m_i$ is therefore a measure of the
|
||||||
|
"importance" of the interval with respect to the others: the higher
|
||||||
|
$\bar{f}_i$, the higher $m_i$. The stiffness $K$ is default set to 1.5;
|
||||||
|
- as it is desirable to restore the number of intervals to its original value
|
||||||
|
$N$, groups of the new intervals must be amalgamated into larger intervals,
|
||||||
|
the number of subintervals in each group being constant. The net effect is
|
||||||
|
to alter the intervals sizes, while keeping the total number constant, so
|
||||||
|
that the smallest intervals occur where $f$ is largest;
|
||||||
|
- the new grid is used and further refined in subsequent iterations until the
|
||||||
|
optimal grid has been obtained. By default, the number of iterations is set
|
||||||
|
to 5.
|
||||||
|
|
||||||
diff 0.0000000069 0.0000000004 0.0000000000
|
At the end, a cumulative estimate of the integral $I$ and its error $\sigma_I$
|
||||||
-------------------------------------------------------------------------
|
are made based on weighted average:
|
||||||
|
$$
|
||||||
|
I = \sigma_I^2 \sum_i \frac{I_i}{\sigma_i^2}
|
||||||
|
\with
|
||||||
|
\sigma_I^2 = \left( \sum_i \frac{1}{\sigma_i^2} \right)^{-1}
|
||||||
|
$$
|
||||||
|
|
||||||
Table: VEGAS results with different numbers of
|
where $I_i$ and $\sigma_i$ are are the integral and standard deviation
|
||||||
|
estimated in each interval $\Delta x_i$ of the last iteration using the plain
|
||||||
|
MC technique.
|
||||||
|
For the results to be reliable, the chi-squared per degree of freedom
|
||||||
|
$\chi_r^2$ must be consistent with 1. While performing the reiterations, if
|
||||||
|
the $\chi_r^2$ value exceed 1.5, the cycle breaks.
|
||||||
|
|
||||||
|
Clearly, once again a better estimation is achieved with a greater number of
|
||||||
|
function calls. For this particular sample, the most accurate results are shown
|
||||||
|
in @fig:MI_VE and some of them are listed in @tbl:VEGAS.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------------------------------
|
||||||
|
calls $I^{\text{oss}}$ $\sigma$ diff $\chi_r^2$
|
||||||
|
------------------ ------------------ ------------------ ------------------ ------------------
|
||||||
|
\SI{5e5}{} 1.7182818281 0.0000000012 0.0000000004 1.457
|
||||||
|
|
||||||
|
\SI{5e6}{} 1.7182818284 0.0000000000 0.0000000001 0.632
|
||||||
|
|
||||||
|
\SI{5e7}{} 1.7182818285 0.0000000000 0.0000000000 0.884
|
||||||
|
----------------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
Table: Some VEGAS results with different numbers of
|
||||||
function calls. {#tbl:VEGAS}
|
function calls. {#tbl:VEGAS}
|
||||||
|
|
||||||
This time, the error estimation is notably close to diff for each number of
|
As can be appreciated in @fig:MI_VE, the VEGAS algorithm manages to compute
|
||||||
function calls, meaning that the estimation of both the integral and its
|
the integral value in a most accurate way with respect to MISER. The $\chi_r^2$
|
||||||
error turn out to be very accurate, much more than the ones obtained with
|
turns out to be close enough to 1 to guarantee a good estimation of $I$,
|
||||||
both plain Monte Carlo method and stratified sampling.
|
goodness which is also confirmed by the very small difference between estimation
|
||||||
|
and exact value, as shown in @tbl:VEGAS: with a number of \SI{5e7}{} of function
|
||||||
|
calls, the difference is smaller than \SI{1e-10}{}.
|
||||||
|
|
||||||
|
![Only the most accurate results are shown in order to stress the
|
||||||
|
differences between VEGAS (in gray) and MISER (in black) methods
|
||||||
|
results.](images/MC_MI_VE.pdf){#fig:MI_VE}
|
||||||
|
|
||||||
|
In conclusion, between a plain Monte Carlo technique, stratified sampling and
|
||||||
|
importance sampling, the last one turned out to be the most powerful mean to
|
||||||
|
obtain a good estimation of the integrand.
|
||||||
|
@ -1,5 +1,3 @@
|
|||||||
- cambiare simbolo convoluzione
|
- cambiare simbolo convoluzione
|
||||||
- aggiungere citazioni e referenze
|
- aggiungere citazioni e referenze
|
||||||
- rifare grafici senza bordino
|
|
||||||
- leggere l'articolo di Lucy
|
- leggere l'articolo di Lucy
|
||||||
- riscrivere il readme del 5
|
|
||||||
|
Loading…
Reference in New Issue
Block a user