diff --git a/README.md b/README.md
index 070dcaa..3c33f27 100644
--- a/README.md
+++ b/README.md
@@ -42,15 +42,15 @@ See this [guide](https://nixos.org/nix/manual/#chap-quick-start) if you have
never used Nix before.
Running `nix-shell` in the top-level will drop you into the development shell.
-Once ready, invoke `make` with the program you wish to build. For example
+Once ready, invoke `make` with the program you wish to build. For example:

    $ make ex-1/bin/main

-or, to build every program of an exercise
+or, to build every program of an exercise:

    $ make ex-1

-To clean up the build results run
+To clean up the build results run:

    $ make clean

@@ -82,9 +82,9 @@ The output can be redirected to `ex-1/pdf-plot.py` to generate a plot.

Every program in `ex-2` computes the best available approximation (with a
given method) to the Euler-Mascheroni γ constant and prints[1]:

-1. the leading decimal digits of the approximate value found
+1. the leading decimal digits of the approximate value found;

-2. the exact decimal digits of γ
+2. the exact decimal digits of γ;

3. the absolute difference between the 1. and 2.

@@ -104,7 +104,7 @@ cases the best fit and the parameter covariance matrix are printed. The
program then performs a t-test to assert the compatibility of the data with
two hypothesis and print the results in a table.

-To plot a 2D histogram of the generated sample do
+To plot a 2D histogram of the generated sample do:

    $ ex-3/bin/main -i | ex-3/plot.py

@@ -123,7 +123,7 @@ horizontal component. It is possible to set the maximum momentum with the
option `-p`. A χ² fit and a t-test compatibility are performed with respect to
the expected distribution and results are printed.

-To plot a histogram of the generated sample do
+To plot a histogram of the generated sample do:

    $ ex-4/bin/main -o | ex-4/plot.py

@@ -133,11 +133,30 @@ and `-b`.

### Exercise 5

-`ex-5/main` compute the integral of exp(1) between 0 and 1 with the methods
-plain MC, MISER and VEGAS. Being reiterative routines, it takes the number of
-iterations as its only argument.
-It prints out the obatined value and its estimated error for each method.
+`ex-5/main` computes estimates of the integral of exp(x) between 0 and 1
+with the plain MC, MISER and VEGAS methods, for increasing numbers of sampled
+points. It takes no arguments.
+It prints out eight columns:
+
+1. the number of sampled points;
+
+2. the integral estimated with plain MC;
+
+3. its uncertainty;
+
+4. the integral estimated with MISER;
+
+5. its uncertainty;
+
+6. the integral estimated with VEGAS;
+
+7. its uncertainty;
+
+8. the χ² of the VEGAS estimate.
+
+To plot the results do:
+
+    $ ex-5/main | ex-5/plot.py

### Exercise 6

@@ -167,8 +186,8 @@ classified data in this order: signal then noise.
To plot the result of the linear classification pipe the output to
`ex-7/plot.py`. The program generates two figures:

-  - a scatter plot showing the Fisher projection line and the cut line
-  - two histograms of the projected data and the cut line
+  - a scatter plot showing the Fisher projection line and the cut line;
+  - two histograms of the projected data and the cut line.

`ex-7/bin/test` takes a model trained in `ex-7/bin/main` and test it against
newly generated datasets (`-i` to set the number of test iterations). The
diff --git a/ex-5/main.c b/ex-5/main.c
index bfb6014..39931b5 100644
--- a/ex-5/main.c
+++ b/ex-5/main.c
@@ -15,14 +15,18 @@ double function (double * x, size_t dim, void * params)

// Function which prints out the results when called.
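+// The eight values printed by the calls below (number of calls, then value
+// and error for plain MC, MISER and VEGAS, plus the VEGAS χ²), followed by
+// the newline printed in main(), form one row of the output read by
+// ex-5/plot.py.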
//
-void results (size_t calls, double result, double error)
+void results (size_t calls, double result, double error, double chi)
{
  if (calls != 0)
  {
-    printf ("%ld\t", calls);
+    printf ("%.0e\t", (double)calls);
  }
  printf ("%.10f\t", result);
  printf ("%.10f\t", error);
+  // chi == 0 flags the methods (plain MC, MISER) that have no χ² to report.
+  if (chi != 0)
+  {
+    printf ("%.3f\t", chi);
+  }
}


@@ -33,7 +37,7 @@ int main(int argc, char** argv)
  size_t dims = 1;          // Integral dimension
  double lower[1] = {0};    // Integration lower limit
  double upper[1] = {1};    // Integration upper limit
-  size_t calls = 5000;      // Initial number of function calls
+  size_t calls = 50;        // Initial number of function calls
  double integral, error;

  // Initialize an RNG.
@@ -65,26 +69,37 @@
    gsl_monte_plain_integrate (&expo, lower, upper, dims, calls,
                               r, sMC, &integral, &error);
    gsl_monte_plain_free(sMC);
-    results(calls, integral, error);
+    results(calls, integral, error, 0);

    // MISER
    gsl_monte_miser_state *sMI = gsl_monte_miser_alloc (dims);
    gsl_monte_miser_integrate (&expo, lower, upper, dims, calls,
                               r, sMI, &integral, &error);
    gsl_monte_miser_free(sMI);
-    results(0, integral, error);
+    results(0, integral, error, 0);

    // VEGAS
    gsl_monte_vegas_state *sVE = gsl_monte_vegas_alloc (dims);
    gsl_monte_vegas_integrate (&expo, lower, upper, dims, calls,
                               r, sVE, &integral, &error);
+
+    // To inspect the parameters of the VEGAS integration, uncomment this
+    // block:
+    //
+    // gsl_monte_vegas_params params;
+    // gsl_monte_vegas_params_get(sVE, &params);
+    // params.verbose = 1;
+    // params.ostream = stderr;
+    // gsl_monte_vegas_params_set(sVE, &params);
    do
    {
      gsl_monte_vegas_integrate (&expo, lower, upper, dims,
-                                 calls/5, r, sVE, &integral, &error);
+                                 calls, r, sVE, &integral, &error);
    } while (fabs (gsl_monte_vegas_chisq (sVE) - 1.0) > 0.5);
-    gsl_monte_vegas_free(sVE);
-    results(0, integral, error);
+    // Read the χ² of the last run before freeing the VEGAS state.
+    double chi = gsl_monte_vegas_chisq (sVE);
+    gsl_monte_vegas_free(sVE);
+    results(0, integral, error, chi);
+
+    // Update function calls
    printf ("\n");
diff --git a/ex-5/plot.py b/ex-5/plot.py
old mode 100644
new mode 100755
index 4c74c51..d297e70
--- a/ex-5/plot.py
+++ b/ex-5/plot.py
@@ -1,8 +1,10 @@
+#!/usr/bin/env python
+
import matplotlib.pyplot as plt
import numpy as np
import sys

-calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE = np.loadtxt(sys.stdin, unpack=True)
+calls, I_MC, σ_MC, I_MI, σ_MI, I_VE, σ_VE, chi = np.loadtxt(sys.stdin, unpack=True)
exact = 1.7182818285

plt.rcParams['font.size'] = 17
diff --git a/notes/docs/bibliography.bib b/notes/docs/bibliography.bib
index f73e77b..af6bf5e 100644
--- a/notes/docs/bibliography.bib
+++ b/notes/docs/bibliography.bib
@@ -106,3 +106,35 @@
  year={2005},
  journal={Matrix}
}
+
+@article{sayah19,
+  title={Adaptive stratified Monte Carlo algorithm for numerical computation
+         of integrals},
+  author={Sayah, Toni},
+  journal={Mathematics and Computers in Simulation},
+  volume={157},
+  pages={143--158},
+  year={2019},
+  publisher={Elsevier}
+}
+
+@article{ridder17,
+  title={Variance reduction},
+  author={Ridder, AAN and Botev, ZI},
+  journal={Wiley StatsRef: Statistics Reference Online},
+  pages={1--6},
+  year={2017},
+  publisher={Wiley}
+}
+
+@article{lepage78,
+  title={A new algorithm for adaptive multidimensional integration},
+  author={Lepage, G Peter},
+  journal={Journal of Computational Physics},
+  volume={27},
+  number={2},
+  pages={192--203},
+  year={1978},
+  publisher={Elsevier}
+}
diff --git a/notes/images/MC_some.pdf b/notes/images/MC_MC.pdf
similarity index 60%
rename from notes/images/MC_some.pdf
rename to notes/images/MC_MC.pdf
index f995a9a..2f0e061 100644
Binary files
a/notes/images/MC_some.pdf and b/notes/images/MC_MC.pdf differ diff --git a/notes/images/MC_all.pdf b/notes/images/MC_MC_MI.pdf similarity index 69% rename from notes/images/MC_all.pdf rename to notes/images/MC_MC_MI.pdf index 6363da3..3c3b86f 100644 Binary files a/notes/images/MC_all.pdf and b/notes/images/MC_MC_MI.pdf differ diff --git a/notes/images/MC_few.pdf b/notes/images/MC_MI_VE.pdf similarity index 100% rename from notes/images/MC_few.pdf rename to notes/images/MC_MI_VE.pdf diff --git a/notes/sections/4.md b/notes/sections/4.md index f1972dc..6c5f081 100644 --- a/notes/sections/4.md +++ b/notes/sections/4.md @@ -209,7 +209,8 @@ $$ P^{\text{oss}}_{\text{max}} = 10.005 \pm 0.018 \with \chi_r^2 = 0.071 $$ -where $\chi_r^2$ is the reduced $\chi^2$, proving a good convergence. +where $\chi_r^2$ is the $\chi^2$ per degree of freedom, proving a good +convergence. In order to compare $P^{\text{oss}}_{\text{max}}$ with the expected value $P_{\text{max}} = 10$, the following compatibility t-test was applied: diff --git a/notes/sections/5.md b/notes/sections/5.md index 7bfc1f2..cba8c15 100644 --- a/notes/sections/5.md +++ b/notes/sections/5.md @@ -1,10 +1,7 @@ -# Exercize 5 +# Exercise 5 -The following integral must be evaluated: - -$$ - I = \int\limits_0^1 dx \, e^x -$$ +The following integral must be evaluated comparing different Monte Carlo +techniques. \begin{figure} \hypertarget{fig:exp}{% @@ -24,6 +21,8 @@ $$ % Plot \draw [domain=-2:7, smooth, variable=\x, cyclamen, ultra thick] plot ({\x},{exp(\x/5)}); + % Equation + \node [above] at (2.5, 2.5) {$I = \int\limits_0^1 dx \, e^x$}; \end{tikzpicture} \caption{Plot of the integral to be evaluated.} } @@ -33,14 +32,14 @@ whose exact value is 1.7182818285... The three most popular Monte Carlo (MC) methods where applied: plain MC, Miser and Vegas. Besides this popularity fact, these three method were chosen for -being the only ones implemented in the GSL library. +being implemented in the GSL libraries `gsl_monte_plain`, `gsl_monte_miser` and +`gsl_monte_vegas` respectively. ## Plain Monte Carlo -When the integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a +When an integral $I$ over a $n-$dimensional space $\Omega$ of volume $V$ of a function $f$ must be evaluated, that is: - $$ I = \int\limits_{\Omega} dx \, f(x) \with V = \int\limits_{\Omega} dx @@ -48,14 +47,12 @@ $$ the simplest MC method approach is to sample $N$ points $x_i$ evenly distributed in $V$ and approx $I$ as: - $$ I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle $$ -with $I_N \rightarrow I$ for $N \rightarrow + \infty$ for the law of large -numbers. Hence, the sample variance can be extimated by the sample variance: - +with $I_N \rightarrow I$ for $N \rightarrow + \infty$ because of the law of +large numbers, whereas the sample variance can be estimated as: $$ \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N @@ -67,31 +64,55 @@ Unlike in deterministic methods, the estimate of the error is not a strict error bound: random sampling may not uncover all the important features of the integrand and this can result in an underestimate of the error. -In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$. +In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$. 
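
For reference, the estimators above can be implemented in a few lines of C.
The following is only a minimal sketch (it is not the code of `ex-5/main.c`,
which relies on the GSL routine `gsl_monte_plain`); the generator `rand()`,
the seed and the number of points are arbitrary choices made for illustration:

```c
// Minimal sketch: plain MC estimate of I = int_0^1 e^x dx, with V = 1.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
  const size_t N = 500000;     // number of sampled points (arbitrary)
  double sum = 0, sum2 = 0;

  srand(42);                   // fixed seed, for reproducibility
  for (size_t i = 0; i < N; i++)
  {
    double x = (double)rand() / RAND_MAX;   // uniform point in [0,1]
    double f = exp(x);
    sum  += f;
    sum2 += f * f;
  }

  double mean    = sum / N;                                   // <f>
  double sigma_f = sqrt((sum2 - N * mean * mean) / (N - 1));  // sigma_f
  double sigma_I = sigma_f * sqrt((double)(N - 1)) / N;       // sigma_I, V = 1

  // The exact value is e - 1 = 1.7182818285...
  printf("I = %.10f +- %.10f\n", mean, sigma_I);
  return 0;
}
```

With $N = \SI{5e5}{}$ points this reproduces, up to statistical fluctuations,
the first row of @tbl:MC below.
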
Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
-method is determined by how many points are generated, namely how many function
-calls are exectuted when the method is implemented. In @tbl:MC, the obtained
-results and errors $\sigma$ are shown. The estimated integrals for different
-numbers of calls are compared to the expected value $I$ and the difference
-'diff' between them is given.
-As can be seen, the MC method tends to underestimate the error for scarse
-function calls. As previously stated, the higher the number of function calls,
-the better the estimation of $I$. A further observation regards the fact that,
-even with $50'000'000$ calls, the $I^{\text{oss}}$ still differs from $I$ at
-the fifth decimal digit.
+method lies in how many points are generated, namely how many function calls
+are executed. In @fig:MC and @fig:MI, the results obtained with the plain MC
+method are shown in red. In @tbl:MC, some of them are listed: the estimated
+integrals $I^{\text{oss}}$ are compared to the expected value $I$ and the
+differences diff between them are given.

--------------------------------------------------------------------------
-                  500'000 calls     5'000'000 calls    50'000'000 calls
------------------ ----------------- ------------------ ------------------
-$I^{\text{oss}}$  1.7166435813      1.7181231109       1.7183387184
+![Estimated values of $I$ obtained with the plain MC technique for different
+numbers of function calls (logarithmic scale); the error bars show the
+estimated uncertainties. As can be seen, the estimates seesaw around the
+correct value.](images/MC_MC.pdf){#fig:MC}

-$\sigma$          0.0006955691      0.0002200309       0.0000695809
+---------------------------------------------------------------------------
+calls              $I^{\text{oss}}$   $\sigma$           diff
+------------------ ------------------ ------------------ ------------------
+\SI{5e5}{}         1.7166435813       0.0006955691       0.0016382472

-diff              0.0016382472      0.0001587176       0.0000568899
--------------------------------------------------------------------------
+\SI{5e6}{}         1.7181231109       0.0002200309       0.0001587176

-Table: MC results with different numbers of function calls. {#tbl:MC}
+\SI{5e7}{}         1.7183387184       0.0000695809       0.0000568899
+---------------------------------------------------------------------------
+
+Table: Some MC results with three different numbers of function calls.
+       The differences between the computed and exact values are given in
+       the diff column. {#tbl:MC}
+
+As can be seen, $\sigma$ is always of the same order of magnitude as diff,
+except for very low numbers of function calls. Even with \SI{5e7}{} calls,
+$I^{\text{oss}}$ still differs from $I$ at the fifth decimal digit, showing
+that this method converges very slowly. In fact, the expected dependence of
+$\sigma$ on the number $C$ of function calls, $\sigma \propto 1/\sqrt{C}$, is
+confirmed:
+$$
+  \begin{cases}
+    \sigma_1 = \SI{6.95569e-4}{} \longleftrightarrow C_1 = \SI{5e5}{} \\
+    \sigma_2 = \SI{6.95809e-5}{} \longleftrightarrow C_2 = \SI{5e7}{}
+  \end{cases}
+$$
+
+$$
+  \thus
+  \frac{\sigma_1}{\sigma_2} = 9.9965508 \sim 10 = \sqrt{\frac{C_2}{C_1}}
+$$
+
+Hence, if an error of $\sim 10^{-n}$ is required, a number of function calls
+$\propto 10^{2n}$ must be executed: for $\sigma \sim 10^{-10}$ this means
+$C = \SI{1e20}{}$, which would be impractical.

## Stratified sampling

@@ -101,12 +122,12 @@ partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling random
within each stratum.
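
As a toy illustration of the idea (a sketch only, not code from `ex-5`; the
two equal-width strata, the per-stratum sample size and the use of `rand()`
are arbitrary choices), the integrand $e^x$ can be sampled separately in
$[0, 0.5]$ and $[0.5, 1]$ and the two estimates combined with weights equal
to the strata volumes, anticipating the estimators defined below:

```c
// Toy sketch: stratified sampling of e^x on [0,1] with two equal strata.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// Mean of e^x over [a, b] from n uniform points, with its standard error.
static void stratum(double a, double b, size_t n, double *mean, double *err)
{
  double sum = 0, sum2 = 0;
  for (size_t i = 0; i < n; i++)
  {
    double x = a + (b - a) * rand() / RAND_MAX;
    double f = exp(x);
    sum  += f;
    sum2 += f * f;
  }
  *mean = sum / n;
  *err  = sqrt((sum2 / n - *mean * *mean) / (n - 1));
}

int main(void)
{
  const size_t n = 250000;     // points per stratum (arbitrary choice)
  double m1, e1, m2, e2;

  srand(42);
  stratum(0.0, 0.5, n, &m1, &e1);
  stratum(0.5, 1.0, n, &m2, &e2);

  // Each stratum has volume 1/2: I = 0.5*<f>_1 + 0.5*<f>_2, and the
  // per-stratum uncertainties combine with the squared weights.
  double I     = 0.5 * (m1 + m2);
  double sigma = 0.5 * sqrt(e1 * e1 + e2 * e2);
  printf("I = %.10f +- %.10f\n", I, sigma);
  return 0;
}
```

Since $e^x$ varies more in the second stratum, allocating more points where
the variance is larger (as in the rule for $n_a$ and $n_b$ derived in the
MISER section below) would reduce the combined error further.
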
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
-sorted with simple random sampling in each strata, such as:
-
+drawn with simple random sampling in the $i^{\text{th}}$ stratum, that is:
$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

+and:
$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
@@ -115,13 +136,12 @@
$$

where:

-  - $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
-  - $n_i$ is the number of points sorted in it
-  - $\sigma_i^2$ is the variance associated with the $j^{\text{th}}$ point
+  - $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum,
+  - $n_i$ is the number of points sampled in it,
+  - $\sigma_i^2$ is the sample variance of the $i^{\text{th}}$ stratum.

-then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
+then, the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:
-
$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma_x}^2_i
$$

@@ -138,7 +158,7 @@ result will have a smaller error in estimation with respect to the one
otherwise obtained with simple sampling. For this reason, stratified sampling
is used as a method of variance reduction when MC methods are used to estimate
population statistics from a known
-population.
+population [@ridder17].

### MISER

@@ -147,36 +167,32 @@ The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
As stated before, according to the law of large numbers, for a large number of
extracted points, the estimation of the integral $I$ can be computed as:
-
$$
  I= V \cdot \langle f \rangle
$$
-
Since $V$ is known (in this case, $V = 1$), it is sufficient to estimate
$\langle f \rangle$.

Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, in
-which $n_a$ and $n_b$ points were uniformely sampled. Given the Monte Carlo
-estimates of the means $\langle f \rangle_a$ and $\langle f \rangle_b$ of those
-points and their variances $\sigma_a^2$ and $\sigma_b^2$, if the weights $N_a$
-and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$ are chosen unitary,
-then the variance $\sigma^2$ of the combined estimate $\langle f \rangle$:
+which $n_a$ and $n_b$ points are respectively uniformly sampled. Given the
+Monte Carlo estimates of the means $\langle f \rangle_a$ and $\langle f
+\rangle_b$ of those points and their variances $\sigma_a^2$ and $\sigma_b^2$, if
+the weights $N_a$ and $N_b$ of $\langle f \rangle_a$ and $\langle f \rangle_b$
+are chosen unitary, then the variance $\sigma^2$ of the combined estimate
+$\langle f \rangle$:
$$
  \langle f \rangle = \frac{1}{2} \left( \langle f \rangle_a
  + \langle f \rangle_b \right)
$$

is given by:
-
$$
  \sigma^2 = \frac{\sigma_a^2}{4n_a} + \frac{\sigma_b^2}{4n_b}
$$

It can be shown that this variance is minimized by distributing the points such
that:
-
$$
  \frac{n_a}{n_a + n_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$

@@ -184,43 +200,48 @@ Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.
The whole integral estimate and its variance are therefore given by:
-
$$
  I = V \cdot \langle f \rangle \et \sigma_I^2 = V^2 \cdot \sigma^2
$$

-When implemented, MISER is in fact a recursive method.
With a given step, all
-the possible bisections are tested and the one which minimizes the combined
-variance of the two sub-regions is selected. The variance in the sub-regions is
-estimated with a fraction of the total number of available points. The remaining
-sample points are allocated to the sub-regions using the formula for $n_a$ and
-$n_b$, once the variances are computed.
+When implemented, MISER is in fact a recursive method. First, all the possible
+bisections of $\Omega$ are tested and the one which minimizes the combined
+variance of the two sub-regions is selected. In order to speed up the
+algorithm, the variance in the sub-regions is estimated with a fraction of the
+total number of available points (function calls), which defaults to 0.1.
+The remaining points are allocated to the sub-regions using the formula for
+$n_a$ and $n_b$, once the variances are computed.

The same procedure is then repeated recursively for each of the two half-spaces
-from the best bisection. At each recursion step, the integral and the error are
-estimated using a plain Monte Carlo algorithm. After a given number of calls,
-the final individual values and their error estimates are then combined upwards
-to give an overall result and an estimate of its error.
+from the best bisection. When fewer than 32*16 function calls (the default)
+are available in a single section, the integral and the error in that section
+are estimated using the plain Monte Carlo algorithm.
+The final individual values and their error estimates are then combined upwards
+to give an overall result and an estimate of its error [@sayah19].

-Results for this particular sample are shown in @tbl:MISER.
+![Estimates $I^{\text{oss}}$ of the integral $I$ obtained with the plain MC
+  and MISER methods for different numbers of function calls; the error bars
+  show the estimated uncertainties.](images/MC_MC_MI.pdf){#fig:MI}

--------------------------------------------------------------------------
-                  500'000 calls     5'000'000 calls    50'000'000 calls
------------------ ----------------- ------------------ ------------------
-$I^{\text{oss}}$  1.7182850738      1.7182819143       1.7182818221
+Results for this particular sample are shown in black in @fig:MI and some of
+them are listed in @tbl:MI. Except for the smallest numbers of calls, the
+improvement with respect to the plain MC technique (in red) is appreciable.

-$\sigma$          0.0000021829      0.0000001024       0.0000000049
+---------------------------------------------------------------------------
+calls              $I^{\text{oss}}$   $\sigma$           diff
+------------------ ------------------ ------------------ ------------------
+\SI{5e5}{}         1.7182850738       0.0000021829       0.0000032453

-diff              0.0000032453      0.0000000858       000000000064
-------------------------------------------------------------------------
+\SI{5e6}{}         1.7182819143       0.0000001024       0.0000000858

-Table: MISER results with different numbers of function calls. Be careful:
-      while in @tbl:MC the number of function calls stands for the number of
-      total sampled poins, in this case it stands for the times each section
-      is divided into subsections. {#tbl:MISER}
+\SI{5e7}{}         1.7182818221       0.0000000049       0.0000000064
+---------------------------------------------------------------------------

-This time the error, altough it lies always in the same order of magnitude of
-diff, seems to seesaw around the correct value, which is much more closer to
-the expected one.
+Table: MISER results with different numbers of function calls.
Differences
+       between the computed and exact values are given in diff. {#tbl:MI}
+
+This time the convergence is much faster: already with 500'000 points, the
+computed result agrees with the exact value up to the fifth decimal digit.
+Once again, the error always lies in the same order of magnitude as diff.

## Importance sampling

@@ -230,16 +251,14 @@ probability distribution $f$ itself, so that the points cluster in the regions
that make the largest contribution to the integral.

Remind that $I = V \cdot \langle f \rangle$ and therefore only $\langle f
-\rangle$ must be estimated. Then, consider a sample of $n$ points {$x_i$}
-generated according to a probability distribition function $P$ which gives
-thereby the following expected value:
-
+\rangle$ must be estimated. Consider a sample of $n$ points {$x_i$} generated
+according to a probability distribution function $P$ which thereby gives the
+following expected value:
$$
  E [x, P] = \frac{1}{n} \sum_i x_i
$$

with variance:
-
$$
  \sigma^2 [E, P] = \frac{\sigma^2 [x, P]}{n}
  \with
  \sigma^2 [x, P] = \frac{1}{n -1} \sum_i \left( x_i - E [x, P] \right)^2
$$
@@ -252,13 +271,11 @@ are evenly distributed in $\Omega$. The idea is to sample points from a
different distribution to lower the variance of $E[x, P]$, which results in
lowering $\sigma^2 [x, P]$. This is accomplished by choosing a random variable
$y$ and defining a new probability $P^{(y)}$ in order to satisfy:
-
$$
  E [x, P] = E \left[ \frac{x}{y}, P^{(y)} \right]
$$

which is to say:
-
$$
  I = \int \limits_{\Omega} dx f(x) =
  \int \limits_{\Omega} dx \, \frac{f(x)}{g(x)} \, g(x)=
@@ -266,7 +283,6 @@
$$

where $E \, \longleftrightarrow \, I$ and:
-
$$
  \begin{cases}
  f(x) \, \longleftrightarrow \, x \\
@@ -274,42 +290,32 @@
  \end{cases}
  \et
  \begin{cases}
-  w(x) \, \longleftrightarrow \, \frac{x}{y} \\
-  g(x) \, \longleftrightarrow \, y = P^{(y)}
+  g(x) \, \longleftrightarrow \, y = P^{(y)} \\
+  w(x) = \frac{f(x)}{g(x)} \, \longleftrightarrow \, \frac{x}{y}
  \end{cases}
$$

Where the symbol $\longleftrightarrow$ points out the connection between the
-variables. This new estimate is better than the former if:
-
+variables. The best variable $y$ would be:
$$
-  \sigma^2 \left[ \frac{x}{y}, P^{(y)} \right] < \sigma^2 [x, P]
+  y = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
+  \thus \frac{x}{y} = E [x, P]
$$

-The best variable $y$ would be:
-
-$$
-  y^{\star} = \frac{x}{E [x, P]} \, \longleftrightarrow \, \frac{f(x)}{I}
-  \thus \frac{x}{y^{\star}} = E [x, P]
-$$
-
-and even a single sample under $P^{(y^{\star})}$ would be sufficient to give its
-value. Obviously, it is not possible to take exactly this choice, since $E [x,
-P]$ is not given a priori.
-However, this gives an insight into what importance sampling does. In fact,
-given that:
-
+and even a single sample under $P^{(y)}$ would be sufficient to give its value:
+it can be shown that under this probability distribution the variance vanishes.
+Obviously, it is not possible to make exactly this choice, since $E [x, P]$ is
+not given a priori. However, this gives an insight into what importance sampling
+does.
In fact, given that:
$$
  E [x, P] = \int \limits_{a = - \infty}^{a = + \infty}
-  a P(x \in [a, a + da])
+  da \, a P(x \in [a, a + da])
$$

-the best probability change $P^{(y^{\star})}$ redistributes the law of $x$ so
-that its samples frequencies are sorted directly according to their weights in
-$E[x, P]$, namely:
-
+the best probability $P^{(y)}$ redistributes the law of $x$ so that its sample
+frequencies are distributed directly according to their weights in $E[x, P]$,
+namely:
$$
-  P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
+  P^{(y)}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
$$

In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
@@ -320,37 +326,82 @@
frequently. As a consequence, the estimator variance will be reduced.

### VEGAS

-The VEGAS algorithm is based on importance sampling. It aims to reduce the
-integration error by concentrating points in the regions that make the largest
-contribution to the integral.
+The VEGAS algorithm of Lepage is based on importance sampling. It aims to
+reduce the integration error by concentrating points in the regions that make
+the largest contribution to the integral.

As stated before, in practice it is impossible to sample points from the best
-distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
-GSL, the VEGAS algorithm approximates the distribution by histogramming the
-function $f$ in different subregions. Each histogram is used to define a
-sampling distribution for the next pass, which consists in doing the same thing
-recorsively: this procedure converges asymptotically to the desired
-distribution. It follows that a better estimation is achieved with a greater
-number of function calls.
-The integration uses a fixed number of function calls. The result and its
-error estimate are based on a weighted average of independent samples, as for
-MISER.
-For this particular sample, results are shown in @tbl:VEGAS.
+distribution $P^{(y)}$: only a good approximation can be achieved. In GSL, the
+VEGAS algorithm approximates the distribution by histogramming the function $f$
+in different subregions with an iterative method [@lepage78], namely:

--------------------------------------------------------------------------
-                  500'000 calls     5'000'000 calls    50'000'000 calls
------------------ ----------------- ------------------ ------------------
-$I^{\text{oss}}$  1.7182818354      1.7182818289       1.7182818285
+  - a fixed number of points (function calls) is evenly generated in the whole
+    region;
+  - the volume $V$ is divided into $N$ intervals of width $\Delta x_i =
+    \Delta x \, \forall \, i$, where $N$ is limited by the available computer
+    storage and must be held constant from iteration to iteration. By default,
+    $N = 50$;
+  - each interval is then divided into $m_i + 1$ subintervals, where:
+    $$
+      m_i = K \frac{\bar{f}_i \Delta x_i}{\sum_j \bar{f}_j \Delta x_j}
+    $$

-$\sigma$          0.0000000137      0.0000000004       0.0000000000
+    where $j$ runs over all the intervals and $\bar{f}_i$ is the average value
+    of $f$ in the interval. Hence, $m_i$ is a measure of the "importance" of
+    the interval with respect to the others: the higher $\bar{f}_i$, the
+    higher $m_i$. The stiffness $K$ defaults to 1.5;
+  - as it is desirable to restore the number of intervals to its original value
+    $N$, groups of the new intervals must be amalgamated into larger intervals,
+    the number of subintervals in each group being constant.
The net effect is
+    to alter the interval sizes, while keeping the total number constant, so
+    that the smallest intervals occur where $f$ is largest;
+  - the new grid is used and further refined in subsequent iterations until the
+    optimal grid has been obtained. By default, the number of iterations is set
+    to 5.

-diff              0.0000000069      0.0000000004       0.0000000000
-------------------------------------------------------------------------
+At the end, a cumulative estimate of the integral $I$ and its error $\sigma_I$
+is made as a weighted average:
+$$
+  I = \sigma_I^2 \sum_i \frac{I_i}{\sigma_i^2}
+  \with
+  \sigma_I^2 = \left( \sum_i \frac{1}{\sigma_i^2} \right)^{-1}
+$$

-Table: VEGAS results with different numbers of
+where $I_i$ and $\sigma_i$ are the integral and the standard deviation
+estimated in each iteration.
+For the results to be reliable, the chi-squared per degree of freedom
+$\chi_r^2$ must be consistent with 1: in the implementation, the integration
+is repeated until $|\chi_r^2 - 1| \le 0.5$.
+
+Clearly, once again a better estimate is achieved with a greater number of
+function calls. For this particular sample, the most accurate results are shown
+in @fig:MI_VE and some of them are listed in @tbl:VEGAS.
+
+----------------------------------------------------------------------------------------------
+calls              $I^{\text{oss}}$   $\sigma$           diff               $\chi_r^2$
+------------------ ------------------ ------------------ ------------------ ------------------
+\SI{5e5}{}         1.7182818281       0.0000000012       0.0000000004       1.457
+
+\SI{5e6}{}         1.7182818284       0.0000000000       0.0000000001       0.632
+
+\SI{5e7}{}         1.7182818285       0.0000000000       0.0000000000       0.884
+----------------------------------------------------------------------------------------------
+
+Table: Some VEGAS results with different numbers of
function calls. {#tbl:VEGAS}

-This time, the error estimation is notably close to diff for each number of
-function calls, meaning that the estimation of both the integral and its
-error turn out to be very accurate, much more than the ones obtained with
-both plain Monte Carlo method and stratified sampling.
+As can be appreciated in @fig:MI_VE, the VEGAS algorithm computes the integral
+more accurately than MISER. The $\chi_r^2$ turns out to be close enough to 1
+to guarantee a good estimate of $I$, which is also confirmed by the very small
+difference between the estimated and exact values shown in @tbl:VEGAS: with
+\SI{5e7}{} function calls, the difference is smaller than \SI{1e-10}{}.
+
+![Only the most accurate results are shown, in order to highlight the
+  differences between the VEGAS (gray) and MISER (black)
+  results.](images/MC_MI_VE.pdf){#fig:MI_VE}
+
+In conclusion, among the plain Monte Carlo technique, stratified sampling and
+importance sampling, the last one turned out to be the most powerful means of
+obtaining a good estimate of the integral.
diff --git a/notes/todo b/notes/todo
index 2742125..95a722e 100644
--- a/notes/todo
+++ b/notes/todo
@@ -1,5 +1,3 @@
 - cambiare simbolo convoluzione
 - aggiungere citazioni e referenze
-- rifare grafici senza bordino
 - leggere l'articolo di Lucy
-- riscrivere il readme del 5