The integral to be evaluated is the following:

$$
  I = \int\limits_0^1 dx \, e^x
$$

whose exact value is $e - 1 = 1.7182818285\ldots$

\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Integral
  \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
    (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
  \draw [cyclamen] (5,0) -- (5,2.7182818);
  \node [below] at (5,0) {1};
  % Axis
  \draw [thick, <-] (0,4) -- (0,0);
  \draw [thick, ->] (-2,0) -- (7,0);
  \node [below right] at (7,0) {$x$};
  \node [above left] at (0,4) {$e^{x}$};
  % Plot
  \draw [domain=-2:7, smooth, variable=\x,
         cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{The integrand $e^x$: the shaded area is the integral to be
         evaluated.}
}
\end{figure}

The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides their popularity, these three methods were chosen because
they are the only ones implemented in the GSL library.

## Plain Monte Carlo

When the integral $I$ of a function $f$ over an $n$-dimensional region
$\Omega$ of volume $V$ must be evaluated, that is:

$$
  I = \int\limits_{\Omega} dx \, f(x)
  \with V = \int\limits_{\Omega} dx
$$

the simplest MC approach is to sample $N$ points $x_i$ evenly distributed in
$\Omega$ and approximate $I$ as:

$$
  I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$

with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. The uncertainty can then be estimated through the sample variance
of $f$:

$$
  \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
  \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
  \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$

Thus, the error decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, the estimate of the error is not a strict
error bound: random sampling may not uncover all the important features of
the integrand and this can result in an underestimate of the error.

In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$. Since the proximity of
$I_N$ to $I$ is related to $N$, the accuracy of the method is determined by
how many points are generated, namely by how many function calls are executed
when the method is implemented.
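For reference, a plain MC estimate of this integral can be obtained with the
GSL routines along the following lines. This is a minimal sketch rather than
the exact program used to produce the results below: the identifiers and the
number of calls are illustrative.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_plain.h>

/* the integrand f(x) = e^x, in the form required by GSL */
double f (double *x, size_t dim, void *params) {
  (void)dim; (void)params;   /* unused: the integral is one-dimensional */
  return exp(x[0]);
}

int main (void) {
  double lo[1] = {0}, up[1] = {1};    /* the region [0,1] */
  size_t calls = 500000;              /* function calls   */
  double result, error;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function expx = {&f, 1, NULL};   /* {f, dim, params} */

  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);
  gsl_monte_plain_integrate(&expx, lo, up, 1, calls, r, s,
                            &result, &error);
  gsl_monte_plain_free(s);

  printf("I     = %.10f\n", result);
  printf("sigma = %.10f\n", error);

  gsl_rng_free(r);
  return 0;
}
```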
In @tbl:MC, the results $I^{\text{obs}}$ and errors $\sigma$ obtained for
different numbers of calls are shown: the estimated integrals are compared to
the expected value $I$ and the difference 'diff' between them is given.
As can be seen, the MC method tends to underestimate the error when the number
of function calls is low. As previously stated, the higher the number of
function calls, the better the estimate of $I$. A further observation regards
the fact that, even with $50'000'000$ calls, $I^{\text{obs}}$ still differs
from $I$ at the fifth decimal digit.

-------------------------------------------------------------------------
                      500'000 calls    5'000'000 calls   50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{obs}}$       1.7166435813       1.7181231109       1.7183387184

$\sigma$               0.0006955691       0.0002200309       0.0000695809

diff                   0.0016382472       0.0001587176       0.0000568899
-------------------------------------------------------------------------

Table: MC results with different numbers of function calls. {#tbl:MC}

## Stratified sampling

In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling randomly
within each stratum.
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
sampled with simple random sampling in each stratum, namely:

$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
  {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$

where:

  - $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
  - $n_i$ is the number of points sampled in it
  - $\sigma_i^2$ is the variance of the points in the $i^{\text{th}}$ stratum

then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:

$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma^2_x}_i
  = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$

where $i$ runs over the strata, $N_i$ is the weight of the $i^{\text{th}}$
stratum and $N$ is the sum of all strata weights.

In practical terms, stratified sampling can produce a weighted mean with less
variability than the arithmetic mean of a simple random sample of the whole
population: if measurements within strata have a lower standard deviation, the
final result will have a smaller error than the one otherwise obtained with
simple sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population.
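As a toy illustration of the formulas above (unrelated to the GSL routines
used for the results in this section), the integral at hand could be estimated
with two equal strata as in the following sketch, where the stratum edges and
the number of points per stratum are arbitrary choices:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* sample a uniform deviate in [a, b] */
double uniform (double a, double b) {
  return a + (b - a) * rand() / (double)RAND_MAX;
}

int main (void) {
  const double edges[3] = {0.0, 0.5, 1.0};  /* two strata of volume 1/2 */
  const int n = 250000;                     /* points per stratum       */
  double mean = 0, var = 0;

  for (int i = 0; i < 2; i++) {
    double sum = 0, sum2 = 0;
    for (int j = 0; j < n; j++) {
      double fx = exp(uniform(edges[i], edges[i + 1]));
      sum += fx;
      sum2 += fx * fx;
    }
    double mean_i = sum / n;                      /* stratum mean of f */
    double var_i = (sum2 / n - mean_i * mean_i)
                 * n / (n - 1.);                  /* stratum variance  */
    mean += 0.5 * mean_i;      /* I = sum of V_i <f>_i, V_i = 1/2          */
    var += 0.25 * var_i / n;   /* sigma^2 = sum of V_i^2 sigma_i^2 / n_i   */
  }

  printf("I = %.10f +- %.10f\n", mean, sqrt(var));
  return 0;
}
```

In the language of the formulas above, the strata weights $N_i$ coincide here
with the sub-volumes ($1/2$ each) and $n_i = n$ in both strata.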
**MISER**

The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, with
volumes $V_a$ and $V_b$, in which $N_a$ and $N_b$ points are respectively
sampled. The Monte Carlo estimates of the integrals are $I_a = V_a \cdot
\langle f \rangle_a$ and $I_b = V_b \cdot \langle f \rangle_b$, where $\langle
f \rangle_a$ and $\langle f \rangle_b$ are the means of $f$ over the points
sampled in those regions, and $\sigma_a^2$ and $\sigma_b^2$ are the variances
of those points. Taking, for simplicity, two regions of equal volume, the
variance $\sigma_I^2$ of the combined estimate $I$:

$$
  I = \frac{1}{2} (I_a + I_b)
$$

is given by:

$$
  \sigma_I^2 = \frac{1}{4} \left( \frac{\sigma_a^2}{N_a}
             + \frac{\sigma_b^2}{N_b} \right)
$$

It can be shown that this variance is minimized by distributing the points
such that:

$$
  \frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$

Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.

When implemented, MISER proceeds recursively: at each step, the possible
bisections of the region along the coordinate axes are tested and the one
which minimizes the combined variance of the two sub-regions is selected. The
same procedure is then repeated recursively for each of the two half-spaces of
the best bisection. At each recursion step, the integral and the error are
estimated using a plain Monte Carlo algorithm. After a given number of calls,
the final individual values and their error estimates are combined upwards to
give an overall result and an estimate of its error.
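In GSL the whole recursion is hidden behind a single routine: with respect to
the plain MC sketch shown earlier, only the state type and the integration
call change. Again, this is an illustrative sketch rather than the exact code
used to produce @tbl:MISER.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_miser.h>

/* the integrand f(x) = e^x, as in the plain MC sketch */
double f (double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void) {
  double lo[1] = {0}, up[1] = {1}, result, error;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function expx = {&f, 1, NULL};

  /* recursive stratified sampling with default parameters */
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);
  gsl_monte_miser_integrate(&expx, lo, up, 1, 500000, r, s,
                            &result, &error);
  gsl_monte_miser_free(s);

  printf("I = %.10f +- %.10f\n", result, error);
  gsl_rng_free(r);
  return 0;
}
```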
Results for this particular sample are shown in @tbl:MISER.

-------------------------------------------------------------------------
                      500'000 calls    5'000'000 calls   50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{obs}}$       1.7182850738       1.7182819143       1.7182818221

$\sigma$               0.0000021829       0.0000001024       0.0000000049

diff                   0.0000032453       0.0000000858       0.0000000064
-------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. {#tbl:MISER}

The error, although it always lies within the same order of magnitude as
diff, seems to seesaw around it as $N$ varies.


## VEGAS

The VEGAS algorithm is based on importance sampling. It samples points from
the probability distribution described by the function $f$, so that the
points are concentrated in the regions that make the largest contribution to
the integral.

In general, if the MC integral of $f$ is sampled with $N$ points distributed
according to a (normalized) probability distribution $g$, the integral is
estimated by the weighted average:

$$
  E (f|g \, , \, N) = \frac{1}{N} \sum_{i = 1}^N \frac{f(x_i)}{g(x_i)}
$$

with variance $\sigma^2 (f|g \, , \, N)$.
If the probability distribution is chosen as $g = |f|/I$, it can be shown
that the variance vanishes, and the error in the estimate will therefore be
zero.
In practice, it is impossible to sample points from the exact distribution:
only a good approximation can be achieved. In GSL, the VEGAS algorithm
approximates the distribution by histogramming the function $f$ in different
subregions. Each histogram is used to define a sampling distribution for the
next pass; applied recursively, this procedure converges asymptotically to
the desired distribution.

In order to avoid the number of histogram bins growing like $K^d$ (with $K$
bins per axis in $d$ dimensions), the probability distribution is
approximated by a separable function:

$$
  g (x_1, x_2, \ldots) = g_1(x_1) g_2(x_2) \ldots
$$

so that the number of bins required is only $Kd$. This is equivalent to
locating the peaks of the function from the projections of the integrand onto
the coordinate axes. The efficiency of VEGAS depends on the validity of this
assumption: it is most efficient when the peaks of the integrand are
well-localized. If an integrand can be rewritten in a form which is
approximately separable, this will increase the efficiency of integration
with VEGAS.

VEGAS incorporates a number of additional features, and combines both
stratified sampling and importance sampling. The integration region is
divided into a number of "boxes", with each box getting a fixed number of
points (the goal is 2). Each box can then have a fractional number of bins,
but if the ratio of bins-per-box is less than two, VEGAS switches to a kind
of variance reduction (rather than importance sampling).
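As before, the GSL interface reduces to a single routine. A sketch of the
warm-up-then-iterate idiom follows: the warm-up size and the $\chi^2$
stopping threshold reproduce the example in the GSL manual, while the total
number of calls is illustrative.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_vegas.h>

/* the integrand f(x) = e^x, as in the previous sketches */
double f (double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void) {
  double lo[1] = {0}, up[1] = {1}, result, error;
  size_t calls = 500000;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function expx = {&f, 1, NULL};

  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);
  /* warm-up run: lets VEGAS shape its sampling grid on f */
  gsl_monte_vegas_integrate(&expx, lo, up, 1, 10000, r, s, &result, &error);
  /* keep integrating until the partial results are consistent,
     i.e. the chi-squared per degree of freedom is close to 1 */
  do {
    gsl_monte_vegas_integrate(&expx, lo, up, 1, calls / 5, r, s,
                              &result, &error);
  } while (fabs(gsl_monte_vegas_chisq(s) - 1.0) > 0.5);
  gsl_monte_vegas_free(s);

  printf("I = %.10f +- %.10f\n", result, error);
  gsl_rng_free(r);
  return 0;
}
```

---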