diff --git a/notes/sections/5.md b/notes/sections/5.md
index dcbe8c4..7bfc1f2 100644
--- a/notes/sections/5.md
+++ b/notes/sections/5.md
@@ -219,7 +219,8 @@ Table: MISER results with different numbers of function calls. Be careful:
   is divided into subsections. {#tbl:MISER}
 
-This time the error, altough it lies always in the same order of magnitude of
-diff, seems to seesaw around the correct value.
+This time the error, although it always lies in the same order of magnitude
+as diff, seems to seesaw around the correct value, and the estimate is much
+closer to the expected one.
 
 ## Importance sampling
 
@@ -311,81 +312,45 @@ $$
   P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]}
   a P (x \in [a, a + da])
 $$
-
----
+In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
+others, these "important" values must be emphasized by sampling them more
+frequently. As a consequence, the variance of the estimator is reduced.
 
-### VEGAS \textcolor{red}{WIP}
-
-The VEGAS algorithm is based on importance sampling. It samples points from the
-probability distribution described by the function $f$, so that the points are
-concentrated in the regions that make the largest contribution to the integral.
-
-In general, if the MC integral of $f$ is sampled with points distributed
-according to a probability distribution $g$, the following estimate of the integral
-is obtained:
-
-$$
-  E (f|g \, , \, N) \with \sigma^2(f|g \, , \, N)
-$$
-
-If the probability distribution is chosen as $g = f$, it can be shown that the
-variance vanishes, and the error in the estimate will therefore be zero.
-In practice, it is impossible to sample points from the exact distribution: only
-a good approximation can be achieved. In GSL, the VEGAS algorithm approximates
-the distribution by histogramming the function $f$ in different subregions. Each
-histogram is used to define a sampling distribution for the next pass, which
-consists in doing the same thing recorsively: this procedure converges
-asymptotically to the desired distribution.
-
-In order to avoid the number of histogram bins growing like $K^d$, the
-probability distribution is approximated by a separable function:
-
-$$
-  f (x_1, x_2, \ldots) = f_1(x_1) f_2(x_2) \ldots
-$$
-
-so that the number of bins required is only $Kd$. This is equivalent to locating
-the peaks of the function from the projections of the integrand onto the
-coordinate axes. The efficiency of VEGAS depends on the validity of this
-assumption. It is most efficient when the peaks of the integrand are
-well-localized. If an integrand can be rewritten in a form which is
-approximately separable this will increase the efficiency of integration with
-VEGAS.
-
-VEGAS incorporates a number of additional features, and combines both stratified
-sampling and importance sampling. The integration region is divided into a number
-of “boxes”, with each box getting a fixed number of points (the goal is 2). Each
-box can then have a fractional number of bins, but if the ratio of bins-per-box is
-less than two, Vegas switches to a kind variance reduction (rather than importance
-sampling).
+### VEGAS
 
+The VEGAS algorithm is based on importance sampling. It aims to reduce the
+integration error by concentrating points in the regions that make the largest
+contribution to the integral.
 
+As stated before, in practice it is impossible to sample points from the best
+distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
+GSL, the VEGAS algorithm approximates the distribution by histogramming the
+function $f$ in different subregions. Each histogram is used to define a
+sampling distribution for the next pass, which consists in doing the same thing
+recursively: this procedure converges asymptotically to the desired
+distribution. It follows that a better estimate is achieved with a greater
+number of function calls.
 
+The integration uses a fixed number of function calls. The result and its
+error estimate are based on a weighted average of independent samples, as for
+MISER.
 
+For this particular sample, results are shown in @tbl:VEGAS.
 
----
+-------------------------------------------------------------------------
+                      500'000 calls    5'000'000 calls   50'000'000 calls
+----------------- ----------------- ------------------ ------------------
+$I^{\text{oss}}$       1.7182818354       1.7182818289       1.7182818285
 
-----------------------------------------------------------
-       calls       plain MC          Miser          Vegas
------------- -------------- -------------- --------------
-     500'000   1.7166435813   1.7182850738   1.7182818354
+$\sigma$               0.0000000137       0.0000000004       0.0000000000
 
-   5'000'000   1.7181231109   1.7182819143   1.7182818289
+diff                   0.0000000069       0.0000000004       0.0000000000
+-------------------------------------------------------------------------
 
-  50'000'000   1.7183387184   1.7182818221   1.7182818285
-----------------------------------------------------------
-
-Table: Results of the three methods. {#tbl:results}
-
-----------------------------------------------------------
-       calls       plain MC          Miser          Vegas
------------- -------------- -------------- --------------
-     500'000   0.0006955691   0.0000021829   0.0000000137
-
-   5'000'000   0.0002200309   0.0000001024   0.0000000004
-
-  50'000'000   0.0000695809   0.0000000049   0.0000000000
-----------------------------------------------------------
-
-Table: $\sigma$s of the three methods. {#tbl:sigmas}
+Table: VEGAS results with different numbers of
+  function calls. {#tbl:VEGAS}
 
+This time the error estimate is very close to diff for every number of
+function calls, meaning that both the integral and its error are estimated
+very accurately, far more so than with either the plain Monte Carlo method
+or stratified sampling.