ex-5: complete writing

This commit is contained in:
Giù Marcer 2020-03-15 22:48:56 +01:00 committed by rnhmjoj
parent 01a5f06cc5
commit 2e2a4eacf9

View File

@ -219,7 +219,8 @@ Table: MISER results with different numbers of function calls. Be careful:
is divided into subsections. {#tbl:MISER}
This time the error, altough it lies always in the same order of magnitude of
diff, seems to seesaw around the correct value.
diff, seems to seesaw around the correct value, which is much more closer to
the expected one.
## Importance sampling
@ -311,81 +312,45 @@ $$
P^{(y^{\star})}(x \in [a, a + da]) = \frac{1}{E [x, P]} a P (x \in [a, a + da])
$$
---
In conclusion, since certain values of $x$ have more impact on $E [x, P]$ than
others, these "important" values must be emphasized by sampling them more
frequently. As a consequence, the estimator variance will be reduced.
### VEGAS \textcolor{red}{WIP}
The VEGAS algorithm is based on importance sampling. It samples points from the
probability distribution described by the function $f$, so that the points are
concentrated in the regions that make the largest contribution to the integral.
In general, if the MC integral of $f$ is sampled with points distributed
according to a probability distribution $g$, the following estimate of the integral
is obtained:
$$
E (f|g \, , \, N) \with \sigma^2(f|g \, , \, N)
$$
If the probability distribution is chosen as $g = f$, it can be shown that the
variance vanishes, and the error in the estimate will therefore be zero.
In practice, it is impossible to sample points from the exact distribution: only
a good approximation can be achieved. In GSL, the VEGAS algorithm approximates
the distribution by histogramming the function $f$ in different subregions. Each
histogram is used to define a sampling distribution for the next pass, which
consists in doing the same thing recorsively: this procedure converges
asymptotically to the desired distribution.
In order to avoid the number of histogram bins growing like $K^d$, the
probability distribution is approximated by a separable function:
$$
f (x_1, x_2, \ldots) = f_1(x_1) f_2(x_2) \ldots
$$
so that the number of bins required is only $Kd$. This is equivalent to locating
the peaks of the function from the projections of the integrand onto the
coordinate axes. The efficiency of VEGAS depends on the validity of this
assumption. It is most efficient when the peaks of the integrand are
well-localized. If an integrand can be rewritten in a form which is
approximately separable this will increase the efficiency of integration with
VEGAS.
VEGAS incorporates a number of additional features, and combines both stratified
sampling and importance sampling. The integration region is divided into a number
of “boxes”, with each box getting a fixed number of points (the goal is 2). Each
box can then have a fractional number of bins, but if the ratio of bins-per-box is
less than two, Vegas switches to a kind variance reduction (rather than importance
sampling).
### VEGAS
The VEGAS algorithm is based on importance sampling. It aims to reduce the
integration error by concentrating points in the regions that make the largest
contribution to the integral.
As stated before, in practice it is impossible to sample points from the best
distribution $P^{(y^{\star})}$: only a good approximation can be achieved. In
GSL, the VEGAS algorithm approximates the distribution by histogramming the
function $f$ in different subregions. Each histogram is used to define a
sampling distribution for the next pass, which consists in doing the same thing
recorsively: this procedure converges asymptotically to the desired
distribution. It follows that a better estimation is achieved with a greater
number of function calls.
The integration uses a fixed number of function calls. The result and its
error estimate are based on a weighted average of independent samples, as for
MISER.
For this particular sample, results are shown in @tbl:VEGAS.
---
-------------------------------------------------------------------------
500'000 calls 5'000'000 calls 50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{oss}}$ 1.7182818354 1.7182818289 1.7182818285
---------------------------------------------------------
calls plain MC Miser Vegas
------------ -------------- -------------- --------------
500'000 1.7166435813 1.7182850738 1.7182818354
$\sigma$ 0.0000000137 0.0000000004 0.0000000000
5'000'000 1.7181231109 1.7182819143 1.7182818289
diff 0.0000000069 0.0000000004 0.0000000000
-------------------------------------------------------------------------
50'000'000 1.7183387184 1.7182818221 1.7182818285
---------------------------------------------------------
Table: Results of the three methods. {#tbl:results}
---------------------------------------------------------
calls plain MC Miser Vegas
------------ -------------- -------------- --------------
500'000 0.0006955691 0.0000021829 0.0000000137
5'000'000 0.0002200309 0.0000001024 0.0000000004
50'000'000 0.0000695809 0.0000000049 0.0000000000
---------------------------------------------------------
Table: $\sigma$s of the three methods. {#tbl:sigmas}
Table: VEGAS results with different numbers of
function calls. {#tbl:VEGAS}
This time, the error estimation is notably close to diff for each number of
function calls, meaning that the estimation of both the integral and its
error turn out to be very accurate, much more than the ones obtained with
both plain Monte Carlo method and stratified sampling.