
# Exercise 5

Numerically compute the value of an integral via Monte Carlo approaches.

The integral to be evaluated is the following:


$$
  I = \int\limits_0^1 dx \, e^x
$$

whose exact value is $e - 1 = 1.7182818285\ldots$
The three most popular MC methods were applied: plain Monte Carlo, MISER and VEGAS (besides their popularity, these three methods were chosen because they are the only ones implemented in the GSL library).

## Plain Monte Carlo

When the integral $I$ of a function $f$ over a volume $V$ in an $n$-dimensional space must be evaluated, the simplest MC approach is to sample $N$ points $x_i$ uniformly distributed in $V$ and approximate $I$ as:


$$
  I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$

with $I_N \rightarrow I$ for $N \rightarrow +\infty$ by the law of large numbers. Hence, the variance can be simply estimated as:


$$
  \sigma^2 =
  \frac{1}{N-1} \sum_{i = 1}^N \left( f(x_i) - \langle f \rangle \right)^2
$$

thus, the error on $I_N$ scales as $V \sigma / \sqrt{N}$ and decreases as $1/\sqrt{N}$. Unlike in deterministic methods, this estimate is not a strict error bound: random sampling may miss important features of the integrand, which can result in an underestimate of the error.
Since the proximity of $I_N$ to $I$ depends on $N$, the accuracy of the method is determined by the number of function calls, as shown in @tbl:MC.


|          | 500'000 calls | 5'000'000 calls | 50'000'000 calls |
|:--------:|:-------------:|:---------------:|:----------------:|
| result   | 1.7166435813  | 1.7181231109    | 1.7183387184     |
| $\sigma$ | 0.0006955691  | 0.0002200309    | 0.0000695809     |

Table: MC results and errors with different numbers of function calls. {#tbl:MC}
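
As a reference for how such an estimate is produced, the following is a minimal sketch of a plain MC integration with GSL (the integrand and the 500'000-call setup match the first column of @tbl:MC; names such as `expo` are illustrative, and the exact code used for the exercise may differ):

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_plain.h>

/* Integrand f(x) = e^x, in the n-dimensional form required by GSL. */
double f (double *x, size_t dim, void *params)
{
  (void)dim; (void)params;   /* unused */
  return exp(x[0]);
}

int main (void)
{
  double lo[1] = {0}, up[1] = {1};           /* integration bounds */
  double res, err;
  gsl_monte_function expo = {&f, 1, NULL};   /* dim = 1            */

  gsl_rng_env_setup();                       /* set default RNG    */
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);

  /* Sample 500'000 uniform points and average the integrand. */
  gsl_monte_plain_integrate(&expo, lo, up, 1, 500000, r, s, &res, &err);
  printf("result = %.10f\tsigma = %.10f\n", res, err);

  gsl_monte_plain_free(s);
  gsl_rng_free(r);
  return 0;
}
```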

## MISER

The MISER algorithm is based on recursive stratified sampling. On each recursion step the integral and the error are estimated using a plain Monte Carlo algorithm. If the error estimate is larger than the required accuracy, the integration volume is divided into sub-volumes and the procedure is recursively applied to sub-volumes.
This technique aims to reduce the overall integration error by concentrating integration points in the regions of highest variance.
The idea of stratified sampling begins with the observation that, for two disjoint regions $a$ and $b$ with Monte Carlo estimates of the integral $E_a(f)$ and $E_b(f)$ and variances $\sigma_a^2(f)$ and $\sigma_b^2(f)$, the variance $V(f)$ of the combined estimate $E(f)$:


$$
  E (f) = \frac{1}{2} \left( E_a (f) + E_b (f) \right)
$$

is given by,


$$
  V(f) = \frac{\sigma_a^2 (f)}{4 N_a} + \frac{\sigma_b^2 (f)}{4 N_b}
$$

It can be shown that this variance is minimized by distributing the points such that,


$$
  \frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
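
This can be verified directly: holding the total number of points $N = N_a + N_b$ fixed and requiring $\partial V / \partial N_a = 0$ yields

$$
  -\frac{\sigma_a^2}{4 N_a^2} + \frac{\sigma_b^2}{4 N_b^2} = 0
  \quad \Longrightarrow \quad
  \frac{\sigma_a}{N_a} = \frac{\sigma_b}{N_b}
$$

which rearranges to the allocation rule above.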

Hence the smallest error estimate is obtained by allocating sample points in proportion to the standard deviation of the function in each sub-region.
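
Since MISER shares the plain MC calling convention in GSL, switching method amounts to swapping the allocator and the integration routine. A minimal sketch, under the same assumptions as the plain MC example above:

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_miser.h>

/* Same integrand f(x) = e^x as before. */
double f (double *x, size_t dim, void *params)
{
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void)
{
  double lo[1] = {0}, up[1] = {1};
  double res, err;
  gsl_monte_function expo = {&f, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

  /* Recursive stratified sampling: same interface, different state. */
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);
  gsl_monte_miser_integrate(&expo, lo, up, 1, 500000, r, s, &res, &err);
  printf("result = %.10f\tsigma = %.10f\n", res, err);

  gsl_monte_miser_free(s);
  gsl_rng_free(r);
  return 0;
}
```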



| calls      |   plain MC   |    MISER     |    VEGAS     |
|:-----------|:------------:|:------------:|:------------:|
| 500'000    | 1.7166435813 | 1.7182850738 | 1.7182818354 |
| 5'000'000  | 1.7181231109 | 1.7182819143 | 1.7182818289 |
| 50'000'000 | 1.7183387184 | 1.7182818221 | 1.7182818285 |

Table: Results of the three methods. {#tbl:results}


| calls      |   plain MC   |    MISER     |    VEGAS     |
|:-----------|:------------:|:------------:|:------------:|
| 500'000    | 0.0006955691 | 0.0000021829 | 0.0000000137 |
| 5'000'000  | 0.0002200309 | 0.0000001024 | 0.0000000004 |
| 50'000'000 | 0.0000695809 | 0.0000000049 | 0.0000000000 |

Table: $\sigma$s of the three methods. {#tbl:sigmas}
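
For completeness, below is a minimal sketch of the corresponding VEGAS call. VEGAS is based on importance sampling over an adaptively refined grid, so a short warm-up run is commonly performed before the full-statistics one; the warm-up call count here is an illustrative choice, not taken from the exercise:

```c
#include <math.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_vegas.h>

/* Same integrand f(x) = e^x as before. */
double f (double *x, size_t dim, void *params)
{
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void)
{
  double lo[1] = {0}, up[1] = {1};
  double res, err;
  gsl_monte_function expo = {&f, 1, NULL};

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

  /* Warm-up run to adapt the grid, then the full run. */
  gsl_monte_vegas_integrate(&expo, lo, up, 1, 10000, r, s, &res, &err);
  gsl_monte_vegas_integrate(&expo, lo, up, 1, 500000, r, s, &res, &err);
  printf("result = %.10f\tsigma = %.10f\tchisq/dof = %.2f\n",
         res, err, gsl_monte_vegas_chisq(s));

  gsl_monte_vegas_free(s);
  gsl_rng_free(r);
  return 0;
}
```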