# Exercise 5
**Numerically compute an integral value via Monte Carlo approaches**
The integral to be evaluated is the following:
$$
I = \int\limits_0^1 dx \, e^x
$$
whose exact value is $e - 1 = 1.7182818285\ldots$
The three most popular MC methods were applied: plain Monte Carlo, Miser and
Vegas (besides their popularity, these three methods were chosen because they
are the only ones implemented in the GSL library).
## Plain Monte Carlo
When the integral $I$ of a function $f$ over a volume $V$ in an
$n$-dimensional space must be evaluated, the simplest MC approach is to sample
$N$ points $x_i$ evenly distributed in $V$ and approximate $I$ as:
$$
I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$
with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. Hence, the variance can be simply estimated as:
$$
\sigma^2 =
\frac{1}{N-1} \sum_{i = 1}^N \left( f(x_i) - \langle f \rangle \right)^2
$$
thus, the error on $I_N$ decreases as $1/\sqrt{N}$. Unlike in deterministic
methods, this estimate is not a strict error bound: random sampling may not
uncover all the important features of the integrand, which can result in an
underestimate of the error.
Since the proximity of $I_N$ to $I$ depends on $N$, the accuracy of the method
is determined by the number of function calls (as shown in @tbl:MC).
-----------------------------------------------------------------
500'000 calls 5'000'000 calls 50'000'000 calls
--------- ----------------- ------------------ ------------------
result 1.7166435813 1.7181231109 1.7183387184
$\sigma$ 0.0006955691 0.0002200309 0.0000695809
-----------------------------------------------------------------
Table: MC results and errors with different numbers of function
calls. {#tbl:MC}
## Miser
The MISER algorithm is based on recursive stratified sampling.
On each recursion step the integral and the error are estimated using a plain
Monte Carlo algorithm. If the error estimate is larger than the required
accuracy, the integration volume is divided into sub-volumes and the procedure
is recursively applied to sub-volumes.
This technique aims to reduce the overall integration error by concentrating
integration points in the regions of highest variance.
The idea of stratified sampling begins with the observation that for two
disjoint regions $a$ and $b$ with Monte Carlo estimates of the integral
$E_a (f)$ and $E_b (f)$ and variances $\sigma_a^2 (f)$ and $\sigma_b^2 (f)$,
the variance $V (f)$ of the combined estimate $E (f)$:
$$
E (f)= \frac {1}{2} \left( E_a (f) + E_b (f) \right)
$$
is given by,
$$
V(f) = \frac{\sigma_a^2 (f)}{4 N_a} + \frac{\sigma_b^2 (f)}{4 N_b}
$$
It can be shown that this variance is minimized by distributing the points such
that:
$$
\frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$
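This allocation follows from minimizing $V(f)$ at a fixed total number of
points $N = N_a + N_b$, substituting $N_b = N - N_a$ and setting the derivative
to zero:
$$
\frac{\partial}{\partial N_a}
\left( \frac{\sigma_a^2}{4 N_a} + \frac{\sigma_b^2}{4 (N - N_a)} \right)
= - \frac{\sigma_a^2}{4 N_a^2} + \frac{\sigma_b^2}{4 N_b^2} = 0
\quad \Longrightarrow \quad
\frac{\sigma_a}{N_a} = \frac{\sigma_b}{N_b}
$$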
Hence the smallest error estimate is obtained by allocating sample points in proportion
to the standard deviation of the function in each sub-region.
---------------------------------------------------------
calls plain MC Miser Vegas
------------ -------------- -------------- --------------
500'000 1.7166435813 1.7182850738 1.7182818354
5'000'000 1.7181231109 1.7182819143 1.7182818289
50'000'000 1.7183387184 1.7182818221 1.7182818285
---------------------------------------------------------
Table: Results of the three methods. {#tbl:results}

---------------------------------------------------------
calls plain MC Miser Vegas
------------ -------------- -------------- --------------
500'000 0.0006955691 0.0000021829 0.0000000137
5'000'000 0.0002200309 0.0000001024 0.0000000004
50'000'000 0.0000695809 0.0000000049 0.0000000000
---------------------------------------------------------
Table: $\sigma$s of the three methods. {#tbl:sigmas}