The integral to be evaluated is the following:

$$
  I = \int\limits_0^1 dx \, e^x
$$

whose exact value is $e - 1 = 1.7182818285\ldots$

\begin{figure}
\hypertarget{fig:exp}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Integral
  \filldraw [cyclamen!15!white, domain=0:5, variable=\x]
    (0,0) -- plot({\x},{exp(\x/5)}) -- (5,0) -- cycle;
  \draw [cyclamen] (5,0) -- (5,2.7182818);
  \node [below] at (5,0) {1};
  % Axis
  \draw [thick, <-] (0,4) -- (0,0);
  \draw [thick, ->] (-2,0) -- (7,0);
  \node [below right] at (7,0) {$x$};
  \node [above left] at (0,4) {$e^{x}$};
  % Plot
  \draw [domain=-2:7, smooth, variable=\x,
         cyclamen, ultra thick] plot ({\x},{exp(\x/5)});
\end{tikzpicture}
\caption{The integrand $e^x$: the shaded area is the integral to be
         evaluated.}
}
\end{figure}

The three most popular Monte Carlo (MC) methods were applied: plain MC, MISER
and VEGAS. Besides their popularity, these three methods were chosen because
they are the only ones implemented in the GSL library.

## Plain Monte Carlo

When the integral $I$ of a function $f$ over an $n$-dimensional region
$\Omega$ of volume $V$ must be evaluated, that is:

$$
  I = \int\limits_{\Omega} dx \, f(x)
  \with V = \int\limits_{\Omega} dx
$$

the simplest MC approach is to sample $N$ points $x_i$ evenly distributed in
$\Omega$ and approximate $I$ as:

$$
  I \sim I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \langle f \rangle
$$

with $I_N \rightarrow I$ for $N \rightarrow + \infty$ by the law of large
numbers. The uncertainty can then be estimated through the sample variance
of $f$:

$$
  \sigma^2_f = \frac{1}{N - 1} \sum_{i = 1}^N \left( f(x_i) - \langle f
  \rangle \right)^2 \et \sigma^2_I = \frac{V^2}{N^2} \sum_{i = 1}^N
  \sigma^2_f = \frac{V^2}{N} \sigma^2_f
$$

Thus, the error decreases as $1/\sqrt{N}$.
Unlike in deterministic methods, the estimate of the error is not a strict
error bound: random sampling may not uncover all the important features of
the integrand and this can result in an underestimate of the error.

In this case, $f(x) = e^{x}$ and $\Omega = [0,1]$. Since the proximity of
$I_N$ to $I$ is related to $N$, the accuracy of the method is determined by
how many points are generated, namely by how many function calls are executed
when the method is implemented.
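For reference, a plain MC estimate of this integral can be obtained with the
GSL routines along the following lines. This is a minimal sketch rather than
the exact program used to produce the results below: the identifiers and the
number of calls are illustrative.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_plain.h>

/* the integrand f(x) = e^x, in the form required by GSL */
double f (double *x, size_t dim, void *params) {
  (void)dim; (void)params;   /* unused: the integral is one-dimensional */
  return exp(x[0]);
}

int main (void) {
  double lo[1] = {0}, up[1] = {1};    /* the region [0,1] */
  size_t calls = 500000;              /* function calls   */
  double result, error;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function expx = {&f, 1, NULL};   /* {f, dim, params} */

  gsl_monte_plain_state *s = gsl_monte_plain_alloc(1);
  gsl_monte_plain_integrate(&expx, lo, up, 1, calls, r, s,
                            &result, &error);
  gsl_monte_plain_free(s);

  printf("I     = %.10f\n", result);
  printf("sigma = %.10f\n", error);

  gsl_rng_free(r);
  return 0;
}
```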
In @tbl:MC, the results $I^{\text{obs}}$ and errors $\sigma$ obtained for
different numbers of calls are shown: the estimated integrals are compared to
the expected value $I$ and the difference 'diff' between them is given.
As can be seen, the MC method tends to underestimate the error when the number
of function calls is low. As previously stated, the higher the number of
function calls, the better the estimate of $I$. A further observation regards
the fact that, even with $50'000'000$ calls, $I^{\text{obs}}$ still differs
from $I$ at the fifth decimal digit.

-------------------------------------------------------------------------
                      500'000 calls    5'000'000 calls   50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{obs}}$       1.7166435813       1.7181231109       1.7183387184

$\sigma$               0.0006955691       0.0002200309       0.0000695809

diff                   0.0016382472       0.0001587176       0.0000568899
-------------------------------------------------------------------------

Table: MC results with different numbers of function calls. {#tbl:MC}

## Stratified sampling

In statistics, stratified sampling is a method of sampling from a population
partitioned into subpopulations. Stratification, indeed, is the process of
dividing the primary sample into subgroups (strata) before sampling randomly
within each stratum.
Given the mean $\bar{x}_i$ and variance ${\sigma^2_x}_i$ of an entity $x$
sampled with simple random sampling in each stratum, namely:

$$
  \bar{x}_i = \frac{1}{n_i} \sum_j x_j
$$

$$
  \sigma_i^2 = \frac{1}{n_i - 1} \sum_j \left( x_j - \bar{x}_i \right)^2
  \thus
  {\sigma^2_x}_i = \frac{1}{n_i^2} \sum_j \sigma_i^2 = \frac{\sigma_i^2}{n_i}
$$

where:

  - $j$ runs over the points $x_j$ sampled in the $i^{\text{th}}$ stratum
  - $n_i$ is the number of points sampled in it
  - $\sigma_i^2$ is the variance of the points in the $i^{\text{th}}$ stratum

then the mean $\bar{x}$ and variance $\sigma_x^2$ estimated with stratified
sampling for the whole population are:

$$
  \bar{x} = \frac{1}{N} \sum_i N_i \bar{x}_i \et
  \sigma_x^2 = \sum_i \left( \frac{N_i}{N} \right)^2 {\sigma^2_x}_i
  = \sum_i \left( \frac{N_i}{N} \right)^2 \frac{\sigma^2_i}{n_i}
$$

where $i$ runs over the strata, $N_i$ is the weight of the $i^{\text{th}}$
stratum and $N$ is the sum of all strata weights.

In practical terms, stratified sampling can produce a weighted mean with less
variability than the arithmetic mean of a simple random sample of the whole
population: if measurements within strata have a lower standard deviation, the
final result will have a smaller error than the one otherwise obtained with
simple sampling.
For this reason, stratified sampling is used as a method of variance reduction
when MC methods are used to estimate population statistics from a known
population.
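As a toy illustration of the formulas above (unrelated to the GSL routines
used for the results in this section), the integral at hand could be estimated
with two equal strata as in the following sketch, where the stratum edges and
the number of points per stratum are arbitrary choices:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* sample a uniform deviate in [a, b] */
double uniform (double a, double b) {
  return a + (b - a) * rand() / (double)RAND_MAX;
}

int main (void) {
  const double edges[3] = {0.0, 0.5, 1.0};  /* two strata of volume 1/2 */
  const int n = 250000;                     /* points per stratum       */
  double mean = 0, var = 0;

  for (int i = 0; i < 2; i++) {
    double sum = 0, sum2 = 0;
    for (int j = 0; j < n; j++) {
      double fx = exp(uniform(edges[i], edges[i + 1]));
      sum += fx;
      sum2 += fx * fx;
    }
    double mean_i = sum / n;                      /* stratum mean of f */
    double var_i = (sum2 / n - mean_i * mean_i)
                 * n / (n - 1.);                  /* stratum variance  */
    mean += 0.5 * mean_i;      /* I = sum of V_i <f>_i, V_i = 1/2          */
    var += 0.25 * var_i / n;   /* sigma^2 = sum of V_i^2 sigma_i^2 / n_i   */
  }

  printf("I = %.10f +- %.10f\n", mean, sqrt(var));
  return 0;
}
```

In the language of the formulas above, the strata weights $N_i$ coincide here
with the sub-volumes ($1/2$ each) and $n_i = n$ in both strata.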
**MISER**

The MISER technique aims to reduce the integration error through the use of
recursive stratified sampling.
Consider two disjoint regions $a$ and $b$, such that $a \cup b = \Omega$, with
volumes $V_a$ and $V_b$, in which $N_a$ and $N_b$ points are respectively
sampled. The Monte Carlo estimates of the integrals are $I_a = V_a \cdot
\langle f \rangle_a$ and $I_b = V_b \cdot \langle f \rangle_b$, where $\langle
f \rangle_a$ and $\langle f \rangle_b$ are the means of $f$ over the points
sampled in those regions, and $\sigma_a^2$ and $\sigma_b^2$ are the variances
of those points. Taking, for simplicity, two regions of equal volume, the
variance $\sigma_I^2$ of the combined estimate $I$:

$$
  I = \frac{1}{2} (I_a + I_b)
$$

is given by:

$$
  \sigma_I^2 = \frac{1}{4} \left( \frac{\sigma_a^2}{N_a}
             + \frac{\sigma_b^2}{N_b} \right)
$$

It can be shown that this variance is minimized by distributing the points
such that:

$$
  \frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}
$$

Hence, the smallest error estimate is obtained by allocating sample points in
proportion to the standard deviation of the function in each sub-region.

When implemented, MISER proceeds recursively: at each step, the possible
bisections of the region along the coordinate axes are tested and the one
which minimizes the combined variance of the two sub-regions is selected. The
same procedure is then repeated recursively for each of the two half-spaces of
the best bisection. At each recursion step, the integral and the error are
estimated using a plain Monte Carlo algorithm. After a given number of calls,
the final individual values and their error estimates are combined upwards to
give an overall result and an estimate of its error.
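In GSL the whole recursion is hidden behind a single routine: with respect to
the plain MC sketch shown earlier, only the state type and the integration
call change. Again, this is an illustrative sketch rather than the exact code
used to produce @tbl:MISER.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_miser.h>

/* the integrand f(x) = e^x, as in the plain MC sketch */
double f (double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void) {
  double lo[1] = {0}, up[1] = {1}, result, error;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function expx = {&f, 1, NULL};

  /* recursive stratified sampling with default parameters */
  gsl_monte_miser_state *s = gsl_monte_miser_alloc(1);
  gsl_monte_miser_integrate(&expx, lo, up, 1, 500000, r, s,
                            &result, &error);
  gsl_monte_miser_free(s);

  printf("I = %.10f +- %.10f\n", result, error);
  gsl_rng_free(r);
  return 0;
}
```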
Results for this particular sample are shown in @tbl:MISER.

-------------------------------------------------------------------------
                      500'000 calls    5'000'000 calls   50'000'000 calls
----------------- ----------------- ------------------ ------------------
$I^{\text{obs}}$       1.7182850738       1.7182819143       1.7182818221

$\sigma$               0.0000021829       0.0000001024       0.0000000049

diff                   0.0000032453       0.0000000858       0.0000000064
-------------------------------------------------------------------------

Table: MISER results with different numbers of function calls. {#tbl:MISER}

The error, although it always lies within the same order of magnitude as
diff, seems to seesaw around it as $N$ varies.


## VEGAS

The VEGAS algorithm is based on importance sampling. It samples points from
the probability distribution described by the function $f$, so that the
points are concentrated in the regions that make the largest contribution to
the integral.

In general, if the MC integral of $f$ is sampled with $N$ points distributed
according to a (normalized) probability distribution $g$, the integral is
estimated by the weighted average:

$$
  E (f|g \, , \, N) = \frac{1}{N} \sum_{i = 1}^N \frac{f(x_i)}{g(x_i)}
$$

with variance $\sigma^2 (f|g \, , \, N)$.
If the probability distribution is chosen as $g = |f|/I$, it can be shown
that the variance vanishes, and the error in the estimate will therefore be
zero.
In practice, it is impossible to sample points from the exact distribution:
only a good approximation can be achieved. In GSL, the VEGAS algorithm
approximates the distribution by histogramming the function $f$ in different
subregions. Each histogram is used to define a sampling distribution for the
next pass; applied recursively, this procedure converges asymptotically to
the desired distribution.

In order to avoid the number of histogram bins growing like $K^d$ (with $K$
bins per axis in $d$ dimensions), the probability distribution is
approximated by a separable function:

$$
  g (x_1, x_2, \ldots) = g_1(x_1) g_2(x_2) \ldots
$$

so that the number of bins required is only $Kd$. This is equivalent to
locating the peaks of the function from the projections of the integrand onto
the coordinate axes. The efficiency of VEGAS depends on the validity of this
assumption: it is most efficient when the peaks of the integrand are
well-localized. If an integrand can be rewritten in a form which is
approximately separable, this will increase the efficiency of integration
with VEGAS.

VEGAS incorporates a number of additional features, and combines both
stratified sampling and importance sampling. The integration region is
divided into a number of "boxes", with each box getting a fixed number of
points (the goal is 2). Each box can then have a fractional number of bins,
but if the ratio of bins-per-box is less than two, VEGAS switches to a kind
of variance reduction (rather than importance sampling).
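As before, the GSL interface reduces to a single routine. A sketch of the
warm-up-then-iterate idiom follows: the warm-up size and the $\chi^2$
stopping threshold reproduce the example in the GSL manual, while the total
number of calls is illustrative.

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte_vegas.h>

/* the integrand f(x) = e^x, as in the previous sketches */
double f (double *x, size_t dim, void *params) {
  (void)dim; (void)params;
  return exp(x[0]);
}

int main (void) {
  double lo[1] = {0}, up[1] = {1}, result, error;
  size_t calls = 500000;

  gsl_rng_env_setup();
  gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
  gsl_monte_function expx = {&f, 1, NULL};

  gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);
  /* warm-up run: lets VEGAS shape its sampling grid on f */
  gsl_monte_vegas_integrate(&expx, lo, up, 1, 10000, r, s, &result, &error);
  /* keep integrating until the partial results are consistent,
     i.e. the chi-squared per degree of freedom is close to 1 */
  do {
    gsl_monte_vegas_integrate(&expx, lo, up, 1, calls / 5, r, s,
                              &result, &error);
  } while (fabs(gsl_monte_vegas_chisq(s) - 1.0) > 0.5);
  gsl_monte_vegas_free(s);

  printf("I = %.10f +- %.10f\n", result, error);
  gsl_rng_free(r);
  return 0;
}
```

---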