ex-6: section results completed
Parent 5634f2f418 · commit cc8a5f753a
@@ -304,7 +304,7 @@ $$

![Example of a Moyal distribution density obtained by the KDE method. The rug
plot shows the original sample used in the reconstruction.
](images/1-landau-kde.pdf)

On the other hand, obtaining a good estimate of the FWHM from a sample is much
more difficult. In principle, it could be measured by binning the data and
@@ -181,7 +181,7 @@ where 'floor' is the function which gives the largest integer smaller than its
argument and the bins are counted starting from zero.
Then, a vector was iteratively updated such that its $j^{\text{th}}$ entry
contains both the sum $S_j$ of all the $|P_v|$s whose $P_h$ fell into the
$j^{\text{th}}$ bin and the number num$_j$ of the bin entries. At the end, the
average value of $|P_v|_j$ was computed as $S_j / \text{num}_j$.
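
For illustration, a minimal sketch of this running update under the stated
conventions (array and function names are illustrative, not the report's
actual code):

```c
/* Running update of the per-bin accumulators: for each sampled couple
 * (P_h, |P_v|), the bin index j is obtained with floor() and the sum
 * S_j and the counter num_j are updated; the average is S_j / num_j. */
#include <math.h>
#include <stddef.h>

#define NBINS 50                     /* illustrative bin count */

double S[NBINS]   = { 0 };           /* running sums S_j       */
size_t num[NBINS] = { 0 };           /* entry counters num_j   */

void update(double p_h, double p_v, double min, double max)
{
    size_t j = (size_t)floor((p_h - min) / (max - min) * NBINS);
    if (j >= NBINS) j = NBINS - 1;   /* guard the upper edge   */
    S[j]   += fabs(p_v);
    num[j] += 1;
}

double average(size_t j)             /* <|P_v|>_j = S_j / num_j */
{
    return num[j] > 0 ? S[j] / num[j] : 0;
}
```
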
For the sake of clarity, for each sampled couple the procedure is the
following. At first $S_j = 0 \wedge \text{num}_j = 0 \, \forall \, j$, then:
@@ -68,7 +68,7 @@ In this case $f(x) = e^{x}$ and $\Omega = [0,1]$, hence $V = 1$.

Since the proximity of $I_N$ to $I$ is related to $N$, the accuracy of the
method lies in how many points are generated, namely how many function calls
are executed when the iterative method is implemented. In @fig:MC and
@fig:MI, results obtained with the plain MC method are shown in red. In
@tbl:MC, some of them are listed: the estimated integrals $I^{\text{oss}}$ are
compared to the expected value $I$ and the differences diff between them are
@@ -333,7 +333,7 @@ the largest contribution to the integral.

As stated before, in practice it is impossible to sample points from the best
distribution $P^{(y)}$: only a good approximation can be achieved. In GSL, the
VEGAS algorithm approximates the distribution by histogramming the function $f$
in different subregions with an iterative method [@lepage78], namely:

- a fixed number of points (function calls) is evenly generated in the whole
  region;
@@ -371,7 +371,7 @@ where $I_i$ and $\sigma_i$ are the integral and standard deviation
estimated in each interval $\Delta x_i$ of the last iteration using the plain
MC technique.
For the results to be reliable, the chi-squared per degree of freedom
$\chi_r^2$ must be consistent with 1. While performing the iterations, if
the $\chi_r^2$ value exceeds 1.5, the cycle breaks.
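
A minimal sketch of such a loop for the present integrand, following the usage
pattern of the GSL manual (the stopping rule shown, $|\chi_r^2 - 1| > 0.5$, is
the manual's; the report's cycle instead breaks when $\chi_r^2$ exceeds 1.5,
and the integrand and parameters here are illustrative):

```c
/* VEGAS integration loop with the chi-squared consistency check. */
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_vegas.h>

double g(double *x, size_t dim, void *params)
{
    (void)dim; (void)params;
    return exp(x[0]);                        /* f(x) = e^x on [0,1] */
}

int main(void)
{
    double xl[1] = { 0 }, xu[1] = { 1 }, res, err;

    gsl_rng_env_setup();
    gsl_rng *r = gsl_rng_alloc(gsl_rng_default);
    gsl_monte_function F = { &g, 1, NULL };
    gsl_monte_vegas_state *s = gsl_monte_vegas_alloc(1);

    /* warm-up call to adapt the grid */
    gsl_monte_vegas_integrate(&F, xl, xu, 1, 10000, r, s, &res, &err);
    do {   /* iterate until chi^2 per dof is consistent with 1 */
        gsl_monte_vegas_integrate(&F, xl, xu, 1, 100000, r, s, &res, &err);
    } while (fabs(gsl_monte_vegas_chisq(s) - 1.0) > 0.5);

    gsl_monte_vegas_free(s);
    gsl_rng_free(r);
    return 0;
}
```
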

Clearly, once again a better estimation is achieved with a greater number of
@@ -85,21 +85,21 @@ omitted:

The so obtained sample was binned and stored in a histogram with a customizable
number $n$ of bins (default set $n = 150$) ranging from $\theta = 0$ to $\theta
= \pi/2$ because of the system symmetry. In @fig:original an example is shown.

![Example of intensity histogram.](images/6-original.pdf){#fig:original}


## Gaussian convolution {#sec:convolution}

In order to simulate the instrumentation response, the sample was then smeared
with a Gaussian function, with the aim of recovering the original sample
afterwards by implementing a deconvolution routine.
For this purpose, a 'kernel' histogram with an even number $m$ of bins and the
same bin width as the previous one, but a smaller number of them ($m \sim 6\%
\, n$), was generated according to a Gaussian distribution with mean $\mu = 0$
and standard deviation $\sigma$. The reason why the kernel was set this way
will be discussed shortly.
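
As an illustration, the kernel bins may be filled with the values of the
Gaussian density at the bin centres, as in the following sketch (one possible
reading of the construction above; names are illustrative, not the report's
code):

```c
/* Sketch of the kernel construction: m bins of width dtheta, centred
 * on zero, filled with a Gaussian pdf of mean 0 and standard
 * deviation sigma. */
#include <stddef.h>
#include <gsl/gsl_randist.h>

void fill_kernel(double *kernel, size_t m, double dtheta, double sigma)
{
    for (size_t j = 0; j < m; j++) {
        /* centre of the j-th bin of a kernel centred on zero */
        double x = ((double)j - m / 2.0 + 0.5) * dtheta;
        kernel[j] = gsl_ran_gaussian_pdf(x, sigma) * dtheta;
    }
}
```
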

Then, the original histogram was convolved with the kernel in order to obtain
the smeared signal. As an example, the result obtained for $\sigma = \Delta
\theta$, where $\Delta \theta$ is the bin width, is shown in @fig:convolved.
@@ -121,11 +121,11 @@ down to an element wise product between $s$ and the flipped histogram of $k$
histograms. Namely, if $c_i$ is the $i^{\text{th}}$ bin of the convolved
histogram:
$$
c_i = \sum_{j = 0}^{m - 1} k_j s_{i - j}
    = \sum_{j' = m - 1}^{0} k_{m - 1 - j'} s_{i - m + 1 + j'}
\with j' = m - 1 - j
$$

where $j$ runs over the bins of the kernel.
For a better understanding, see @fig:dot_conv: the third histogram turns out
to have $n + m - 1$ bins, a number greater than the original one.
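
A direct implementation of this sum is straightforward; a minimal sketch
(names are illustrative):

```c
/* Discrete convolution c = s * k as defined above: the result has
 * n + m - 1 bins. */
#include <stddef.h>

void convolve(const double *s, size_t n,   /* original histogram  */
              const double *k, size_t m,   /* kernel histogram    */
              double *c)                   /* n + m - 1 bins      */
{
    for (size_t i = 0; i < n + m - 1; i++) {
        c[i] = 0;
        for (size_t j = 0; j < m; j++)
            if (i >= j && i - j < n)       /* keep s index valid  */
                c[i] += k[j] * s[i - j];
    }
}
```
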
@@ -187,8 +187,8 @@ with $n + m - 1$ bins, a number greater than the original one.
% nodes
\node [above] at (2.25,-5.5) {$c_i$};
\node [above] at (3.25,0) {$s_i$};
\node [above] at (1.95,0) {$s_{i-j}$};
\node [below] at (1.75,-1) {$k_j$};
\end{tikzpicture}
\caption{Element wise product as a step of the convolution between the original signal
(above) and the kernel (center). The final result is the lower
@@ -210,13 +210,13 @@ $$

where $\hat{F}[\quad]$ stands for the Fourier transform of its argument.
Since the histogram is a discrete set of data, the Discrete Fourier Transform
(DFT) was applied. When dealing with arrays of discrete values, the theorem
still holds if the two arrays have the same length and a cyclical convolution
is applied. For this reason the kernel was 0-padded in order to make it the
same length as the original signal. Besides, the 0-padding avoids unpleasant
side effects due to the cyclical convolution.
In order to accomplish this procedure, both histograms were transformed into
vectors. The implementation lies in the computation of the Fourier transform of
the smeared signal and the kernel, the ratio between their transforms and the
anti-transformation of the result:
$$
@@ -225,16 +225,17 @@ $$
$$

FFTs are efficient algorithms for calculating the DFT. Given a set of $n$
values {$z_j$}, each one is transformed into:
$$
x_j = \sum_{k=0}^{n-1} z_k \exp \left( - \frac{2 \pi i j k}{n} \right)
$$

where $i$ is the imaginary unit.
The evaluation of the DFT is a matrix-vector multiplication $W \vec{z}$. A
general matrix-vector multiplication takes $O(n^2)$ operations. FFT algorithms,
instead, use a *divide-and-conquer* strategy to factorize the matrix into
smaller sub-matrices. If $n$ can be factorized into a product of integers $n_1$,
$n_2 \ldots n_m$, then the DFT can be computed in $O(n \sum n_i) < O(n^2)$
operations, hence the name.
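
For reference, a direct implementation of the definition above costs exactly
the $O(n^2)$ that FFT algorithms improve upon; a minimal sketch, not the GSL
code:

```c
/* Naive O(n^2) DFT, straight from the definition above; shown only as
 * the baseline that FFT algorithms beat. */
#include <complex.h>
#include <math.h>
#include <stddef.h>

void dft(const double complex *z, double complex *x, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        x[j] = 0;
        for (size_t k = 0; k < n; k++)   /* n terms per output value */
            x[j] += z[k] * cexp(-2 * M_PI * I * (double)(j * k) / (double)n);
    }
}
```
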
The inverse Fourier transform is thereby defined as:
$$
@@ -243,13 +244,13 @@ $$
$$

In GSL, `gsl_fft_complex_forward()` and `gsl_fft_complex_inverse()` are the
functions which compute the forward and inverse transform, respectively.
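
A minimal sketch of the whole transform, divide and anti-transform chain with
the complex routines named above (the actual analysis uses the real and
half-complex variants described below; it assumes both arrays already share
the same length $n$ after the 0-padding, and names are illustrative):

```c
/* FFT deconvolution sketch: transform the smeared signal s and the
 * 0-padded kernel k, divide element-wise in frequency space, then
 * anti-transform. Packed arrays: a[2i] = Re, a[2i + 1] = Im. */
#include <stddef.h>
#include <gsl/gsl_fft_complex.h>

#define RE(a, i) ((a)[2 * (i)])
#define IM(a, i) ((a)[2 * (i) + 1])

void fft_deconvolve(double *s, double *k, size_t n)
{
    gsl_fft_complex_wavetable *wt = gsl_fft_complex_wavetable_alloc(n);
    gsl_fft_complex_workspace *ws = gsl_fft_complex_workspace_alloc(n);

    gsl_fft_complex_forward(s, 1, n, wt, ws);
    gsl_fft_complex_forward(k, 1, n, wt, ws);

    for (size_t i = 0; i < n; i++) {       /* s[i] /= k[i]          */
        double d  = RE(k, i) * RE(k, i) + IM(k, i) * IM(k, i);
        double re = (RE(s, i) * RE(k, i) + IM(s, i) * IM(k, i)) / d;
        double im = (IM(s, i) * RE(k, i) - RE(s, i) * IM(k, i)) / d;
        RE(s, i) = re;
        IM(s, i) = im;
    }

    gsl_fft_complex_inverse(s, 1, n, wt, ws);   /* result back in s */

    gsl_fft_complex_wavetable_free(wt);
    gsl_fft_complex_workspace_free(ws);
}
```
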
The inputs and outputs for the complex FFT routines are packed arrays of
floating point numbers. In a packed array, the real and imaginary parts of each
complex number are placed in alternate neighboring elements. In this special
case, the sequence of values to be transformed is made of real numbers, hence
the Fourier transform is a complex sequence which satisfies:
$$
z_k = z^*_{n-k}
$$
@@ -260,7 +261,7 @@ forward transform (from real to half-complex) and inverse transform (from
half-complex to real). As a consequence, the routines are divided into two sets:
`gsl_fft_real` and `gsl_fft_halfcomplex`. The symmetry of the half-complex
sequence requires only half of the complex numbers in the output to be stored.
This works for all lengths: when the length is even, the middle value is real.
Thus, only $n$ real numbers are required to store the half-complex sequence
(half for the real part and half for the imaginary).
@@ -293,7 +294,7 @@ Thus, only $n$ real numbers are required to store the half-complex sequence
\end{scope}
\end{tikzpicture}
\caption{The histogram on the right shows how the real numbers histogram on the
left is handled by the dedicated GSL functions.}\label{fig:reorder}
}
\end{figure}
@@ -305,7 +306,7 @@ the array up to the middle and the negative backwards from the end of the array
While it does not matter whether the convolved histogram has positive or
negative values, the kernel must be centered in zero in order to compute a
correct convolution. This requires the kernel to be made of an even number of
bins, so that it can be cut into two same-length halves.

When $\hat{F}[s * k]$ and $\hat{F}[k]$ are computed, they are given in the
half-complex GSL format and their normal format must be restored in order to
@@ -318,24 +319,24 @@ code.

At the end, the external bins which exceed the original signal size are cut
away in order to restore the original number of bins $n$. Results will be
discussed in @sec:conv_results.


## Unfolding with Richardson-Lucy

The Richardson–Lucy (RL) deconvolution is an iterative procedure typically used
for recovering an image that has been blurred by a known 'point spread
function'.

Consider the problem of estimating the frequency distribution $f(\xi)$ of a
variable $\xi$ when the available measure is a sample {$x_i$} of points
drawn not from $f(x)$ but from another function $\phi(x)$ such that:
$$
\phi(x) = \int d\xi \, f(\xi) P(x | \xi)
$$ {#eq:conv}

where $P(x | \xi) \, dx$ is the probability (presumed known) that $x$ falls
in the interval $(x, x + dx)$ for a given $\xi$. If the so-called point spread
function $P(x | \xi)$ follows a normal distribution with standard deviation
$\sigma$, namely:
$$
@@ -371,12 +372,12 @@ $$
f(\xi) = \int dx \, \phi(x) Q(\xi | x)
$$ {#eq:second}

Since $Q (\xi | x)$ depends on $f(\xi)$, @eq:second suggests an iterative
procedure for generating estimates of $f(\xi)$. With a guess for $f(\xi)$ and
a known $P(x | \xi)$, @eq:first can be used to calculate an estimate for
$Q (\xi | x)$. Then, taking the hint provided by @eq:second, an improved
estimate for $f (\xi)$ can be generated, using the observed sample {$x_i$} to
give an approximation for $\phi$.
Thus, if $f^t$ is the $t^{\text{th}}$ estimate, the $(t + 1)^{\text{th}}$ is:
$$
f^{t + 1}(\xi) = \int dx \, \phi(x) Q^t(\xi | x)
@@ -401,15 +402,15 @@ $$
where $P^{\star}$ is the flipped point spread function [@lucy74].

In this special case, the Gaussian kernel which was convolved with the original
histogram stands for the point spread function. Dealing with discrete values,
the division and multiplication are element wise and the convolution is to be
carried out as described in @sec:convolution.
When implemented, this method results in an easy step-wise routine:

- choose a zero-order estimate for {$f(\xi)$};
- create a flipped copy of the kernel;
- compute the convolutions, the product and the division at each step, as in
  the sketch below;
- proceed until a given number $r$ of iterations is reached.
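
A minimal sketch of one such round (the report's code may organize the
convolutions and the cropping to $n$ bins differently; names are illustrative):

```c
/* 'Same-size' convolution with a centred m-bin kernel: a helper for
 * the RL round below; contributions outside the range are truncated. */
#include <stddef.h>

static void conv(const double *a, const double *k, double *out,
                 size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = 0;
        for (size_t j = 0; j < m; j++) {
            long idx = (long)i + (long)(m / 2) - (long)j;
            if (idx >= 0 && idx < (long)n)
                out[i] += k[j] * a[idx];
        }
    }
}

/* One RL round: f <- f . ((phi / (f * P)) * P_flip), where '.' and '/'
 * are element wise; t1 and t2 are scratch arrays of n bins. */
void rl_round(double *f, const double *phi, const double *P,
              const double *P_flip, size_t n, size_t m,
              double *t1, double *t2)
{
    conv(f, P, t1, n, m);                          /* t1 = f * P      */
    for (size_t i = 0; i < n; i++)
        t1[i] = (t1[i] != 0) ? phi[i] / t1[i] : 0; /* t1 = phi / t1   */
    conv(t1, P_flip, t2, n, m);                    /* t2 = t1 * P~    */
    for (size_t i = 0; i < n; i++)
        f[i] *= t2[i];                             /* element wise    */
}
```
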

In this case, the zero-order estimate was set to $f(\xi) = 0.5 \, \forall \,
\xi$. Different numbers of iterations were tested. Results are discussed in
@@ -450,10 +451,9 @@ $$
W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
$$

The $Q$ region is to be considered empty at the beginning: the 'dirt' present in
$P$ must be moved to $Q$ in order to reproduce its distribution as closely as
possible. Namely, the following constraints must be satisfied:

\begin{align*}
&\text{1.} \hspace{20pt} f_{ij} \ge 0 \hspace{15pt}
@@ -474,7 +474,7 @@ second limits the amount of dirt moved by each position in $P$ in order to not
exceed the available quantity; the third sets a limit to the dirt moved to each
position in $Q$ in order to not exceed the required quantity and the last one
forces the maximum possible amount of supplies to be moved: either all the dirt
present in $P$ has been moved or the $Q$ distribution has been obtained.
The total moved amount is the total flow. If the two distributions have the
same amount of dirt, all the dirt present in $P$ is necessarily moved to
$Q$ and the flow equals the total amount of available dirt.
@@ -487,9 +487,9 @@ $$
{\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
$$

In this case, where the EMD is to be measured between two same-length
histograms, the procedure simplifies a lot. By representing both histograms
with two vectors $u$ and $v$, the equation above boils down to [@ramdas17]:

$$
\text{EMD} (u, v) = \sum_i |U_i - V_i|
@@ -497,7 +497,7 @@ $$

where the sum runs over the entries of the vectors $U$ and $V$, which are the
cumulative vectors of the histograms. In the code, the following equivalent
iterative routine was implemented.

$$
\text{EMD} (u, v) = \sum_i |\text{EMD}_i| \with
@@ -526,35 +526,202 @@ In fact:
\end{align*}

This simple formula enabled comparisons to be made between a great number of
results.
In order to make the code more flexible, the data were normalized before
computing the EMD: in doing so, it is possible to compare even samples with a
different number of points.
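
A minimal sketch of this computation via the cumulative sums, assuming the
usual recursion $\text{EMD}_i = u_{i-1} + \text{EMD}_{i-1} - v_{i-1}$ with
$\text{EMD}_0 = 0$ [@ramdas17] and already-normalized inputs (names are
illustrative):

```c
/* EMD between two normalized same-length histograms u and v, via the
 * running quantity EMD_i; the result is the sum of the absolute
 * values, equal to sum_i |U_i - V_i| over the cumulative vectors. */
#include <math.h>
#include <stddef.h>

double emd(const double *u, const double *v, size_t n)
{
    double run = 0;     /* EMD_i, i.e. U_i - V_i so far */
    double sum = 0;
    for (size_t i = 0; i < n; i++) {
        run += u[i] - v[i];
        sum += fabs(run);
    }
    return sum;
}
```
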
## Results comparison {#sec:conv_results}
### Noiseless results {#sec:noiseless}

Along with the analysis of the results obtained varying the convolved Gaussian
width $\sigma$, the possibility to add Gaussian noise to the convolved
histogram was also implemented, to check whether the deconvolution is affected
by this kind of interference. This approach is described in the next
subsection, while the noiseless results are given in this one.

The two methods were compared for three different values of $\sigma$:
$$
\sigma = 0.1 \, \Delta \theta \et
\sigma = 0.5 \, \Delta \theta \et
\sigma = \Delta \theta
$$

Since the RL method depends on the number $r$ of performed rounds, the earth
mover's distance between the deconvolved signal and the original one was
measured for different $r$s for each of the three tested values of $\sigma$,
in order to find out how many rounds were sufficient or necessary.
To achieve this goal, 1000 experiments (default and customizable value) were
simulated: for each of them, the original signal was convolved with the kernel
with the appropriate $\sigma$ value, deconvolved with the RL algorithm with a
given $r$, and the EMD was measured. Then, the average and the standard
deviation of the so-obtained EMDs were computed. This procedure was repeated
for a few tens of different $r$s till a flattening or a minimum of the curve
became evident. Results are shown in @fig:rounds-noiseless.

The plots in @fig:rless-0.1 show the average (red) and standard deviation (grey)
of the measured EMD for $\sigma = 0.1 \, \Delta \theta$. The number of
iterations does not affect the quality of the outcome (the fluctuations are
merely an effect of floating point precision) and the best result is obtained
for $r = 2$, meaning that the convergence of the RL algorithm is very fast
because the histogram was only slightly modified. In @fig:rless-0.5, the curve
starts to flatten at about 10 rounds, whereas in @fig:rless-1 a minimum occurs
around \SI{5e3}{} rounds, meaning that, with such a large kernel, the
convergence is very slow, even if the best results are close to the one found
for $\sigma = 0.5$.
The following $r$s were chosen as the best suited:
\begin{align*}
\sigma = 0.1 \, \Delta \theta &\thus r^{\text{best}} = 2 \\
\sigma = 0.5 \, \Delta \theta &\thus r^{\text{best}} = 10 \\
\sigma = 1 \, \Delta \theta &\thus r^{\text{best}} = \SI{5e3}{}
\end{align*}

Note the difference in order of magnitude between @fig:rless-0.1 and the plots
resulting from $\sigma = 0.5 \, \Delta \theta$ and $\sigma = \Delta \theta$:
the RL deconvolution is heavily influenced by the magnitude of $\sigma$; the
greater $\sigma$, the worse the deconvolved result.
On the other hand, the FFT deconvolution procedure is not affected by $\sigma$
amplitude changes: it always gives the same outcome, which would be exactly the
original signal if floating point precision did not affect the result. In
fact, the FFT is the analytical result of the deconvolution.

<div id="fig:rounds-noiseless">
![](images/6-nonoise-rounds-0.1.pdf){#fig:rless-0.1}

![](images/6-nonoise-rounds-0.5.pdf){#fig:rless-0.5}

![](images/6-nonoise-rounds-1.pdf){#fig:rless-1}

EMD as a function of RL rounds for different kernel $\sigma$ values. The
average is shown in red and the standard deviation in grey. Noiseless results.
</div>

<div id="fig:emd-noiseless">
![$\sigma = 0.1 \, \Delta \theta$](images/6-nonoise-emd-0.1.pdf){#fig:eless-0.1}

![$\sigma = 0.5 \, \Delta \theta$](images/6-nonoise-emd-0.5.pdf){#fig:eless-0.5}

![$\sigma = \Delta \theta$](images/6-nonoise-emd-1.pdf){#fig:eless-1}

EMD distributions for different kernel $\sigma$ values. The plots on the left
show the results for the FFT deconvolution, the central column the results for
the RL deconvolution and the third one shows the EMD for the convolved signal.
Noiseless results.
</div>

For this reason, the EMD obtained with the FFT can be used as a reference point
against which to compare the EMDs measured with RL.
As described above, for a given $r$, a thousand experiments were simulated and
an EMD was computed for each of these simulations. Besides computing their
average and standard deviation, those values were used to build histograms
showing the EMD distribution.
Once the best suited numbers of rounds $r^{\text{best}}$ were found, their
histograms were compared to the histograms of the FFT results, starting from the
same convolved signals, and to the EMD of the convolved signals themselves, in
order to check whether an improvement was truly achieved. Results are shown in
@fig:emd-noiseless.

As expected, the FFT results are always of the same order of magnitude,
\SI{1e-15}{}, independently of the kernel width, whereas the RL deconvolution
results change a lot, ranging from \SI{1e-16}{} for $\sigma = 0.1 \, \Delta
\theta$ to \SI{1e-4}{} for $\sigma = \Delta \theta$.
The first result was quite unexpected: for very low values of $\sigma$, the RL
routine gives better results than the FFT. This is because of the math which
lies beneath the two methods: apparently, the RL technique is less subject to
round-off errors.
Then, as regards the comparison with the convolved signal, which always shows
an EMD of \SI{1e-2}{}, both RL and FFT always return results closer to the
original signal, meaning that the deconvolution is working.

### Noisy results

In order to observe the effect of Gaussian noise upon these two deconvolution
methods, a value of $\sigma$ ($\sigma = 0.8 \, \Delta \theta$) was arbitrarily
chosen. The noise was then applied to the convolved histogram as follows.

![Example of noisy histogram,
$\sigma_N = 0.05$.](images/6-noisy.pdf){#fig:noisy}

<div id="fig:rounds-noise">
![](images/6-noise-rounds-0.005.pdf){#fig:rnoise-0.005}

![](images/6-noise-rounds-0.01.pdf){#fig:rnoise-0.01}

![](images/6-noise-rounds-0.05.pdf){#fig:rnoise-0.05}

EMD as a function of RL rounds for different noise $\sigma_N$ values with the
kernel $\sigma = 0.8 \Delta \theta$. The average is shown in red and the
standard deviation in grey. Noisy results.
</div>

<div id="fig:emd-noisy">
![$\sigma_N = 0.005$](images/6-noise-emd-0.005.pdf){#fig:enoise-0.005}

![$\sigma_N = 0.01$](images/6-noise-emd-0.01.pdf){#fig:enoise-0.01}

![$\sigma_N = 0.05$](images/6-noise-emd-0.05.pdf){#fig:enoise-0.05}

EMD distributions for different noise $\sigma_N$ values. The plots on the left
show the results for the FFT deconvolution, the central column the results for
the RL deconvolution and the third one shows the EMD for the convolved signal.
Noisy results.
</div>

For each bin, once the convolved histogram was computed, a value $v_N$ was
randomly sampled from a Gaussian distribution with standard deviation
$\sigma_N$, and the value $v_N \cdot b$ was added to the bin itself, where $b$
is the content of the bin. An example of the new histogram, with $\sigma_N =
0.05$, is shown in @fig:noisy.
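
A minimal sketch of this bin-wise smearing, using `gsl_ran_gaussian()` (names
are illustrative):

```c
/* Add multiplicative Gaussian noise to a convolved histogram: each
 * bin with content b receives v_N * b, with v_N ~ N(0, sigma_N). */
#include <stddef.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

void add_noise(double *bins, size_t n, double sigma_N, gsl_rng *r)
{
    for (size_t i = 0; i < n; i++) {
        double v_N = gsl_ran_gaussian(r, sigma_N);
        bins[i] += v_N * bins[i];
    }
}
```
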
The following three values of $\sigma_N$ were tested:
$$
\sigma_N = 0.005 \et
\sigma_N = 0.01 \et
\sigma_N = 0.05
$$

The same procedure followed in @sec:noiseless was then repeated for noisy
signals. Hence, in @fig:rounds-noise the EMD as a function of the RL rounds is
shown, this time varying $\sigma_N$ and keeping $\sigma = 0.8 \, \Delta \theta$
constant.
In @fig:rnoise-0.005, the flattening is achieved around $r = 20$, and in
@fig:rnoise-0.01, $r \sim 15$ is sufficient. When the noise becomes too high,
on the other hand, the algorithm becomes less efficient as $r$ increases, so
only a very small number of iterations is required.
Hence, the best suited values were chosen as:
\begin{align*}
\sigma_N = 0.005 &\thus r^{\text{best}} = 20 \\
\sigma_N = 0.01 &\thus r^{\text{best}} = 15 \\
\sigma_N = 0.05 &\thus r^{\text{best}} = 1
\end{align*}

As regards the order of magnitude, as expected, the more $\sigma_N$ increases,
the more the EMD rises, ranging from $\sim$ \SI{2e-4}{} in @fig:rnoise-0.005
to $\sim$ \SI{1.5e-3}{} in @fig:rnoise-0.05.
Since the FFT is no longer able to return the original signal as closely as
before, it is no longer assumed to be a reference point. In fact, as shown in
@fig:emd-noisy, when dealing with noisy signals whose noise shape is unknown,
the FFT algorithm is as efficient as the RL one, since the two share the same
order of magnitude regarding the EMD.
An increasing noise entails a shift of the EMD of the convolved histogram
towards slightly greater values, still remaining in the same order of magnitude
as the noiseless signals. However, the deconvolution still works, since the EMD
of the convolved histogram is an order of magnitude greater than the worst
result obtained with either method.

In conclusion, when dealing with smeared signals with a known point spread
function, the FFT proved to be the best deconvolution method, perfectly
restoring the original signal to the extent permitted by floating point
precision. When the point spread function is particularly narrow, this
precision problem is partially solved by the RL deconvolution process, which
relies on more stable math.
However, in the real world, signals are inevitably blurred by noise of unknown
shape. When the kernel is not known a priori, either method turns out to be
only as good as the FFT in the aforementioned noisy situation: only a poor
approximation of the original signal can be derived.

The possibility to add Poisson noise to the convolved histogram was also
implemented, to check whether the deconvolution is affected by this kind of
interference. The case with $\sigma = \Delta \theta$ was taken as an example.
In @fig:poisson the results are shown for both methods when Poisson noise with
mean $\mu = 50$ is employed.
In both cases, the addition of the noise seems to partially affect the
deconvolution. When the FFT method is applied, it adds little spikes nearly
everywhere on the curve; this is particularly evident at the edges, where the
expected data are very small. On the other hand, the Richardson-Lucy routine is
less affected by this further complication.

@@ -239,7 +239,7 @@ $$ {#eq:perc}

The training was performed as follows. Initial values were set as $w = (0,0)$
and $b = 0$. From these, the perceptron starts to improve its estimates. The
sample was passed point by point into an iterative procedure a grand total of
$N_c$ calls: each time, the projection $w \cdot x$ of the point was computed
and then the variable $\Delta$ was defined as:
|