ex-6: started writing about the histograms comparison

2020-05-01 23:56:35 +02:00
commit 110149f709
parent 37094d0cf7
7 changed files with 87 additions and 30 deletions
--- a/notes/docs/bibliography.bib
+++ b/notes/docs/bibliography.bib
@@ -0,0 +1,7 @@
+@article{cock41,
+  title={The distribution of a product from several sources to numerous localities},
+  author={F. L. Hitchcock},
+  year={1941},
+  journal={Journal of Mathematical Physics},
+  pages={224 - 230}
+}
--- a/notes/docs/bibliography.csl
+++ b/notes/docs/bibliography.csl
@@ -0,0 +1,14 @@
+<?xml version="1.0" encoding="utf-8"?>
+<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="en-US">
+  <!-- Elsevier, generated from "elsevier" metadata at https://github.com/citation-style-language/journals -->
+  <info>
+    <title>Chinese Journal of Physics</title>
+    <id>http://www.zotero.org/styles/chinese-journal-of-physics</id>
+    <link href="http://www.zotero.org/styles/chinese-journal-of-physics" rel="self"/>
+    <link href="http://www.zotero.org/styles/elsevier-with-titles" rel="independent-parent"/>
+    <category citation-format="numeric"/>
+    <issn>0577-9073</issn>
+    <updated>2016-07-25T11:35:23+00:00</updated>
+    <rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
+  </info>
+</style>
--- a/notes/sections/0.md
+++ b/notes/sections/0.md
@@ -80,4 +80,7 @@ header-includes: |
  \captionsetup{width=11cm}
  \usepackage{stmaryrd}
  ```
+
+bibliography: docs/bibliography.bib
+csl: docs/bibliography.csl
 ---
--- a/notes/sections/1.md
+++ b/notes/sections/1.md
@@ -1,4 +1,4 @@
-# Exercise 1
+# Exercise 1 {#sec:Landau}

 ## Random numbers following the Landau distribution

@@ -6,7 +6,7 @@ The Landau distribution is a probability density function which can be defined
 as follows:

 $$
-  f(x) = \int \limits_{0}^{+ \infty} dt \, e^{-t log(t) -xt} \sin (\pi t)
+  f(x) = \int \limits_{0}^{+ \infty} dt \, e^{-t \log(t) -xt} \sin (\pi t)
 $$

 ![Landau distribution.](images/landau-small.pdf){width=50%}
@@ -18,7 +18,7 @@ was used.
 For the purpose of visualizing the resulting sample, the data was put into
 a histogram and plotted with matplotlib. The result is shown in @fig:landau.

-![Example of N points generated with the `gsl_ran_landau()`
+![Example of N = 10'000 points generated with the `gsl_ran_landau()`
 function and plotted in a 100-bin histogram ranging from -10 to
 80.](images/landau-hist.png){#fig:landau}

@@ -41,7 +41,7 @@ $$
 where:

  - $x$ runs over the sample,
-  - $F(x)$ is the Landau cumulative distribution and function
+  - $F(x)$ is the Landau cumulative distribution function,
  - $F_N(x)$ is the empirical cumulative distribution function of the sample.

 If $N$ numbers have been generated, for every point $x$,
--- a/notes/sections/4.md
+++ b/notes/sections/4.md
@@ -119,7 +119,7 @@ $$
 $$

 from which, the integral $I$ can now be computed. The edges of the integral
-are fixed bt the fact that the total momentum can not exceed $P_{\text{max}}$:
+are fixed by the fact that the total momentum cannot exceed $P_{\text{max}}$:

 $$
  I = \int
@@ -218,7 +218,7 @@ $$
  p_h = j \cdot w + \frac{w}{2} = w \left( j + \frac{1}{2} \right)
 $$

-The following result was obtained:
+For $p_{\text{max}} = 10$, the following result was obtained:

 ![Histogram of the obtained distribution.](images/dip.pdf)

--- a/notes/sections/6.md
+++ b/notes/sections/6.md
@@ -91,9 +91,9 @@ of bins default set $n = 150$. In @fig:original an example is shown.
 ![Example of an intensity histogram.](images/fraun-original.pdf){#fig:original}


-## Gaussian noise convolution {#sec:convolution}
+## Gaussian convolution {#sec:convolution}

-The sample must then be smeared with a Gaussian noise with the aim to recover
+The sample must then be smeared with a Gaussian function, with the aim of recovering
 the original sample afterwards by implementing a deconvolution routine.  
 For this purpose, a 'kernel' histogram with an odd number $m$ of bins and the
 same bin width as the previous one, but a smaller number of them ($m < n$), was
@@ -370,7 +370,7 @@ $P^{\star}$ is the flipped point spread function.
 When implemented, this method results in an easy step-wise routine:

  - create a flipped copy of the kernel;
-  - elect a zero-order estimate for {$c_i$};
+  - choose a zero-order estimate for {$c_i$};
  - compute the convolutions with the method described in @sec:convolution, the
    product and the division at each step;
  - proceed until a given number of iterations is reached.
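
The step-wise routine listed above can be sketched in a few lines of pure Python. This is only an illustrative sketch, not the code used in the exercise: the function names `convolve_same` and `richardson_lucy`, the zero-padded "same-size" convolution, the choice of the blurred data as zero-order estimate and the small `eps` guard against division by zero are all assumptions made here.

```python
def convolve_same(signal, kernel):
    """Zero-padded discrete convolution cropped to len(signal).

    Assumes the kernel has an odd number of bins, so it can be centred.
    """
    n, m = len(signal), len(kernel)
    half = m // 2
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j in range(m):
            k = i + half - j  # signal bin paired with kernel[j]
            if 0 <= k < n:
                acc += signal[k] * kernel[j]
        out[i] = acc
    return out


def richardson_lucy(data, kernel, rounds=50, eps=1e-12):
    """Richardson-Lucy iteration with a fixed number of rounds."""
    flipped = kernel[::-1]       # flipped copy of the kernel
    estimate = list(data)        # zero-order estimate (assumption: the data itself)
    for _ in range(rounds):
        blurred = convolve_same(estimate, kernel)
        # division step: observed data over the re-blurred estimate
        ratio = [d / (b + eps) for d, b in zip(data, blurred)]
        # convolution with the flipped kernel, then the product step
        correction = convolve_same(ratio, flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate
```

Since every step only multiplies and divides non-negative quantities, the estimate stays non-negative if the data are non-negative, which is convenient for histograms of counts.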
@@ -393,27 +393,27 @@ deconvolved with RL is located below.

 As can be seen, increasing the value of $\sigma$ implies a stronger smoothing of
 the curve. The FFT deconvolution process seems not to be affected by $\sigma$
-amplitude changes: it always gives the same outcome, remarkably similar to the
-original signal. The same can't be said about the RL deconvolution, which, on
-the other hand, looks heavily influenced by the variance magnitude: the greater
-$\sigma$, the worse the deconvoluted result. In fact, given the same number of
-steps, the deconvolved signal is always the same 'distance' far form the
-convolved one: if it very smooth, the deconvolved signal is very smooth too and
-if the convolved is less smooth, it is less smooth too.
+amplitude changes: it always gives the same outcome, which is exactly the
+original signal. In fact, the FFT method yields the analytical result of the
+deconvolution. In the real world, however, it is impractical, since signals are
+inevitably blurred by noise.  
+The same can't be said about the RL deconvolution, which, on the other hand,
+looks heavily influenced by the variance magnitude: the greater $\sigma$, the
+worse the deconvolved result. In fact, given the same number of steps, the
+deconvolved signal is always the same 'distance' from the convolved one:
+if the latter is very smooth, the deconvolved signal is very smooth too, and if
+the convolved signal is less smooth, the deconvolved one is less smooth too.

 The program can also add Poisson noise to the
-convoluted histogram to check weather the deconvolution is affected or not by
-this kind of noise. It was took as an example the case with $\sigma = \Delta
-\theta$. In @fig:poisson the results are shown for both methods when a Poisson
-noise with mean $\mu = 50$ is employed.  
-In both cases, the addition of the Poisson noise seems to affect partially the
-deconvolution. When the FFT method was applied, it adds little spikes nearly
-everywhere on the curve but it is particularly evident on the edges of the
-curve, where the expected data are very small. This is because the technique is
-very accurate and hence returns nearly the exact original data which, in this
-case, is the expected one to which the Poisson noise is added.  
-On the other hand, the Richardson-Lucy routine is less affected by this further
-complication being already inaccurate in itself.
+convolved histogram to check whether the deconvolution is affected or not by
+this kind of interference. The case with $\sigma =
+\Delta \theta$ was taken as an example. In @fig:poisson the results are shown for
+both methods when Poisson noise with mean $\mu = 50$ is employed.  
+In both cases, the addition of the noise seems to partially affect the
+deconvolution. When the FFT method is applied, the noise adds little spikes nearly
+everywhere on the curve, and they are particularly evident at the edges, where the
+expected data are very small. On the other hand, the Richardson-Lucy routine is
+less affected by this further complication.

 <div id="fig:results1">
 ![Convolved signal.](images/fraun-conv-0.05.pdf){width=12cm}
@@ -454,3 +454,35 @@ Results for $\sigma = \Delta \theta$, where $\Delta \theta$ is the bin width.

 Results for $\sigma = \Delta \theta$, with Poisson noise.
 </div>
+
+In order to quantify the similarity of the deconvolution outcome with the
+original signal, a null hypothesis test was devised.  
+As in @sec:Landau, the original sample was treated as a population from
+which other samples of the same size were drawn with replacement. For each
+new sample, the earth mover's distance with respect to the original signal was
+computed.
+
+In statistics, the earth mover's distance (EMD) is a measure of distance
+between two probability distributions [@cock41]. Informally, the distributions
+are interpreted as two different ways of piling up a certain amount of dirt over
+a region and the EMD is the minimum cost of turning one pile into the other,
+where the cost is the amount of dirt moved times the distance by which it is
+moved. It is valid only if the two distributions have the same integral, that
+is if the two piles have the same amount of dirt.  
+Computing the EMD amounts to solving a transportation problem.
+
+\textcolor{red}{earth mover's distance}
+
+In this case, where the EMD must be applied to two histograms, the procedure
+simplifies considerably, boiling down to the summed absolute difference of the
+cumulative functions of the two histograms.
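
For two histograms sharing the same binning, this simplification can be sketched as follows. The function name `emd_histograms` and the `bin_width` parameter are hypothetical, and normalising both histograms to unit area (so that the two "piles" hold the same amount of dirt) is an assumption of this sketch rather than something stated in the notes.

```python
from itertools import accumulate


def emd_histograms(h1, h2, bin_width=1.0):
    """Earth mover's distance between two histograms with identical bins.

    Both histograms are normalised to unit area first (assumption), then
    the absolute differences of their cumulative sums are added up.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total1, total2 = sum(h1), sum(h2)
    if total1 <= 0 or total2 <= 0:
        raise ValueError("histograms must have a positive integral")
    cdf1 = accumulate(x / total1 for x in h1)
    cdf2 = accumulate(x / total2 for x in h2)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf1, cdf2))
```

As a sanity check, moving all the content of the first bin to the third one costs two bin widths: `emd_histograms([1, 0, 0], [0, 0, 1])` gives `2.0`.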
+
+These distances were used to build their empirical cumulative distribution.
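
A minimal sketch of this resampling step is given below, assuming the original histogram holds integer counts and that each new sample is obtained by drawing bin indices with probability proportional to those counts. All the names (`emd`, `bootstrap_distances`, `ecdf`, `quantile`) and the default number of resamplings are hypothetical choices made for illustration.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def emd(h1, h2):
    """EMD of two equally binned histograms, in units of the bin width."""
    t1, t2 = sum(h1), sum(h2)
    c1 = accumulate(x / t1 for x in h1)
    c2 = accumulate(x / t2 for x in h2)
    return sum(abs(a - b) for a, b in zip(c1, c2))


def bootstrap_distances(hist, n_boot=1000, seed=0):
    """Sorted EMDs between `hist` and resamplings of it with replacement."""
    rng = random.Random(seed)
    size = int(sum(hist))  # each new sample has the size of the original one
    distances = []
    for _ in range(n_boot):
        resampled = [0] * len(hist)
        for i in rng.choices(range(len(hist)), weights=hist, k=size):
            resampled[i] += 1
        distances.append(emd(hist, resampled))
    return sorted(distances)


def ecdf(sorted_values, x):
    """Empirical cumulative distribution: fraction of values <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def quantile(sorted_values, q):
    """Smallest sampled value whose ECDF is at least q."""
    idx = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[idx]
```

With these helpers, `quantile(bootstrap_distances(hist), 0.95)` gives the distance below which 95% of the resampled histograms fall.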
+
+\textcolor{red}{empirical distribution}
+
+At 95% confidence level, the compatibility of the deconvolved signal with
+the original one cannot be disproved if its distance from the original signal
+is smaller than \textcolor{red}{value}.
+
+\textcolor{red}{counts}
--- a/notes/todo
+++ b/notes/todo
@@ -1,2 +1,3 @@
- redo all the plots with huge labels
- add 4 and 5 to the readme
+- change the convolution symbol
+- add citations and references
+- redo the plots without the thin border