ex-6: started writing about the histograms comparison

This commit is contained in:
Giù Marcer 2020-05-01 23:56:35 +02:00 committed by rnhmjoj
parent 37094d0cf7
commit 110149f709
7 changed files with 87 additions and 30 deletions

View File

@ -0,0 +1,7 @@
@article{cock41,
title={The distribution of a product from several sources to numerous localities},
author={F. L. Hitchcock},
year={1941},
journal={Journal of Mathematics and Physics},
pages={224 - 230}
}

View File

@ -0,0 +1,14 @@
<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="en-US">
<!-- Elsevier, generated from "elsevier" metadata at https://github.com/citation-style-language/journals -->
<info>
<title>Chinese Journal of Physics</title>
<id>http://www.zotero.org/styles/chinese-journal-of-physics</id>
<link href="http://www.zotero.org/styles/chinese-journal-of-physics" rel="self"/>
<link href="http://www.zotero.org/styles/elsevier-with-titles" rel="independent-parent"/>
<category citation-format="numeric"/>
<issn>0577-9073</issn>
<updated>2016-07-25T11:35:23+00:00</updated>
<rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
</info>
</style>

View File

@ -80,4 +80,7 @@ header-includes: |
\captionsetup{width=11cm}
\usepackage{stmaryrd}
```
bibliography: docs/bibliography.bib
csl: docs/bibliography.csl
---

View File

@ -1,4 +1,4 @@
# Exercise 1
# Exercise 1 {#sec:Landau}
## Random numbers following the Landau distribution
@ -6,7 +6,7 @@ The Landau distribution is a probability density function which can be defined
as follows:
$$
f(x) = \int \limits_{0}^{+ \infty} dt \, e^{-t log(t) -xt} \sin (\pi t)
f(x) = \int \limits_{0}^{+ \infty} dt \, e^{-t \log(t) -xt} \sin (\pi t)
$$
![Landau distribution.](images/landau-small.pdf){width=50%}
@ -18,7 +18,7 @@ was used.
For the purpose of visualizing the resulting sample, the data was put into
a histogram and plotted with matplotlib. The result is shown in @fig:landau.
![Example of N points generated with the `gsl_ran_landau()`
![Example of N = 10'000 points generated with the `gsl_ran_landau()`
function and plotted in a 100-bin histogram ranging from -10 to
80.](images/landau-hist.png){#fig:landau}
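A minimal sketch of the binning described above. The actual numbers come from `gsl_ran_landau()` in the C code; a stand-in normal sample is used here purely to exercise the 100-bin layout, so the names and the placeholder distribution are assumptions:

```python
import numpy as np

# Stand-in sample: the report generates N = 10'000 Landau-distributed
# numbers with gsl_ran_landau(); any 1-D array works for the binning,
# so a normal sample is used here as a placeholder.
rng = np.random.default_rng(42)
sample = rng.normal(loc=5.0, scale=10.0, size=10_000)

# 100-bin histogram ranging from -10 to 80, as in @fig:landau;
# points outside the range are simply discarded by np.histogram.
counts, edges = np.histogram(sample, bins=100, range=(-10, 80))
```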
@ -41,7 +41,7 @@ $$
where:
- $x$ runs over the sample,
- $F(x)$ is the Landau cumulative distribution and function
- $F(x)$ is the Landau cumulative distribution function,
- $F_N(x)$ is the empirical cumulative distribution function of the sample.
If $N$ numbers have been generated, for every point $x$,

View File

@ -119,7 +119,7 @@ $$
$$
from which, the integral $I$ can now be computed. The edges of the integral
are fixed bt the fact that the total momentum can not exceed $P_{\text{max}}$:
are fixed by the fact that the total momentum can not exceed $P_{\text{max}}$:
$$
I = \int
@ -218,7 +218,7 @@ $$
p_h = j \cdot w + \frac{w}{2} = w \left( j + \frac{1}{2} \right)
$$
The following result was obtained:
For $p_{\text{max}} = 10$, the following result was obtained:
![Histogram of the obtained distribution.](images/dip.pdf)

View File

@ -91,9 +91,9 @@ of bins default set $n = 150$. In @fig:original an example is shown.
![Example of an intensity histogram.](images/fraun-original.pdf){#fig:original}
## Gaussian noise convolution {#sec:convolution}
## Gaussian convolution {#sec:convolution}
The sample must then be smeared with a Gaussian noise with the aim to recover
The sample must then be smeared with a Gaussian function with the aim to recover
the original sample afterwards, implementing a deconvolution routine.
For this purpose, a 'kernel' histogram with an odd number $m$ of bins and the
same bin width as the previous one, but a smaller number of them ($m < n$), was
@ -370,7 +370,7 @@ $P^{\star}$ is the flipped point spread function.
When implemented, this method results in an easy step-wise routine:
- create a flipped copy of the kernel;
- elect a zero-order estimate for {$c_i$};
- choose a zero-order estimate for {$c_i$};
- compute the convolutions with the method described in @sec:convolution, the
product and the division at each step;
- proceed until a given number of iterations is reached.
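The steps above can be sketched as follows; the function name, the flat zero-order estimate, and the toy data are illustrative assumptions, not the report's actual routine:

```python
import numpy as np

def richardson_lucy(observed, kernel, n_iter=100):
    """Richardson-Lucy deconvolution following the step-wise recipe."""
    kernel_flipped = kernel[::-1]                        # flipped copy of the kernel
    estimate = np.full(observed.shape, observed.mean())  # zero-order estimate for {c_i}
    for _ in range(n_iter):
        blurred = np.convolve(estimate, kernel, mode="same")
        # guard against division by zero in empty regions
        ratio = observed / np.where(blurred == 0, 1.0, blurred)
        estimate = estimate * np.convolve(ratio, kernel_flipped, mode="same")
    return estimate

# Toy check: a single spike blurred by a small symmetric kernel.
true_signal = np.zeros(50)
true_signal[25] = 100.0
kernel = np.array([0.25, 0.5, 0.25])
observed = np.convolve(true_signal, kernel, mode="same")
restored = richardson_lucy(observed, kernel, n_iter=50)
```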
@ -393,27 +393,27 @@ deconvolved with RL is located below.
As can be seen, increasing the value of $\sigma$ implies a stronger smoothing of
the curve. The FFT deconvolution process seems not to be affected by $\sigma$
amplitude changes: it always gives the same outcome, remarkably similar to the
original signal. The same can't be said about the RL deconvolution, which, on
the other hand, looks heavily influenced by the variance magnitude: the greater
$\sigma$, the worse the deconvoluted result. In fact, given the same number of
steps, the deconvolved signal is always the same 'distance' far form the
convolved one: if it very smooth, the deconvolved signal is very smooth too and
if the convolved is less smooth, it is less smooth too.
amplitude changes: it always gives the same outcome, which is exactly the
original signal. In fact, the FFT gives the analytical result of the
deconvolution. In the real world, it is impractical, since signals are
inevitably blurred by noise.
The same can't be said about the RL deconvolution, which, on the other hand,
looks heavily influenced by the variance magnitude: the greater $\sigma$, the
worse the deconvolved result. In fact, given the same number of steps, the
deconvolved signal always lies at the same 'distance' from the convolved one:
if the convolved signal is very smooth, the deconvolved one is very smooth
too, and if it is less smooth, so is the deconvolved one.
It was also made possible to add Poisson noise to the
convoluted histogram to check weather the deconvolution is affected or not by
this kind of noise. It was took as an example the case with $\sigma = \Delta
\theta$. In @fig:poisson the results are shown for both methods when a Poisson
noise with mean $\mu = 50$ is employed.
In both cases, the addition of the Poisson noise seems to affect partially the
deconvolution. When the FFT method was applied, it adds little spikes nearly
everywhere on the curve but it is particularly evident on the edges of the
curve, where the expected data are very small. This is because the technique is
very accurate and hence returns nearly the exact original data which, in this
case, is the expected one to which the Poisson noise is added.
On the other hand, the Richardson-Lucy routine is less affected by this further
complication being already inaccurate in itself.
convolved histogram to check whether the deconvolution is affected or not by
this kind of interference. The case with $\sigma = \Delta \theta$ was taken
as an example. In @fig:poisson the results are shown for both methods when a
Poisson noise with mean $\mu = 50$ is employed.
In both cases, the addition of the noise seems to partially affect the
deconvolution. When the FFT method is applied, little spikes appear nearly
everywhere on the curve, and they are particularly evident on the edges, where
the expected data are very small. On the other hand, the Richardson-Lucy
routine is less affected by this further complication.
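One plausible reading of the noise step, sketched with NumPy; the stand-in counts and the additive-Poisson interpretation are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in convolved histogram counts (the report uses the smeared
# Fraunhofer intensity histogram here).
convolved = np.array([0.0, 2.0, 10.0, 40.0, 90.0, 40.0, 10.0, 2.0, 0.0])

# Assumed reading of the noise step: an independent Poisson(mu) term
# added to each bin, with mean mu = 50.
mu = 50
noisy = convolved + rng.poisson(mu, size=convolved.size)
```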
<div id="fig:results1">
![Convolved signal.](images/fraun-conv-0.05.pdf){width=12cm}
@ -454,3 +454,35 @@ Results for $\sigma = \Delta \theta$, where $\Delta \theta$ is the bin width.
Results for $\sigma = \Delta \theta$, with Poisson noise.
</div>
In order to quantify the similarity of the deconvolution outcome with the
original signal, a null hypothesis test was set up.
As in @sec:Landau, the original sample was treated as a population from
which other samples of the same size were drawn with replacement. For each
new sample, the earth mover's distance with respect to the original signal was
computed.
In statistics, the earth mover's distance (EMD) is the measure of distance
between two probability distributions [@cock41]. Informally, the distributions
are interpreted as two different ways of piling up a certain amount of dirt over
a region and the EMD is the minimum cost of turning one pile into the other,
where the cost is the amount of dirt moved times the distance by which it is
moved. It is valid only if the two distributions have the same integral, that
is if the two piles have the same amount of dirt.
Computing the EMD amounts to solving a transportation problem.
\textcolor{red}{earth mover's distance}
In this case, where the EMD must be applied to two histograms, the procedure
simplifies considerably, boiling down to the difference of the cumulative
functions of the two histograms.
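For two histograms with the same total content on a common binning, this reduction can be sketched as follows (the helper name `emd_1d` is an assumption):

```python
import numpy as np

def emd_1d(hist_a, hist_b, bin_width=1.0):
    """EMD between two histograms sharing the same binning.

    Only valid when the two histograms have the same total content
    (the two piles hold the same amount of dirt); in that case the
    EMD is the area between the two cumulative functions."""
    cdf_a = np.cumsum(hist_a)
    cdf_b = np.cumsum(hist_b)
    return float(np.sum(np.abs(cdf_a - cdf_b)) * bin_width)

# One unit of dirt moved by one bin costs exactly one bin width:
a = np.array([0.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0])
```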
These distances were used to build their empirical cumulative distribution.
\textcolor{red}{empirical distribution}
At 95% confidence level, the compatibility of the deconvolved signal with
the original one cannot be disproved if its distance from the original signal
is no greater than \textcolor{red}{value}.
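A sketch of the resampling scheme, under the assumption that the critical value is the 95th percentile of the bootstrap distances; the stand-in data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def emd(h1, h2, w=1.0):
    # 1-D EMD as the area between the cumulative functions
    return float(np.sum(np.abs(np.cumsum(h1) - np.cumsum(h2))) * w)

# Stand-in "original" sample; the report uses the original signal's
# sample instead.
original = rng.normal(size=1000)
bins = np.linspace(-4.0, 4.0, 51)
h_orig, _ = np.histogram(original, bins=bins)

# Resample with replacement and collect the EMD of each resample
# with respect to the original histogram.
dists = []
for _ in range(500):
    resample = rng.choice(original, size=original.size, replace=True)
    h_res, _ = np.histogram(resample, bins=bins)
    dists.append(emd(h_res, h_orig, w=bins[1] - bins[0]))

# The 95th percentile of these distances gives the critical value of
# the test at 95 % confidence level.
critical = float(np.quantile(dists, 0.95))
```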
\textcolor{red}{counts}

View File

@ -1,2 +1,3 @@
- redo all the plots with huge labels
- add 4 and 5 to the readme
- change the convolution symbol
- add citations and references
- redo the plots without the thin border