The diffraction of a plane wave through a round slit is to be simulated by
generating $N =$ 50'000 points according to the intensity distribution
$I(\theta)$ [@hecht02] on a screen at a great distance from the slit itself
(see @fig:slit):
$$
  I(\theta) = \frac{E^2}{2} \left( \frac{2 \pi a^2 \cos{\theta}}{L}
  \frac{J_1(x)}{x} \right)^2 \with x = k a \sin{\theta}
$$
where:

- $E$ is the electric field amplitude, default $E = \SI{1e4}{V/m}$;
- $a$ is the radius of the slit aperture, default $a = \SI{0.01}{m}$;
- $\theta$ is the diffraction angle, shown in @fig:slit;
- $J_1$ is a Bessel function of the first kind;
- $k$ is the wavenumber, default $k = \SI{1e-4}{m^{-1}}$;
- $L$ is the distance from the screen, default $L = \SI{1}{m}$.
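
For reference, a minimal sketch of $I(\theta)$ in C, using the GSL Bessel
function `gsl_sf_bessel_J1()`; the function name and the guard on the central
maximum are ours:

```c
#include <math.h>
#include <gsl/gsl_sf_bessel.h>

/* Intensity I(θ) of the diffraction pattern; E, a, k, L as above. */
double intensity(double theta, double E, double a, double k, double L)
{
  const double x = k * a * sin(theta);
  /* J₁(x)/x → 1/2 as x → 0: guard the central maximum. */
  const double j = (fabs(x) < 1e-12) ? 0.5 : gsl_sf_bessel_J1(x) / x;
  const double front = E * E / 2 * pow(2 * M_PI * a * a * cos(theta) / L, 2);
  return front * j * j;
}
```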

\begin{figure}
\hypertarget{fig:slit}{%

Once again, the *hit-miss* method described in @sec:3 was implemented and
the same procedure for the generation of $\theta$ was applied. This time,
though, $\theta$ must be uniformly distributed on the half sphere, hence:
\begin{align*}
  \frac{d^2 P}{d\omega^2} = \frac{1}{2 \pi}
  &\thus d^2 P = \frac{1}{2 \pi} d\omega^2 =
  \frac{1}{2 \pi} d\phi \sin{\theta} d\theta \\
  &\thus \frac{dP}{d\theta} = \int_0^{2 \pi} d\phi \frac{1}{2 \pi} \sin{\theta}
  = \sin{\theta}
\end{align*}
Then, a change of variable was applied to obtain $\theta$ from a variable $x$
uniformly distributed in $[0, 1]$:
\begin{align*}
  \theta = \theta (x) &\thus
  \frac{dP}{d\theta} = \frac{dP}{dx} \cdot \left| \frac{dx}{d\theta} \right|
  = \left. \frac{dP}{dx} \middle/ \, \left| \frac{d\theta}{dx} \right| \right.
  \\
  &\thus \sin{\theta} = \left. 1 \middle/ \, \left|
  \frac{d\theta}{dx} \right| \right.
\end{align*}
If $\theta$ is taken to increase with $x$, then the absolute value can be
omitted:
\begin{align*}
  \frac{d\theta}{dx} = \frac{1}{\sin{\theta}}
  &\thus \sin{\theta} \, d\theta = dx
  \thus x = 1 - \cos{\theta} \\
  &\thus \theta = \arccos(1 - x)
\end{align*}
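
A sketch of the resulting sampling routine; the function name and the use of
the GSL `gsl_rng` interface are ours:

```c
#include <math.h>
#include <gsl/gsl_rng.h>

/* Sample θ ∈ [0, π/2] with density dP/dθ = sin(θ), by the inverse
 * transform θ = arccos(1 - x) of a uniform variable x. */
double sample_theta(gsl_rng *rng)
{
  return acos(1 - gsl_rng_uniform(rng));
}
```

Each $\theta$ so generated is then kept or discarded by the hit-miss
comparison against $I(\theta)$.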

The so-obtained sample was binned and stored in a histogram with a
customizable number $n$ of bins (default $n = 150$) ranging from $\theta = 0$
to $\theta = \pi/2$ because of the system symmetry. An example is shown in
@fig:original.

![Example of intensity histogram.](images/6-original.pdf){#fig:original}

## Convolution {#sec:convolution}

In order to simulate the instrumentation response, the sample was then
convolved with a gaussian kernel, with the aim of recovering the original
sample afterwards by implementing a deconvolution routine.
For this purpose, a 'kernel' histogram with an even number $m$ of bins and the
same bin width as the previous one, but a smaller number of them ($m \sim 6\%
\, n$), was generated according to a gaussian distribution with mean $\mu = 0$
and standard deviation $\sigma$. The reason why the kernel was set this way
will be discussed shortly.
Then, the original histogram was convolved with the kernel in order to obtain
the smeared signal. As an example, the result obtained for $\sigma = \Delta
\theta$, where $\Delta \theta$ is the bin width, is shown in @fig:convolved.
The smeared signal looks smoother than the original: the higher $\sigma$, the
greater the smoothness.

![Convolved signal.](images/6-smoothed.pdf){#fig:convolved}

The convolution was implemented as follows. Consider the definition of
convolution for two integrable functions $f(x)$ and $g(x)$:
$$
  (f * g)(x) = \int \limits_{- \infty}^{+ \infty} dy f(y) g(x - y)
$$
This definition is easily recast into a form that lends itself to
implementation for discrete arrays of numbers, such as histograms or vectors:
\begin{align*}
  (f * g)(x)
  &= \int \limits_{- \infty}^{+ \infty} dy f(y) (R \, g)(y - x) \\
  &= \int \limits_{- \infty}^{+ \infty} dy f(y) (T_x \, R \, g)(y) \\
  &= (f, T_x \, R \, g)
\end{align*}
where:

- $R$ and $T_x$ are the reflection and translation by $x$ operators;
- $(\cdot, \cdot)$ is an inner product.

Given a signal $s$ of $n$ elements and a kernel $k$ of $m$ elements, their
convolution is a vector of $n + m - 1$ elements, computed by flipping $k$
(the $R$ operator) and shifting its indices (the $T_i$ operator):
$$
c_i = (s, T_i \, R \, k)
$$
The shift is defined such that, when an index falls outside the range ($\ge m$
or $< 0$), the element is taken to be zero. This convention specifies the
behavior at the edges and accounts for the $m - 1$ increase in size.
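
As a reference, a minimal sketch of this operation in C; the function name and
the 0-based indexing convention are ours:

```c
#include <stdlib.h>

/* Discrete convolution c_i = (s, T_i R k): the kernel is flipped and
 * slid over the signal, taking elements outside the arrays as zero.
 * The result has n + m - 1 elements and is heap-allocated. */
double *convolve(const double *s, size_t n, const double *k, size_t m)
{
  double *c = calloc(n + m - 1, sizeof(double));
  for (size_t i = 0; i < n + m - 1; i++)
    for (size_t j = 0; j < m; j++) {
      long l = (long)i - (long)j;      /* signal index paired with k_j */
      if (l >= 0 && l < (long)n)
        c[i] += k[j] * s[l];
    }
  return c;
}
```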
For a better understanding, see @fig:dot_conv.

\begin{figure}
\hypertarget{fig:dot_conv}{%
\end{figure}

## Deconvolution by Fourier transform

Two different unfolding routines were implemented, one of which exploits the
Fast Fourier Transform (FFT).
This method is based on the convolution theorem, which states that, given two
$L^1$ functions $f(x)$ and $g(x)$:
$$
  \mathcal{F}[f * g] = \mathcal{F}[f] \cdot \mathcal{F}[g]
$$
where $\mathcal{F}[\cdot]$ stands for the Fourier transform.
Since the histogram is a discrete set of data, the Discrete Fourier Transform
(DFT) was applied. When dealing with arrays of discrete values, the theorem
still holds if the two arrays have the same length and $(*)$ is understood as
a cyclical convolution. For this reason, the kernel was 0-padded to make it
the same length as the original signal and, at the same time, to avoid the
wrap-around effects of the cyclical convolution.
In order to accomplish this procedure, both histograms were transformed into
vectors. The implementation consists in computing the Fourier transforms of
the smeared signal and the kernel, taking the ratio between them and
anti-transforming the result:
$$
  \mathcal{F}[s * k] = \mathcal{F}[s] \cdot \mathcal{F}[k] \thus
  \mathcal{F}[s] = \frac{\mathcal{F}[s * k]}{\mathcal{F}[k]}
$$
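
For illustration, a sketch of this ratio using the GSL complex FFT routines on
packed arrays; for brevity it uses the complex (rather than the real,
half-complex) transforms, and the macros and function name are ours:

```c
#include <gsl/gsl_fft_complex.h>

#define RE(z, i) ((z)[2 * (i)])
#define IM(z, i) ((z)[2 * (i) + 1])

/* Deconvolve in place: signal holds s*k, kernel the 0-padded k, both
 * packed complex arrays of n elements. On return, signal holds s. */
void deconvolve_fft(double *signal, double *kernel, size_t n)
{
  gsl_fft_complex_wavetable *wt = gsl_fft_complex_wavetable_alloc(n);
  gsl_fft_complex_workspace *ws = gsl_fft_complex_workspace_alloc(n);

  gsl_fft_complex_forward(signal, 1, n, wt, ws);   /* F[s*k] */
  gsl_fft_complex_forward(kernel, 1, n, wt, ws);   /* F[k]   */

  for (size_t i = 0; i < n; i++) {
    /* complex division F[s*k]_i / F[k]_i */
    double den = RE(kernel, i) * RE(kernel, i)
               + IM(kernel, i) * IM(kernel, i);
    double re = (RE(signal, i) * RE(kernel, i)
               + IM(signal, i) * IM(kernel, i)) / den;
    double im = (IM(signal, i) * RE(kernel, i)
               - RE(signal, i) * IM(kernel, i)) / den;
    RE(signal, i) = re;
    IM(signal, i) = im;
  }

  gsl_fft_complex_inverse(signal, 1, n, wt, ws);   /* back to θ domain */

  gsl_fft_complex_wavetable_free(wt);
  gsl_fft_complex_workspace_free(ws);
}
```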

FFTs are efficient algorithms for calculating the DFT. Given a set of $n$
where $i$ is the imaginary unit.
The evaluation of the DFT is a matrix-vector multiplication $W \vec{z}$. A
general matrix-vector multiplication takes $O(n^2)$ operations. FFT algorithms,
instead, use a *divide-and-conquer* strategy to factorise the matrix into
smaller sub-matrices. If $n$ can be factorised into a product of integers
$n_1$, $n_2 \ldots n_m$, then the DFT can be computed in $O(n \sum n_i) <
O(n^2)$ operations, hence the name.
The inverse Fourier transform is then defined as:
$$
  z_j = \frac{1}{n}
  \sum_{k=0}^{n-1} x_k \exp \left( \frac{2 \pi i j k}{n} \right)
$$
In GSL, `gsl_fft_complex_forward()` and `gsl_fft_complex_inverse()` are the
functions which compute the forward and inverse transforms, respectively.
The inputs and outputs for the complex FFT routines are packed arrays of
floating point numbers. In a packed array, the real and imaginary parts of
each complex number are placed in alternate neighboring elements. In this
special case, where the sequence of values to be transformed is made of real
numbers, the Fourier transform is a complex sequence which satisfies:
$$
  z_k = z^*_{n-k}
$$
aforementioned GSL functions store the positive values from the beginning of
the array up to the middle and the negative backwards from the end of the
array (see @fig:reorder).

While the order of frequencies of the convolved histogram is immaterial, the
kernel must be centered at zero in order to compute a correct convolution.
This requires the kernel to be made of an even number of bins, so that it can
be divided into two equal halves.

When $\mathcal{F}[s * k]$ and $\mathcal{F}[k]$ are computed, they are given in
the half-complex GSL packed format, so they must be unpacked to a complex GSL
vector before performing the element-wise division. Then, the result is
repacked to the half-complex format for the inverse DFT computation. GSL
provides the function `gsl_fft_halfcomplex_unpack()`, which converts a vector
from the half-complex format to the standard complex format, but the inverse
procedure is not provided and had to be implemented.
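
A sketch of a possible repacking routine, assuming the mixed-radix
half-complex layout documented by GSL ($r_0, r_1, i_1, r_2, i_2, \dots$, with
the real Nyquist term last for even $n$); the function name is ours:

```c
#include <stddef.h>

/* Inverse of gsl_fft_halfcomplex_unpack(): pack a Hermitian complex
 * sequence (packed array of 2n doubles) into n half-complex doubles.
 * The imaginary parts of z_0 and z_{n/2}, which vanish for a real
 * signal, are discarded. */
void halfcomplex_pack(const double *complex_coefficient,
                      double *halfcomplex_coefficient, size_t n)
{
  halfcomplex_coefficient[0] = complex_coefficient[0];  /* Re(z_0) */
  for (size_t k = 1; k < n - k; k++) {
    halfcomplex_coefficient[2*k - 1] = complex_coefficient[2*k];     /* Re(z_k) */
    halfcomplex_coefficient[2*k]     = complex_coefficient[2*k + 1]; /* Im(z_k) */
  }
  if (n % 2 == 0)
    halfcomplex_coefficient[n - 1] = complex_coefficient[n];  /* Re(z_{n/2}) */
}
```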
code.
At the end, the external bins which exceed the original signal size are cut In the end, the external bins which exceed the original signal size are cut
away in order to restore the original number of bins $n$. Results will be away in order to restore the original number of bins $n$. Results will be
discussed in @sec:conv_results. discussed in @sec:conv-results.

## Richardson-Lucy deconvolution

The Richardson–Lucy (RL) deconvolution is an iterative procedure typically
used for recovering an image that has been blurred by a known 'point spread
function', or 'kernel'.

Consider the problem of estimating the frequency distribution $f(\xi)$ of a
variable $\xi$ when the available measure is a sample {$x_i$} of points
drawn from the distribution $\phi(x)$, given by:
$$
  \phi(x) = \int d\xi \, f(\xi) P(x | \xi)
$$ {#eq:conv}
where $P(x | \xi) \, dx$ is the probability (presumed known) that $x$ falls
in the interval $(x, x + dx)$ for a given $\xi$. If the so-called point spread
function $P(x | \xi)$ is a function of $x - \xi$ only, for example a normal
distribution with standard deviation $\sigma$:
$$
  P(x | \xi) = \frac{1}{\sqrt{2 \pi} \sigma}
  \exp \left( - \frac{(x - \xi)^2}{2 \sigma^2} \right)
$$
then @eq:conv becomes a convolution and finding $f(\xi)$ amounts to a
deconvolution.

An example of this problem is precisely that of correcting an observed
distribution $\phi(x)$ for the effect of observational errors, which are
represented by the function $P(x | \xi)$.

Since $Q(\xi | x)$ depends on $f(\xi)$, @eq:second suggests an iterative
procedure for generating estimates of $f(\xi)$. With a guess for $f(\xi)$ and
a known $P(x | \xi)$, @eq:first can be used to calculate an estimate for
$Q(\xi | x)$. Then, taking the hint provided by @eq:second, an improved
estimate for $f(\xi)$ can be generated, using the observed sample {$x_i$} to
give an approximation for $\phi$.

Thus, if $f^t$ is the $t^{\text{th}}$ estimate, the next is given by:
$$
  f^{t + 1}(\xi) = \int dx \, \phi(x) Q^t(\xi | x)
  \with
  Q^t(\xi | x) = \frac{f^t(\xi) \cdot P(x | \xi)}
                      {\int d\xi \, f^t(\xi) P(x | \xi)}
$$
from which:
$$
  f^{t + 1}(\xi) = f^t(\xi)
  \int dx \, \frac{\phi(x)}{\int d\xi \, f^t(\xi) P(x | \xi)}
  P(x | \xi)
$$ {#eq:solution}

When the spread function is of the form $P(x | \xi) = P(x - \xi)$,
@eq:solution can be rewritten in terms of convolutions:
$$
  f^{t + 1} = f^{t} \left( \frac{\phi}{f^{t} * P} * P^{\star} \right)
$$
where $P^{\star}$ is the flipped point spread function [@lucy74].

In this particular instance, a gaussian kernel was convolved with the original
histogram. Again, dealing with discrete arrays of numbers, the division and
multiplication are element-wise and the convolution is to be carried out as
described in @sec:convolution.

When implemented, this method results in an easy step-wise routine (see the
sketch below):

- choose a zero-order estimate for {$f(\xi)$};
- compute the convolutions, the product and the division at each step;
- proceed until a given number $r$ of iterations is reached.

In this case, the zero-order estimate was set to $f(\xi) = 0.5 \, \forall \,
\xi$ and different numbers of iterations were tested. Results are discussed in
@sec:conv-results.
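
A sketch of this routine in C, reusing the `convolve()` helper from
@sec:convolution; the cropping convention and the function names are ours:

```c
#include <stdlib.h>

/* Convolve and crop the result back to n bins, cutting the excess
 * elements at the edges as described in @sec:convolution. */
static double *convolve_same(const double *a, size_t n,
                             const double *k, size_t m)
{
  double *full = convolve(a, n, k, m);      /* n + m - 1 elements */
  double *same = malloc(n * sizeof(double));
  for (size_t i = 0; i < n; i++)
    same[i] = full[i + (m - 1) / 2];
  free(full);
  return same;
}

/* RL deconvolution: update the estimate f (n bins) in place, given the
 * observed signal phi (n bins) and the kernel P (m bins). */
void richardson_lucy(double *f, const double *phi,
                     const double *P, size_t n, size_t m, size_t rounds)
{
  double *P_star = malloc(m * sizeof(double));
  double *ratio  = malloc(n * sizeof(double));
  for (size_t j = 0; j < m; j++)
    P_star[j] = P[m - 1 - j];               /* flipped kernel P* */

  for (size_t r = 0; r < rounds; r++) {
    double *fP = convolve_same(f, n, P, m); /* f^t * P */
    for (size_t i = 0; i < n; i++)          /* φ / (f^t * P), guarded */
      ratio[i] = (fP[i] != 0) ? phi[i] / fP[i] : 0;
    double *corr = convolve_same(ratio, n, P_star, m);
    for (size_t i = 0; i < n; i++)
      f[i] *= corr[i];                      /* element-wise product */
    free(fP);
    free(corr);
  }
  free(P_star);
  free(ratio);
}
```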

## The earth mover's distance

deconvolved outcome with the original signal was quantified using the earth
mover's distance.

In statistics, the earth mover's distance (EMD) is a measure of the distance
between two distributions [@cock41]. Informally, if one imagines the
distributions as two piles of different amounts of dirt in their respective
regions, the EMD is the minimum cost of turning one pile into the other,
making the first as similar as possible to the second, where the cost is the
amount of dirt moved times the distance by which it is moved.

Computing the EMD is based on the solution to a transportation problem, which
can be formalized as follows. Consider two vectors $P$ and $Q$ which represent
the two distributions whose EMD has to be measured:
$$
  P = \{ (p_1, w_{p1}) \dots (p_m, w_{pm}) \} \et
  Q = \{ (q_1, w_{q1}) \dots (q_n, w_{qn}) \}
$$
where $p_i$ and $q_i$ are the 'values' (that is, the locations of the dirt)
and $w_{pi}$ and $w_{qi}$ are the 'weights' (that is, the quantity of dirt). A
ground distance matrix $D$ is defined such that its entries $d_{ij}$ are the
distances between $p_i$ and $q_j$. The aim is to find the flow matrix $F$,
where each entry $f_{ij}$ is the flow from $p_i$ to $q_j$ (which would be the
quantity of moved dirt), which minimizes the cost $W$:
$$
  W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
$$
The $Q$ region is to be considered empty at the beginning: the 'dirt' present
in $P$ must be moved to $Q$ in order to reproduce the target distribution as
closely as possible. Formally, the following constraints must be satisfied:
\begin{align*}
  &\text{1.} \hspace{20pt} f_{ij} \ge 0 \hspace{15pt}
same amount of dirt, hence all the dirt present in $P$ is necessarily moved to
$Q$ and the flow equals the total amount of available dirt.

Once the transportation problem is solved and the optimal flow is found, the
EMD is defined as the work normalized by the total flow:
$$
  \text{EMD} (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
                           {\sum_{i = 1}^m \sum_{j = 1}^n f_{ij}}
$$
where the sum runs over the entries of the vectors $U$ and $V$, which are the
cumulative sums of the histograms. In the code, the following equivalent
iterative routine was implemented:
$$
  \text{EMD} (u, v) = \sum_i |\text{d}_i| \with
  \begin{cases}
    \text{d}_i = v_i - u_i + \text{d}_{i-1} \\
    \text{d}_0 = 0
  \end{cases}
$$
The equivalence is apparent once the definition is expanded:
\begin{align*}
  \text{EMD} (u, v)
  &= \sum_i |\text{d}_i| = |\text{d}_0| + |\text{d}_1|
  + |\text{d}_2| + |\text{d}_3| + \dots \\
  &= 0 + |v_1 - u_1 + \text{d}_0| +
  |v_2 - u_2 + \text{d}_1| +
  |v_3 - u_3 + \text{d}_2| + \dots \\
  &= |v_1 - u_1| +
  |v_1 - u_1 + v_2 - u_2| +
  |v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots \\
  &= |v_1 - u_1| +
  |v_1 + v_2 - (u_1 + u_2)| +
  |v_1 + v_2 + v_3 - (u_1 + u_2 + u_3)| + \dots \\
  &= |V_1 - U_1| + |V_2 - U_2| + |V_3 - U_3| + \dots \\
  &= \sum_i |U_i - V_i|
\end{align*}
This simple algorithm enabled the comparisons between a great number of
histograms to be computed efficiently.

In order to make the code more flexible, the data were normalized before
computing the EMD: in doing so, it is possible to compare even samples with a
different number of points.
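
A sketch of this routine, with the normalization included; the function name
is ours:

```c
#include <math.h>
#include <stddef.h>

/* 1D earth mover's distance between two histograms u and v of n bins,
 * normalized so that samples of different sizes can be compared. */
double emd(const double *u, const double *v, size_t n)
{
  double sum_u = 0, sum_v = 0;
  for (size_t i = 0; i < n; i++) {
    sum_u += u[i];
    sum_v += v[i];
  }

  double d = 0;     /* running term d_i, with d_0 = 0 */
  double dist = 0;  /* Σ_i |d_i| */
  for (size_t i = 0; i < n; i++) {
    d += v[i] / sum_v - u[i] / sum_u;
    dist += fabs(d);
  }
  return dist;
}
```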

## Results comparison {#sec:conv-results}

### Noiseless results {#sec:noiseless}

In addition to the convolution with a gaussian kernel of width $\sigma$, the
possibility to add a gaussian noise to the convolved histogram counts was also
implemented, to check whether the deconvolution is affected or not by this
kind of interference. This approach is described in the next subsection, while
the noiseless results are given in this one.

The two methods were compared for three different values of $\sigma$:
$$
  \sigma = 0.1 \, \Delta \theta \et
  \sigma = 0.5 \, \Delta \theta \et
  \sigma = \Delta \theta
$$
Since the RL method depends on the number $r$ of performed rounds, in order to
find out how many are sufficient or necessary to compute, the earth mover's
distance between the deconvolved signal and the original one was measured for
different $r$s for each of the three tested values of the kernel $\sigma$.

To achieve this goal, 1000 experiments were simulated. Each consists in
generating the diffraction signal, convolving it with a kernel of width
$\sigma$, deconvolving with the RL algorithm with a given number of rounds $r$
and measuring the EMD.
The distances are used to build a histogram of the EMD distribution, from
which the mean and standard deviation are computed.
This procedure was repeated for a few tens of different $r$s till a flattening
or a minimum of the curve became evident. All the results are shown in
@fig:rounds-noiseless.

The plots in @fig:rless-0.1 show the average (red) and standard deviation
(grey) of the measured EMD for $\sigma = 0.1 \, \Delta \theta$. The number of
iterations does not affect the quality of the outcome (those fluctuations are
merely an effect of floating-point precision) and the best result is obtained
for $r = 2$, meaning that the convergence of the RL algorithm is really fast,
because the histogram was only slightly modified.
In @fig:rless-0.5, the curve starts to flatten at about 10 rounds, whereas in
@fig:rless-1 a minimum occurs around \num{5e3} rounds, meaning that, with such
a large kernel, the convergence is very slow, even if the best results are
close to the one found for $\sigma = 0.5 \, \Delta \theta$.

The following $r$s were chosen as the most fitting:
\begin{align*}
  \sigma = 0.1 \, \Delta \theta &\thus r^{\text{best}} = 2 \\
  \sigma = 0.5 \, \Delta \theta &\thus r^{\text{best}} = 10 \\

In fact, the FFT is the analytical result of the deconvolution.
<div id="fig:rounds-noiseless"> <div id="fig:rounds-noiseless">
![](images/6-nonoise-rounds-0.1.pdf){#fig:rless-0.1} ![](images/6-nonoise-rounds-0.1.pdf){#fig:rless-0.1}
![](images/6-nonoise-rounds-0.5.pdf){#fig:rless-0.5} ![](images/6-nonoise-rounds-0.5.pdf){#fig:rless-0.5}
![](images/6-nonoise-rounds-1.pdf){#fig:rless-1} ![](images/6-nonoise-rounds-1.pdf){#fig:rless-1}

EMD as a function of RL rounds for different kernel $\sigma$ values. The
average is shown in red and the standard deviation in grey. Noiseless results.
</div>
<div id="fig:emd-noiseless"> <div id="fig:emd-noiseless">

As described above, for a given $r$, a thousand experiments were simulated:
for each of these simulations, an EMD was computed. Besides computing their
average and standard deviation, those values were used to build histograms
showing the EMD distribution.

Once the best numbers of rounds $r^{\text{best}}$ were found, their histograms
were compared to the histograms of the FFT results, starting from the same
convolved signals, and to the EMD of the convolved signals themselves, in
order to check if an improvement was truly achieved. Results are shown in
@fig:emd-noiseless.

As expected, the FFT results are always of the same order of magnitude,
\num{1e-15}, independently of the kernel width, whereas the RL deconvolution
is greatly affected by it, ranging from \num{1e-16} for $\sigma = 0.1 \,
\Delta \theta$ to \num{1e-4} for $\sigma = \Delta \theta$.
On the other hand, the first result came off as a surprise: for very low
values of $\sigma$, the RL routine gives better results. Apparently, the RL
algorithm is more numerically stable than the FFT, which is more affected by
floating-point round-off errors.

Regarding the comparison with the convolved signal, which always shows an EMD
of \num{1e-2}, both RL and FFT return results closer to the original signal,
meaning that the deconvolution is indeed working.

### Noisy results

In order to observe the effect of the gaussian noise on the two deconvolution
methods, a value of $\sigma = 0.8 \, \Delta \theta$ for the kernel width was
arbitrarily chosen. The noise was then applied to the convolved histogram as
follows.

![Example of noisy histogram,
$\sigma_N = 0.05$.](images/6-noisy.pdf){#fig:noisy}

For each bin, once the convolved histogram was computed, a value $v_N$ was
randomly sampled from a gaussian distribution with standard deviation
$\sigma_N$, and the value $v_N \cdot b$ was added to the bin itself, where $b$
is the count of the bin. An example of the new histogram, with $\sigma_N =
0.05$, is shown in @fig:noisy.
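
A sketch of this smearing step, using the GSL `gsl_ran_gaussian()` sampler;
the function name is ours:

```c
#include <stddef.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

/* Smear the histogram: each bin count b becomes b + v_N·b, with v_N
 * drawn from a gaussian of standard deviation sigma_n. */
void add_noise(double *bins, size_t n, double sigma_n, gsl_rng *rng)
{
  for (size_t i = 0; i < n; i++) {
    double v = gsl_ran_gaussian(rng, sigma_n);  /* v_N ~ N(0, σ_N) */
    bins[i] += v * bins[i];
  }
}
```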

The following three values of $\sigma_N$ were tested:
$$
  \sigma_N = 0.005 \et
  \sigma_N = 0.01 \et
  \sigma_N = 0.05
$$

In @fig:rounds-noise, the same analysis as before is shown, this time varying
$\sigma_N$ and keeping $\sigma = 0.8 \, \Delta \theta$ constant.

In @fig:rnoise-0.005, the flattening is achieved around $r = 20$ and in
@fig:rnoise-0.01, $r \sim 15$ is sufficient. When the noise becomes too high,
on the other hand, as $r$ grows the algorithm becomes largely ineffective, so
only a few iterations are actually needed.

The most fitting values were chosen as:
\begin{align*}
  \sigma_N = 0.005 &\thus r^{\text{best}} = 20 \\
  \sigma_N = 0.01  &\thus r^{\text{best}} = 15 \\
  \sigma_N = 0.05  &\thus r^{\text{best}} = 1
\end{align*}
As for the magnitude, as $\sigma_N$ increases, unsurprisingly the EMD grows
larger, ranging from $\sim$ \num{2e-4} in @fig:rnoise-0.005 to $\sim$
\num{1.5e-3} in @fig:rnoise-0.05.

Since the FFT is no longer able to recover the original signal as closely as
before, it is no longer assumed to be a reference point. In fact, as shown in
@fig:noisy, the FFT algorithm, when dealing with noisy signals whose noise
shape is unknown, is as efficient as the RL, since they share the same order
of magnitude

function, the FFT proved to be the best deconvolution method, restoring
perfectly the original signal to the extent permitted by the floating point
precision. When the point spread function is particularly small, this
precision problem is partially solved by the RL deconvolution process, which
employs more stable computations.
However, in real-world applications the measurements are affected by (possibly
unknown) noise and the signal can only be partially reconstructed by either
method.
<div id="fig:rounds-noise"> <div id="fig:rounds-noise">
![](images/6-noise-rounds-0.005.pdf){#fig:rnoise-0.005} ![](images/6-noise-rounds-0.005.pdf){#fig:rnoise-0.005}
show the results for the FFT deconvolution, the central column the results for
the RL deconvolution and the third one shows the EMD for the convolved signal.
Noisy results.
</div>