# Exercise 6

## Generating points according to Fraunhöfer diffraction

The diffraction of a plane wave through a round slit is to be simulated by
generating $N =$ 50'000 points according to the intensity distribution
$I(\theta)$ [@hecht02] on a screen at a great distance $L$ from the slit itself
(see @fig:slit):
$$
  I(\theta) = \frac{E^2}{2} \left( \frac{2 \pi a^2 \cos{\theta}}{L}
  \frac{J_1(x)}{x} \right)^2 \with x = k a \sin{\theta}
$$

where:

- $E$ is the electric field amplitude, default set $E = \SI{1e4}{V/m}$;
- $a$ is the radius of the slit aperture, default set $a = \SI{0.01}{m}$;
- $\theta$ is the angle specified in @fig:slit;
- $J_1$ is the Bessel function of first kind;
- $k$ is the wavenumber, default set $k = \SI{1e-4}{m^{-1}}$;
- $L$ default set $L = \SI{1}{m}$.

\begin{figure}
\hypertarget{fig:slit}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % Walls
  \draw [thick] (-1,3)  -- (1,3)  -- (1,0.3)  -- (1.2,0.3)  -- (1.2,3)
                       -- (9,3);
  \draw [thick] (-1,-3) -- (1,-3) -- (1,-0.3) -- (1.2,-0.3) -- (1.2,-3)
                       -- (9,-3);
  \draw [thick] (10,3) -- (9.8,3) -- (9.8,-3) -- (10,-3);
  % Lines
  \draw [thick, gray] (0.7,0.3)  -- (0.5,0.3);
  \draw [thick, gray] (0.7,-0.3) -- (0.5,-0.3);
  \draw [thick, gray] (0.6,0.3)  -- (0.6,-0.3);
  \draw [thick, gray] (1.2,0)    -- (9.8,0);
  \draw [thick, gray] (1.2,-0.1)  -- (1.2,0.1);
  \draw [thick, gray] (9.8,-0.1)  -- (9.8,0.1);
  \draw [thick, cyclamen] (1.2,0)  -- (9.8,-2);
  \draw [thick, cyclamen] (7,0) to [out=-90, in=50] (6.6,-1.23);
  % Nodes
  \node at (0,0) {$2a$};
  \node at (5.5,0.4) {$L$};
  \node [cyclamen] at (5.5,-0.4) {$\theta$};
  \node [rotate=-90] at (10.2,0) {screen};
\end{tikzpicture}
\caption{Fraunhöfer diffraction.}\label{fig:slit}
}
\end{figure}

Once again, the *try and catch* method described in @sec:3 was implemented and
the same procedure about the generation of $\theta$ was applied. This time,
though, $\theta$ must be evenly distributed on half sphere, hence:
\begin{align*}
  \frac{d^2 P}{d\omega^2} = const = \frac{1}{2 \pi}
  &\thus d^2 P = \frac{1}{2 \pi} d\omega^2 =
  \frac{1}{2 \pi} d\phi \sin{\theta} d\theta \\
  &\thus \frac{dP}{d\theta} = \int_0^{2 \pi} d\phi \frac{1}{2 \pi} \sin{\theta}
  = \frac{1}{2 \pi} \sin{\theta} \, 2 \pi = \sin{\theta}
\end{align*}

\begin{align*}
  \theta = \theta (x) &\thus
  \frac{dP}{d\theta} = \frac{dP}{dx} \cdot \left| \frac{dx}{d\theta} \right|
  = \left. \frac{dP}{dx} \middle/ \, \left| \frac{d\theta}{dx} \right| \right. 
  \\
  &\thus \sin{\theta} = \left. 1 \middle/ \, \left|
  \frac{d\theta}{dx} \right| \right. 
\end{align*}

If $\theta$ is chosen to grew together with $x$, then the absolute value can be
omitted:
\begin{align*}
  \frac{d\theta}{dx} = \frac{1}{\sin{\theta}}
  &\thus d\theta \sin(\theta) = dx
  \\
  &\thus - \cos (\theta') |_{0}^{\theta} = x(\theta) - x(0) = x - 0 = x
  \\
  &\thus - \cos(\theta) + 1 =x
  \\
  &\thus \theta = \text{acos} (1 -x)
\end{align*}

The so obtained sample was binned and stored in a histogram with a customizable
number $n$ of bins (default set $n = 150$) ranging from $\theta = 0$ to $\theta
= \pi/2$ bacause of the system symmetry. In @fig:original an example is shown.

![Example of intensity histogram.](images/6-original.pdf){#fig:original}


## Gaussian convolution {#sec:convolution}

In order to simulate the instrumentation response, the sample was then to be
smeared with a Gaussian function with the aim to recover the original sample
afterwards, implementing a deconvolution routine.  
For this purpose, a 'kernel' histogram with an even number $m$ of bins and the
same bin width of the previous one, but a smaller number of them ($m \sim 6\%
\, n$), was created according to a Gaussian distribution with mean $\mu$,
and variance $\sigma$. The reason why the kernel was set this way will be
descussed lately.  
Then, the original histogram was convolved with the kernel in order to obtain
the smeared signal. As an example, the result obtained for $\sigma = \Delta
\theta$, where $\Delta \theta$ is the bin width, is shown in @fig:convolved.
The smeared signal looks smoother with respect to the original one: the higher
$\sigma$, the greater the smoothness. 

![Convolved signal.](images/6-smoothed.pdf){#fig:convolved}

The convolution was implemented as follow. Consider the definition of
convolution of two functions $f(x)$ and $g(x)$:
$$
  f * g (x) = \int \limits_{- \infty}^{+ \infty} dy f(y) g(x - y)
$$

Since a histogram is made of discrete values, a discrete convolution of the
signal ($s$) and the kernel ($k$) must be computed. Hence, the procedure boils
down to an element wise product between $s$ and the flipped histogram of $k$
(from the last bin to the first one) for each relative position of the two
histograms. Namely, if $c_i$ is the $i^{\text{th}}$ bin of the convolved
histogram:
$$
  c_i = \sum_j k_j s_{i - j} = \sum_{j'} k_{m - j'} s_{i - m + j'}
  \with j' = m - j
$$

where $j$ runs over the bins of the kernel.  
For a better understanding, see @fig:dot_conv: the third histogram turns out
with $n + m - 1$ bins, a number greater than the original one.

\begin{figure}
\hypertarget{fig:dot_conv}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % original histogram
  \draw [thick, cyclamen, fill=cyclamen!05!white] (0.0,0) rectangle (0.5,2.5);
  \draw [thick, cyclamen, fill=cyclamen!05!white] (0.5,0) rectangle (1.0,2.8);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.0,0) rectangle (1.5,2.3);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.5,0) rectangle (2.0,1.8);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (2.0,0) rectangle (2.5,1.4);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (2.5,0) rectangle (3.0,1.0);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (3.0,0) rectangle (3.5,1.0);
  \draw [thick, cyclamen, fill=cyclamen!05!white] (3.5,0) rectangle (4.0,0.6);
  \draw [thick, cyclamen, fill=cyclamen!05!white] (4.0,0) rectangle (4.5,0.4);
  \draw [thick, cyclamen, fill=cyclamen!05!white] (4.5,0) rectangle (5.0,0.2);
  \draw [thick, cyclamen, fill=cyclamen!05!white] (5.0,0) rectangle (5.5,0.2);
  \draw [thick, cyclamen] (6.0,0) -- (6.0,0.2);
  \draw [thick, cyclamen] (6.5,0) -- (6.5,0.2);
  \draw [thick, <->] (0,3.3) -- (0,0) -- (7,0);
  % kernel histogram
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.0,-1) rectangle (1.5,-1.2);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.5,-1) rectangle (2.0,-1.6);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (2.0,-1) rectangle (2.5,-1.8);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (2.5,-1) rectangle (3.0,-1.6);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (3.0,-1) rectangle (3.5,-1.2);
  \draw [thick, <->] (1,-2) -- (1,-1) -- (4,-1);
  % arrows
  \draw [thick, cyclamen, <->] (1.25,-0.2) -- (1.25,-0.8);
  \draw [thick, cyclamen, <->] (1.75,-0.2) -- (1.75,-0.8);
  \draw [thick, cyclamen, <->] (2.25,-0.2) -- (2.25,-0.8);
  \draw [thick, cyclamen, <->] (2.75,-0.2) -- (2.75,-0.8);
  \draw [thick, cyclamen, <->] (3.25,-0.2) -- (3.25,-0.8);
  \draw [thick, cyclamen,  ->] (2.25,-2.0) -- (2.25,-4.2);
  % smeared histogram
  \begin{scope}[shift={(0,-1)}]
  \draw [thick, cyclamen, fill=cyclamen!05!white] (-1.0,-4.5) rectangle (-0.5,-4.3);
  \draw [thick, cyclamen, fill=cyclamen!05!white] (-0.5,-4.5) rectangle ( 0.0,-4.2);
  \draw [thick, cyclamen, fill=cyclamen!05!white] ( 0.0,-4.5) rectangle ( 0.5,-2.0);
  \draw [thick, cyclamen, fill=cyclamen!05!white] ( 0.5,-4.5) rectangle ( 1.0,-1.6);
  \draw [thick, cyclamen, fill=cyclamen!05!white] ( 1.0,-4.5) rectangle ( 1.5,-2.3);
  \draw [thick, cyclamen, fill=cyclamen!05!white] ( 1.5,-4.5) rectangle ( 2.0,-2.9);
  \draw [thick, cyclamen, fill=cyclamen!25!white] ( 2.0,-4.5) rectangle ( 2.5,-3.4);
  \draw [thick, cyclamen] (3.0,-4.5) -- (3.0,-4.3);
  \draw [thick, cyclamen] (3.5,-4.5) -- (3.5,-4.3);
  \draw [thick, cyclamen] (4.0,-4.5) -- (4.0,-4.3);
  \draw [thick, cyclamen] (4.5,-4.5) -- (4.5,-4.3);
  \draw [thick, cyclamen] (5.0,-4.5) -- (5.0,-4.3);
  \draw [thick, cyclamen] (5.5,-4.5) -- (5.5,-4.3);
  \draw [thick, cyclamen] (6.0,-4.5) -- (6.0,-4.3);
  \draw [thick, cyclamen] (6.5,-4.5) -- (6.5,-4.3);
  \draw [thick, cyclamen] (7.0,-4.5) -- (7.0,-4.3);
  \draw [thick, cyclamen] (7.5,-4.5) -- (7.5,-4.3);
  \draw [thick, <->] (-1,-2.5) -- (-1,-4.5) -- (8,-4.5);
  \end{scope}
  % nodes
  \node [above] at (2.25,-5.5) {$c_i$};
  \node [above] at (3.25,0)    {$s_i$};
  \node [above] at (1.95,0)    {$s_{i-3}$};
  \node [below] at (1.75,-1)   {$k_3$};
\end{tikzpicture}
\caption{Element wise product as a step of the convolution between the original signal
         (above) and the kernel (center). The final result is the lower
         fledging histogram.}\label{fig:dot_conv}
}
\end{figure}


## Unfolding with FFT

Two different unfolding routines were implemented, one of which exploiting the
Fast Fourier Transform (FFT).

This method is based on the convolution theorem, according to which, given two
functions $f(x)$ and $g(x)$:
$$
  \hat{F}[f * g] = \hat{F}[f] \cdot \hat{F}[g]
$$

where $\hat{F}[\quad]$ stands for the Fourier transform of its argument.  
Being the histogram a discrete set of data, the Discrete Fourier Transform (DFT)
was appied. When dealing with arrays of discrete values, the theorems still
holds if the two arrays have the same lenght and a cyclical convolution is
applied. For this reason the kernel was 0-padded in order to make it the same
lenght of the original signal. Besides, the 0-padding allows to avoid unpleasant
side effects due to the cyclical convolution.  
In order to accomplish this procedure, every histogram was transformed into a
vector. The implementation lies in the computation of the Fourier trasform of
the smeared signal and the kernel, the ratio between their transforms and the
anti-transformation of the result:
$$
  \hat{F}[s * k] = \hat{F}[s] \cdot \hat{F}[k] \thus
  \hat{F} [s] = \frac{\hat{F}[s * k]}{\hat{F}[k]}
$$

The FFT are efficient algorithms for calculating the DFT. Given a set of $n$
values {$z_i$}, each one is transformed into:
$$
  x_j = \sum_{k=0}^{n-1} z_k \exp \left( - \frac{2 \pi i j k}{n} \right)
$$

The evaluation of the DFT is a matrix-vector multiplication $W \vec{z}$. A
general matrix-vector multiplication takes $O(n^2)$ operations. FFT algorithms,
instad, use a divide-and-conquer strategy to factorize the matrix into smaller
sub-matrices. If $n$ can be factorized into a product of integers $n_1$, $n_2
\ldots n_m$, then the DFT can be computed in $O(n \sum n_i) < O(n^2)$
operations, hence the name.  
The inverse Fourier transform is thereby defined as:
$$
  z_j = \frac{1}{n}
        \sum_{k=0}^{n-1} x_k \exp \left( \frac{2 \pi i j k}{n} \right)
$$

In GSL, `gsl_fft_complex_forward()` and `gsl_fft_complex_inverse()` are 
functions which allow to compute the foreward and inverse transform,
respectively.  
The inputs and outputs for the complex FFT routines are packed arrays of
floating point numbers. In a packed array the real and imaginary parts of
each complex number are placed in alternate neighboring elements. In this
special case, the sequence of values to be transformed is made of real numbers,
but the Fourier transform is not real: it is a complex sequence wich satisfies:
$$
  z_k = z^*_{n-k}
$$

where $z^*$ is the conjugate of $z$. A sequence with this symmetry is called
'half-complex'. This structure requires particular storage layouts for the
forward transform (from real to half-complex) and inverse transform (from
half-complex to real). As a consequence, the routines are divided into two sets:
`gsl_fft_real` and `gsl_fft_halfcomplex`. The symmetry of the half-complex
sequence requires only half of the complex numbers in the output to be stored.
This works for all lengths: when the length is even, the middle value is real.
Thus, only $n$ real numbers are required to store the half-complex sequence
(half for the real part and half for the imaginary).

\begin{figure}
\hypertarget{fig:reorder}{%
\centering
\begin{tikzpicture}
  \definecolor{cyclamen}{RGB}{146, 24, 43}
  % standard histogram
  \draw [thick, cyclamen] (0.5,0) -- (0.5,0.2);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.0,0) rectangle (1.5,0.6);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.5,0) rectangle (2.0,1.2);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (2.0,0) rectangle (2.5,1.4);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (2.5,0) rectangle (3.0,1.4);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (3.0,0) rectangle (3.5,1.2);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (3.5,0) rectangle (4.0,0.6);
  \draw [thick, cyclamen] (4.5,0) -- (4.5,0.2);
  \draw [thick, ->] (0,0) -- (5,0);
  \draw [thick, ->] (2.5,0) -- (2.5,2);
  % shifted histogram
  \begin{scope}[shift={(7,0)}]
  \draw [thick, cyclamen, fill=cyclamen!25!white] (0.5,0) rectangle (1.0,1.4);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.0,0) rectangle (1.5,1.2);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (1.5,0) rectangle (2.0,0.6);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (3.0,0) rectangle (3.5,0.6);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (3.5,0) rectangle (4.0,1.2);
  \draw [thick, cyclamen, fill=cyclamen!25!white] (4.0,0) rectangle (4.5,1.4);
  \draw [thick, ->] (0,0) -- (5,0);
  \draw [thick, ->] (2.5,0) -- (2.5,2);
  \end{scope}
\end{tikzpicture}
\caption{The histogram on the right shows how the real numbers histogram on the
         left is handled by the dedicated GSL functions`.}\label{fig:reorder}
}
\end{figure}

If the bin width is $\Delta \theta$, then the DFT domain ranges from $-1 / (2
\Delta \theta)$ to $+1 / (2 \Delta \theta$). As regards the real values, the
aforementioned GSL functions store the positive values from the beginning of
the array up to the middle and the negative backwards from the end of the array
(see @fig:reorder).  
Whilst do not matters if the convolved histogram has positive or negative
values, the kernel must be centered in zero in order to compute a correct
convolution. This requires the kernel to be made of an ever number of bins
in order to be possible to cut it into two same-lenght halves.

When $\hat{F}[s * k]$ and $\hat{F}[k]$ are computed, they are given in the
half-complex GSL format and their normal format must be restored in order to
use them as standard complex numbers and compute the ratio between them. Then,
the result must return in the half-complex format for the inverse DFT
computation. GSL provides the function `gsl_fft_halfcomplex_unpack()` which
convert the vectors from half-complex format to standard complex format but the
inverse procedure is not provided by GSL and was hence implemented in the
code.  

At the end, the external bins which exceed the original signal size are cut
away in order to restore the original number of bins $n$. Results will be
discussed in @sec:conv_Results.


## Unfolding with Richardson-Lucy

The Richardson–Lucy (RL) deconvolution is an iterative procedure tipically used
for recovering an image that has been blurred by a known 'point spread
function'.

Consider the problem of estimating the frequeny distribution $f(\xi)$ of a
variable $\xi$ when the available measure is a sample {$x_i$} of points
dronwn not by $f(x)$ but by another function $\phi(x)$ such that:
$$
  \phi(x) = \int d\xi \, f(\xi) P(x | \xi)
$$ {#eq:conv}

where $P(x | \xi) \, d\xi$ is the probability (presumed known) that $x$ falls
in the interval $(x, x + dx)$ when $\xi = \xi$. If the so-colled point spread
function $P(x | \xi)$ follows a normal distribution with variance $\sigma$,
namely:
$$
  P(x | \xi) = \frac{1}{\sqrt{2 \pi} \sigma}
  \exp \left( - \frac{(x - \xi)^2}{2 \sigma^2} \right)
$$

then, @eq:conv becomes a convolution and finding $f(\xi)$ turns out to be a
deconvolution.  
An example of this problem is precisely that of correcting an observed
distribution $\phi(x)$ for the effect of observational errors, which are
represented by the function $P (x | \xi)$.

Let $Q(\xi | x) d\xi$ be the probability that $\xi$ comes from the interval
$(\xi, \xi + d\xi)$ when the measured quantity is $x = x$. The probability that
both $x \in (x, x + dx)$ and $(\xi, \xi + d\xi)$ is therefore given by $\phi(x)
dx \cdot Q(\xi | x) d\xi$ which is identical to $f(\xi) d\xi \cdot P(x | \xi)
dx$, hence:

$$
  \phi(x) dx \cdot Q(\xi | x) d\xi = f(\xi) d\xi \cdot P(x | \xi) dx
  \thus Q(\xi | x) = \frac{f(\xi) \cdot P(x | \xi)}{\phi(x)}
$$

$$
  \thus Q(\xi | x) = \frac{f(\xi) \cdot P(x | \xi)}
                          {\int d\xi \, f(\xi) P(x | \xi)}
$$ {#eq:first}

which is the Bayes theorem for conditional probability. From the normalization
of $P(x | \xi)$, it follows also that:
$$
  f(\xi) = \int dx \, \phi(x) Q(\xi | x)
$$ {#eq:second}

Since $Q (\xi | x)$ depends on $f(\xi)$, @eq:second suggests a reiterative
procedure for generating estimates of $f(\xi)$. With a guess for $f(\xi)$ and
a known $P(x | \xi)$, @eq:first can be used to calculate and estimate for
$Q (\xi | x)$. Then, taking the hint provided by @eq:second, an improved
estimate for $f (\xi)$ is generated, using the observed sample {$x_i$} to give
an approximation for $\phi$.  
Thus, if $f^t$ is the $t^{\text{th}}$ estimate, the $t^{\text{th + 1}}$ is:
$$
  f^{t + 1}(\xi) = \int dx \, \phi(x) Q^t(\xi | x)
  \with
  Q^t(\xi | x) = \frac{f^t(\xi) \cdot P(x | \xi)}
                      {\int d\xi \, f^t(\xi) P(x | \xi)}
$$

from which:
$$
  f^{t + 1}(\xi) = f^t(\xi)
  \int dx \, \frac{\phi(x)}{\int d\xi \, f^t(\xi) P(x | \xi)}
  P(x | \xi)
$$ {#eq:solution}

When the spread function $P(x | \xi)$ is Gaussian, @eq:solution can be
rewritten in terms of convolutions:
$$
   f^{t + 1} = f^{t}\left( \frac{\phi}{{f^{t}} * P} * P^{\star} \right)
$$

where $P^{\star}$ is the flipped point spread function [@lucy74].

In this special case, the Gaussian kernel which was convolved with the original
histogram stands for the point spread function and, dealing with discrete
values, the division and multiplication are element wise and the convolution is
to be carried out as described in @sec:convolution.  
When implemented, this method results in an easy step-wise routine:

  - create a flipped copy of the kernel;
  - choose a zero-order estimate for {$f(\xi)$};
  - compute the convolutions, the product and the division at each step;
  - proceed until a given number of reiterations is achieved.

In this case, the zero-order was set $f(\xi) = 0.5 \, \forall \, \xi$. Different
number of iterations where tested. Results are discussed in
@sec:conv_results.


## The earth mover's distance

With the aim of comparing the two deconvolution methods, the similarity of a
deconvolved outcome with the original signal was quantified using the earth
mover's distance.

In statistics, the earth mover's distance (EMD) is the measure of distance
between two probability distributions [@cock41]. Informally, the distributions
are interpreted as two different ways of piling up a certain amount of dirt over
a region and the EMD is the minimum cost of turning one pile into the other,
where the cost is the amount of dirt moved times the distance by which it is
moved. It is valid only if the two distributions have the same integral, that
is if the two piles have the same amount of dirt.  
Computing the EMD is based on a solution to the transportation problem, which
can be formalized as follows.

Consider two vectors $P$ and $Q$ which represent the two probability
distributions whose EMD has to be measured:

$$
  P = \{ (p_1, w_{p1}) \dots (p_m, w_{pm}) \} \et
  Q = \{ (q_1, w_{q1}) \dots (q_n, w_{qn}) \}
$$

L'istogramma P deve essere distrutto in modo tale da ottenere l'istogramma Q,
che in partenza è vuoto ma so che vorrò avere w_qj in ogni bin che sta alla
posizione qj.
- sposto solo da P a Q
- sposto non più di ogni ingresso di P
- ottengo non più di ogni ingreddo di Q
- sposto tutto quello che posso: o ottengo tutto Q o ho finito P

e non devono venire uguali, quindi!


where $p_i$ and $q_i$ are the 'values' (that is, the location of the dirt) and
$w_{pi}$ and $w_{qi}$ are the 'weights' (that is, the quantity of dirt). A
ground distance matrix $D_{ij}$ is defined such as its entries $d_{ij}$ are the
distances between $p_i$ and $q_j$. The aim is to find the flow matrix $F_{ij}$,
where each entry $f_{ij}$ is the flow from $p_i$ to $q_j$ (which would be
the quantity of moved dirt), which minimizes the cost $W$:

$$
  W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
$$

with the constraints:

\begin{align*}
  &f_{ij} \ge 0 \hspace{15pt}       &1 \le i \le m \wedge 1 \le j \le n \\
  &\sum_{j = 1}^n f_{ij} \le w_{pi} &1 \le i \le m                      \\
  &\sum_{i = 1}^m f_{ij} \le w_{qj} &1 \le j \le n
\end{align*}
$$
  \sum_{j = 1}^n f_{ij} \sum_{j = 1}^m f_{ij} \le w_{qj}
  = \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
$$

The first constraint allows moving 'dirt' from $P$ to $Q$ and not vice versa.
The next two constraints limits the amount of supplies that can be sent by the
values in $P$ to their weights, and the values in $Q$ to receive no more
supplies than their weights; the last constraint forces to move the maximum
amount of supplies possible. The total moved amount is the total flow. Once the
transportation problem is solved, and the optimal flow is found, the earth
mover's distance $D$ is defined as the work normalized by the total flow: 

$$
  D (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
                  {\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
$$

In this case, where the EMD has to be applied to two same-lenght histograms, the
procedure simplifies a lot. By representing both histograms with two vectors $u$
and $v$, the equation above boils down to [@ramdas17]:

$$
  D (u, v) = \sum_i |U_i - V_i|
$$

where the sum runs over the entries of the vectors $U$ and $V$, which are the
cumulative vectors of the histograms.  
In the code, the following equivalent recursive routine was implemented.

$$
  D (u, v) = \sum_i |D_i| \with
  \begin{cases}
    D_i = v_i - u_i + D_{i-1} \\ 
    D_0 = 0
  \end{cases}
$$

In fact:

\begin{align*}
  D (u, v) &= \sum_i |D_i| = |D_0| + |D_1| + |D_2| + |D_3| + \dots \\
           &= 0 + |v_1 - u_1 + D_0| + 
                  |v_2 - u_2 + D_1| + 
                  |v_3 - u_3 + D_2| + \dots                        \\
           &= |v_1 - u_1| + 
              |v_1 - u_1 + v_2 - u_2| + 
              |v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots          \\
           &= |v_1 - u_i| + 
              |v_1 + v_2 - (u_1 + u_2)| + 
              |v_1 + v_2 + v_3 - (u_1 + u_2 + u_3))| + \dots       \\
           &= |V_1 - U_1| + |V_2 - U_2| + |V_3 - U_3| + \dots      \\
           &= \sum_i |U_i - V_i|
\end{align*}


\textcolor{red}{EMD}

These distances were used to build their empirical cumulative distribution.

\textcolor{red}{empirical distribution}

At 95% confidence level, the compatibility of the deconvolved signal with
the original one cannot be disporoved if its distance from the original signal
is grater than  \textcolor{red}{value}.

\textcolor{red}{counts}


## Results comparison {#sec:conv_results}

As can be seen, increasing the value of $\sigma$ implies a stronger smoothing of
the curve. The FFT deconvolution process seems not to be affected by $\sigma$
amplitude changes: it always gives the same outcome, which is exactly the
original signal. In fact, the FFT is the analitical result of the deconvolution.
In the real world, it is unpratical, since signals are inevitably blurred by
noise.  
The same can't be said about the RL deconvolution, which, on the other hand,
looks heavily influenced by the variance magnitude: the greater $\sigma$, the
worse the deconvolved result. In fact, given the same number of steps, the
deconvolved signal is always the same 'distance' far form the convolved one:
if it very smooth, the deconvolved signal is very smooth too and if the
convolved is less smooth, it is less smooth too.

The original signal is shown below for convenience.


It was also implemented the possibility to add a Poisson noise to the
convolved histogram to check weather the deconvolution is affected or not by
this kind of interference. It was took as an example the case with $\sigma =
\Delta \theta$. In @fig:poisson the results are shown for both methods when a
Poisson noise with mean $\mu = 50$ is employed.  
In both cases, the addition of the noise seems to partially affect the
deconvolution. When the FFT method is applied, it adds little spikes nearly
everywhere on the curve and it is particularly evident on the edges, where the
expected data are very small. On the other hand, the Richardson-Lucy routine is
less affected by this further complication.