deconvolved signal is always the same 'distance' away from the convolved one:
if the convolved signal is very smooth, the deconvolved one is very smooth too,
and if it is less smooth, so is the deconvolved one.

The original signal is shown below for reference.

![Example of an intensity histogram.](images/fraun-original.pdf){#fig:original}

<div id="fig:results1">
![Convolved signal.](images/fraun-conv-0.05.pdf){width=12cm}
Results for $\sigma = \Delta \theta$, where $\Delta \theta$ is the bin width.
</div>

The possibility of adding Poisson noise to the convolved histogram was also
implemented, in order to check whether the deconvolution is affected by this
kind of interference. The case $\sigma = \Delta \theta$ was taken as an example.
In @fig:poisson, the results are shown for both methods when Poisson noise with
mean $\mu = 50$ is added.
In both cases, the noise seems to partially affect the deconvolution. The FFT
method adds small spikes nearly everywhere on the curve, which are particularly
evident at the edges, where the expected counts are very small. The
Richardson-Lucy routine, on the other hand, is less affected by this further
complication.

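The noise injection can be sketched as below. This is only an illustration
under an assumption the text does not spell out: an independent Poisson variate
with mean $\mu$ is added to every bin of the convolved histogram. The helper
name `add_poisson_noise` is hypothetical, and NumPy is used for the random draws.

```python
import numpy as np


def add_poisson_noise(hist, mu=50.0, rng=None):
    """Return a copy of `hist` with an independent Poisson(mu) draw added to each bin.

    Hypothetical helper: it assumes the noise is additive and independent
    from bin to bin, which the text does not state explicitly.
    """
    rng = np.random.default_rng() if rng is None else rng
    hist = np.asarray(hist, dtype=float)
    noise = rng.poisson(lam=mu, size=hist.shape)  # one Poisson variate per bin
    return hist + noise  # new array; the input histogram is left untouched


if __name__ == "__main__":
    rng = np.random.default_rng(seed=1)
    convolved = np.array([0.0, 5.0, 120.0, 300.0, 120.0, 5.0, 0.0])
    print(add_poisson_noise(convolved, mu=50.0, rng=rng))
```

Passing an explicit `rng` makes a noisy run reproducible, which is convenient
when comparing the FFT and Richardson-Lucy outputs on the same noisy input.
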
<div id="fig:poisson">
![Deconvolved signal with FFT.](images/fraun-noise-fft.pdf){width=12cm}

Results for $\sigma = \Delta \theta$, with Poisson noise.
</div>

In order to quantify the similarity of a deconvolution outcome with the original
signal, a null hypothesis test was set up.
As in @sec:Landau, the original sample was treated as a population from
which other samples of the same size were drawn with replacement. For each
new sample, the earth mover's distance with respect to the original signal was
… a region and the EMD is the minimum cost of turning one pile into the other,
where the cost is the amount of dirt moved times the distance by which it is
moved. It is valid only if the two distributions have the same integral, that
is, if the two piles contain the same amount of dirt.
Computing the EMD is based on a solution to the well-known transportation
problem, which can be formalized as follows.

Consider two sets of weighted values:

$$
P = \{ (p_1, w_{p1}), \dots, (p_m, w_{pm}) \} \et
Q = \{ (q_1, w_{q1}), \dots, (q_n, w_{qn}) \}
$$

where $p_i$ and $q_j$ are the 'values' and $w_{pi}$ and $w_{qj}$ are their
weights. The entries $d_{ij}$ of the ground distance matrix are defined as the
distances between $p_i$ and $q_j$.
The aim is to find the flow $F = \{f_{ij}\}$, where $f_{ij}$ is the flow
between $p_i$ and $q_j$ (the quantity of dirt moved), that minimizes the cost
$W$:

$$
W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
$$

with the constraints:

\begin{align*}
&f_{ij} \ge 0 \hspace{15pt} &1 \le i \le m \wedge 1 \le j \le n \\
&\sum_{j = 1}^n f_{ij} \le w_{pi} &1 \le i \le m \\
&\sum_{i = 1}^m f_{ij} \le w_{qj} &1 \le j \le n
\end{align*}
$$
\sum_{i = 1}^m \sum_{j = 1}^n f_{ij}
= \min \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
$$

The first constraint allows moving 'dirt' from $P$ to $Q$ and not vice versa.
The next two constraints limit the amount of supplies that each value in $P$
can send to its weight, and ensure that each value in $Q$ receives no more
supplies than its weight; the last constraint forces the maximum possible
amount of supplies to be moved. This amount is the total flow. Once the
transportation problem is solved and the optimal flow is found, the earth
mover's distance $D$ is defined as the work normalized by the total flow:

$$
D (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
{\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
$$

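Since the transportation problem above is a linear program, small instances can
be solved with a generic LP solver. The sketch below is an illustration, not the
routine used in this work: it assumes one-dimensional values with ground distance
$d_{ij} = |p_i - q_j|$, relies on SciPy's `scipy.optimize.linprog` with the HiGHS
backend, and the function name `emd_transport` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def emd_transport(p, wp, q, wq):
    """EMD between P = {(p_i, w_pi)} and Q = {(q_j, w_qj)} via the transportation LP.

    Hypothetical helper; assumes 1-D values and ground distance d_ij = |p_i - q_j|.
    """
    p, wp, q, wq = (np.asarray(a, dtype=float) for a in (p, wp, q, wq))
    m, n = len(p), len(q)
    d = np.abs(p[:, None] - q[None, :])  # ground distance matrix, shape (m, n)

    # Unknowns: the m*n flows f_ij, flattened row-major (index i*n + j).
    a_rows = np.kron(np.eye(m), np.ones(n))  # sum_j f_ij <= w_pi
    a_cols = np.kron(np.ones(m), np.eye(n))  # sum_i f_ij <= w_qj
    a_ub = np.vstack([a_rows, a_cols])
    b_ub = np.concatenate([wp, wq])

    # Total flow = min(sum_i w_pi, sum_j w_qj)
    a_eq = np.ones((1, m * n))
    b_eq = [min(wp.sum(), wq.sum())]

    res = linprog(d.ravel(), A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")  # f_ij >= 0
    if not res.success:
        raise RuntimeError(res.message)
    flow = res.x.reshape(m, n)
    work = np.sum(flow * d)     # W(P, Q, F) at the optimum
    return work / flow.sum()    # normalized by the total flow
```

For instance, `emd_transport([0, 1], [1, 1], [1], [2])` has to carry one unit of
dirt over a distance of one while the total flow is two, so it returns 0.5 (up to
solver tolerance). The dense constraint matrices grow as $m \cdot n$, so this
approach only suits small signatures.
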
In this case, where the EMD must be applied to two histograms with the same
number of bins, the procedure simplifies considerably. By representing both
histograms with two vectors $u$ and $v$, normalized to unit integral, and taking
the distance between adjacent bins as the unit of length, the equation above
boils down to [@ramdas17]:

$$
D (u, v) = \sum_i |U_i - V_i|
$$

where the sum runs over the entries of $U$ and $V$, the cumulative vectors of
the two histograms: $U_i = \sum_{k \le i} u_k$ and $V_i = \sum_{k \le i} v_k$.
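
A minimal NumPy sketch of the cumulative formula, assuming histograms already
normalized to unit integral and unit distance between adjacent bins. The name
`emd_cumulative` is hypothetical and not taken from the original code.

```python
import numpy as np


def emd_cumulative(u, v):
    """EMD between two same-length histograms: D(u, v) = sum_i |U_i - V_i|.

    Hypothetical helper; assumes unit bin spacing and that u and v are
    normalized to the same (unit) integral.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.shape != v.shape:
        raise ValueError("histograms must have the same number of bins")
    # U and V are the cumulative vectors of the two histograms
    return float(np.sum(np.abs(np.cumsum(u) - np.cumsum(v))))


if __name__ == "__main__":
    # A unit of dirt moved by two bins: U = (1, 1, 1), V = (0, 0, 1)
    print(emd_cumulative([1, 0, 0], [0, 0, 1]))  # -> 2.0
```
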
In the code, the following equivalent recursive routine was implemented:

$$
D (u, v) = \sum_i |D_i| \with
\begin{cases}
D_i = v_i - u_i + D_{i-1} \\
D_0 = 0
\end{cases}
$$

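The recursion translates directly into a single loop that carries the running
difference $D_i$ from bin to bin. This pure-Python sketch is for illustration
only (the name `emd_recursive` is hypothetical); it is not the original
implementation.

```python
def emd_recursive(u, v):
    """EMD via D_i = v_i - u_i + D_{i-1}, D_0 = 0, and D(u, v) = sum_i |D_i|.

    Hypothetical helper; assumes the same normalization as the closed formula.
    """
    if len(u) != len(v):
        raise ValueError("histograms must have the same number of bins")
    d_prev = 0.0  # D_0
    total = 0.0
    for ui, vi in zip(u, v):
        # D_i: net amount of dirt that must cross the boundary after bin i
        d_prev = vi - ui + d_prev
        total += abs(d_prev)
    return total
```

Unlike the cumulative version, this needs no temporary arrays: only the previous
$D_{i-1}$ is kept in memory.
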
In fact:

\begin{align*}
D (u, v) &= \sum_i |D_i| = |D_0| + |D_1| + |D_2| + |D_3| + \dots \\
&= 0 + |v_1 - u_1 + D_0| +
|v_2 - u_2 + D_1| +
|v_3 - u_3 + D_2| + \dots \\
&= |v_1 - u_1| +
|v_1 - u_1 + v_2 - u_2| +
|v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots \\
&= |v_1 - u_1| +
|v_1 + v_2 - (u_1 + u_2)| +
|v_1 + v_2 + v_3 - (u_1 + u_2 + u_3)| + \dots \\
&= |V_1 - U_1| + |V_2 - U_2| + |V_3 - U_3| + \dots \\
&= \sum_i |U_i - V_i|
\end{align*}

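As a quick numerical check of the equivalence (the histograms are made up for
illustration), take $u = (1, 0, 0)$ and $v = (0, 0, 1)$, i.e. a unit of dirt
that has to be moved by two bins:

$$
\begin{aligned}
&D_1 = 0 - 1 + 0 = -1, \quad D_2 = 0 - 0 - 1 = -1, \quad D_3 = 1 - 0 - 1 = 0
\;\Rightarrow\; D(u, v) = 1 + 1 + 0 = 2, \\
&U = (1, 1, 1), \quad V = (0, 0, 1)
\;\Rightarrow\; \sum_i |U_i - V_i| = 1 + 1 + 0 = 2.
\end{aligned}
$$
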

These distances were used to build their empirical cumulative distribution.

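The resampling procedure can be sketched as follows. This is an illustration
under stated assumptions, not the original code: resampling a histogram 'with
replacement' is modelled as drawing the same total number of entries from a
multinomial distribution whose probabilities are the normalized bin contents,
the EMD is computed with the cumulative formula on normalized histograms, and
the names `emd`, `bootstrap_emd` and `ecdf` are hypothetical.

```python
import numpy as np


def emd(u, v):
    """Hypothetical helper: 1-D EMD of histograms normalized to unit integral."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.abs(np.cumsum(u / u.sum()) - np.cumsum(v / v.sum()))))


def bootstrap_emd(hist, n_boot=1000, rng=None):
    """EMDs between `hist` and `n_boot` histograms resampled from it with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    hist = np.asarray(hist, dtype=float)
    n_entries = int(round(hist.sum()))  # same sample size as the original
    probs = hist / hist.sum()           # original sample treated as the population
    samples = rng.multinomial(n_entries, probs, size=n_boot)
    return np.array([emd(hist, s) for s in samples])


def ecdf(values):
    """Empirical cumulative distribution: sorted values x and F(x) at each of them."""
    x = np.sort(np.asarray(values, dtype=float))
    f = np.arange(1, len(x) + 1) / len(x)
    return x, f


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    original = np.array([2, 10, 40, 80, 40, 10, 2])
    distances = bootstrap_emd(original, n_boot=500, rng=rng)
    x, f = ecdf(distances)
    # One possible critical value (an assumption, not stated in the text)
    print(np.quantile(distances, 0.95))
```
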