ex-6: written EMD description and implementation
parent df5f5f9ac6
commit aa36fca43e
@ -404,16 +404,9 @@ deconvolved signal is always the same 'distance' far form the convolved one:
|
|||||||
if it very smooth, the deconvolved signal is very smooth too and if the
|
if it very smooth, the deconvolved signal is very smooth too and if the
|
||||||
convolved is less smooth, it is less smooth too.
|
convolved is less smooth, it is less smooth too.
|
||||||
|
|
||||||
An option was also implemented to add Poisson noise to the
|
The original signal is shown below for convenience.
|
||||||
convolved histogram to check whether the deconvolution is affected or not by
|
|
||||||
this kind of interference. As an example, consider the case with $\sigma =
|
![Example of an intensity histogram.](images/fraun-original.pdf){#fig:original}
|
||||||
\Delta \theta$. In @fig:poisson the results are shown for both methods when a
|
|
||||||
Poisson noise with mean $\mu = 50$ is employed.
|
|
||||||
In both cases, the addition of the noise seems to partially affect the
|
|
||||||
deconvolution. When the FFT method is applied, it adds little spikes nearly
|
|
||||||
everywhere on the curve and it is particularly evident on the edges, where the
|
|
||||||
expected data are very small. On the other hand, the Richardson-Lucy routine is
|
|
||||||
less affected by this further complication.
|
|
||||||
|
|
||||||
<div id="fig:results1">
|
<div id="fig:results1">
|
||||||
![Convolved signal.](images/fraun-conv-0.05.pdf){width=12cm}
|
![Convolved signal.](images/fraun-conv-0.05.pdf){width=12cm}
|
||||||
@ -447,6 +440,17 @@ width.
|
|||||||
Results for $\sigma = \Delta \theta$, where $\Delta \theta$ is the bin width.
|
Results for $\sigma = \Delta \theta$, where $\Delta \theta$ is the bin width.
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
An option was also implemented to add Poisson noise to the
|
||||||
|
convolved histogram to check whether the deconvolution is affected or not by
|
||||||
|
this kind of interference. As an example, consider the case with $\sigma =
|
||||||
|
\Delta \theta$. In @fig:poisson the results are shown for both methods when a
|
||||||
|
Poisson noise with mean $\mu = 50$ is employed.
|
||||||
|
In both cases, the addition of the noise seems to partially affect the
|
||||||
|
deconvolution. When the FFT method is applied, it adds little spikes nearly
|
||||||
|
everywhere on the curve and it is particularly evident on the edges, where the
|
||||||
|
expected data are very small. On the other hand, the Richardson-Lucy routine is
|
||||||
|
less affected by this further complication.
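
The noise step can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the noise enters the histogram, so the sketch assumes that an independent Poisson count with mean $\mu$ is added to every bin. The names `poisson` and `add_poisson_noise` are hypothetical, and the sampler is Knuth's multiplication method, chosen here only because it needs nothing beyond the standard library.

```python
import math
import random


def poisson(mu, rng):
    """Draw one Poisson variate with mean mu (Knuth's multiplication method).

    Adequate for moderate means such as mu = 50; not meant for very large mu.
    """
    limit = math.exp(-mu)
    k = 0
    prod = rng.random()
    # Keep multiplying uniforms until the product drops below exp(-mu).
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def add_poisson_noise(hist, mu, rng):
    """Assumption: noise is an independent Poisson count added to each bin."""
    return [count + poisson(mu, rng) for count in hist]


rng = random.Random(1)                 # fixed seed, for reproducibility only
empty = [0] * 2000                     # hypothetical empty histogram
noisy = add_poisson_noise(empty, 50, rng)
mean_noise = sum(noisy) / len(noisy)   # should be close to mu = 50
```

With an empty histogram as input, the bin average of the output estimates $\mu$, which is a quick sanity check of the sampler.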
|
||||||
|
|
||||||
<div id="fig:poisson">
|
<div id="fig:poisson">
|
||||||
![Deconvolved signal with FFT.](images/fraun-noise-fft.pdf){width=12cm}
|
![Deconvolved signal with FFT.](images/fraun-noise-fft.pdf){width=12cm}
|
||||||
|
|
||||||
@ -455,8 +459,8 @@ Results for $\sigma = \Delta \theta$, where $\Delta \theta$ is the bin width.
|
|||||||
Results for $\sigma = \Delta \theta$, with Poisson noise.
|
Results for $\sigma = \Delta \theta$, with Poisson noise.
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
In order to quantify the similarity of the deconvolution outcome with the
|
In order to quantify the similarity of a deconvolution outcome with the original
|
||||||
original signal, a null hypothesis test was devised.
|
signal, a null hypothesis test was devised.
|
||||||
As in @sec:Landau, the original sample was treated as a population from
|
As in @sec:Landau, the original sample was treated as a population from
|
||||||
which other samples of the same size were drawn with replacement. For each
|
which other samples of the same size were drawn with replacement. For each
|
||||||
new sample, the earth mover's distance with respect to the original signal was
|
new sample, the earth mover's distance with respect to the original signal was
|
||||||
@ -469,13 +473,91 @@ a region and the EMD is the minimum cost of turning one pile into the other,
|
|||||||
where the cost is the amount of dirt moved times the distance by which it is
|
where the cost is the amount of dirt moved times the distance by which it is
|
||||||
moved. It is valid only if the two distributions have the same integral, that
|
moved. It is valid only if the two distributions have the same integral, that
|
||||||
is if the two piles have the same amount of dirt.
|
is if the two piles have the same amount of dirt.
|
||||||
Computing the EMD is based on a solution of the transportation problem.
|
Computing the EMD is based on a solution to the well-known transportation
|
||||||
|
problem, which can be formalized as follows.
|
||||||
|
|
||||||
\textcolor{red}{earth mover's distance}
|
Consider two vectors:
|
||||||
|
|
||||||
In this case, where the EMD must be applied to two histograms, the procedure
|
$$
|
||||||
simplifies a lot, boiling down to the difference of the cumulative functions of
|
P = \{ (p_1, w_{p1}) \dots (p_m, w_{pm}) \} \et
|
||||||
the two histograms.
|
Q = \{ (q_1, w_{q1}) \dots (q_n, w_{qn}) \}
|
||||||
|
$$
|
||||||
|
|
||||||
|
where $p_i$ and $q_i$ are the 'values' and $w_{pi}$ and $w_{qi}$ are their
|
||||||
|
weights. The entries $d_{ij}$ of the ground distance matrix $D_{ij}$ are
|
||||||
|
defined as the distances between $p_i$ and $q_j$.
|
||||||
|
The aim is to find the flow $F =$ {$f_{ij}$}, where $f_{ij}$ is the flow
|
||||||
|
between $p_i$ and $q_j$ (which would be the quantity of moved dirt), which
|
||||||
|
minimizes the cost $W$:
|
||||||
|
|
||||||
|
$$
|
||||||
|
W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
|
||||||
|
$$
|
||||||
|
|
||||||
|
with the constraints:
|
||||||
|
|
||||||
|
\begin{align*}
|
||||||
|
&f_{ij} \ge 0 \hspace{15pt} &1 \le i \le m \wedge 1 \le j \le n \\
|
||||||
|
&\sum_{j = 1}^n f_{ij} \le w_{pi} &1 \le i \le m \\
|
||||||
|
&\sum_{i = 1}^m f_{ij} \le w_{qj} &1 \le j \le n
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
\sum_{i = 1}^m \sum_{j = 1}^n f_{ij}
|
||||||
|
= \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
|
||||||
|
$$
|
||||||
|
|
||||||
|
The first constraint allows moving 'dirt' from $P$ to $Q$ and not vice versa.
|
||||||
|
The next two constraints limit the amount of supplies that can be sent by the
|
||||||
|
values in $P$ to their weights, and the values in $Q$ to receive no more
|
||||||
|
supplies than their weights; the last constraint requires moving the maximum
|
||||||
|
amount of supplies possible. The total moved amount is the total flow. Once the
|
||||||
|
transportation problem is solved, and the optimal flow is found, the earth
|
||||||
|
mover's distance $D$ is defined as the work normalized by the total flow:
|
||||||
|
|
||||||
|
$$
|
||||||
|
D (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
|
||||||
|
{\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
|
||||||
|
$$
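
To make the definitions concrete, the following Python sketch checks the four constraints for a given flow and evaluates the work $W$ and the distance $D$. It does not solve the transportation problem: the flow is supplied by hand for a tiny two-point case, and all names (`is_feasible`, `work`, `emd_from_flow`) and the numbers used are hypothetical, chosen only for illustration. The ground distance is assumed to be $d_{ij} = |p_i - q_j|$.

```python
def total_flow(F):
    """Sum of all entries f_ij of the flow matrix."""
    return sum(sum(row) for row in F)


def is_feasible(F, wp, wq, tol=1e-12):
    """Check the four constraints of the transportation problem."""
    m, n = len(wp), len(wq)
    nonneg = all(F[i][j] >= 0 for i in range(m) for j in range(n))
    # Each p_i sends no more than its weight.
    rows = all(sum(F[i]) <= wp[i] + tol for i in range(m))
    # Each q_j receives no more than its weight.
    cols = all(sum(F[i][j] for i in range(m)) <= wq[j] + tol
               for j in range(n))
    # The total flow equals the smaller of the two total weights.
    full = abs(total_flow(F) - min(sum(wp), sum(wq))) <= tol
    return nonneg and rows and cols and full


def work(F, d):
    """Cost W = sum_ij f_ij * d_ij."""
    return sum(f * dist for f_row, d_row in zip(F, d)
               for f, dist in zip(f_row, d_row))


def emd_from_flow(F, d):
    """Work normalized by the total flow."""
    return work(F, d) / total_flow(F)


# Hypothetical example: two values per set, weights with equal totals.
p, wp = [0, 1], [3, 1]
q, wq = [0, 1], [2, 2]
d = [[abs(pi - qj) for qj in q] for pi in p]   # ground distance matrix
F = [[2, 1],
     [0, 1]]                                   # one unit moved from 0 to 1

feasible = is_feasible(F, wp, wq)   # True
W = work(F, d)                      # 1
D = emd_from_flow(F, d)             # 0.25
```

In this example one unit of weight must leave $p_1 = 0$ and reach $q_2 = 1$, so no feasible flow can cost less than 1 and the hand-written flow is optimal.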
|
||||||
|
|
||||||
|
In this case, where the EMD must be applied to two same-length histograms, the
|
||||||
|
procedure simplifies a lot. By representing both histograms with two vectors $u$
|
||||||
|
and $v$, the equation above boils down to [@ramdas17]:
|
||||||
|
|
||||||
|
$$
|
||||||
|
D (u, v) = \sum_i |U_i - V_i|
|
||||||
|
$$
|
||||||
|
|
||||||
|
where the sum runs over the entries of the vectors $U$ and $V$, which are the
|
||||||
|
cumulative vectors of the histograms.
|
||||||
|
In the code, the following equivalent recursive routine was implemented.
|
||||||
|
|
||||||
|
$$
|
||||||
|
D (u, v) = \sum_i |D_i| \with
|
||||||
|
\begin{cases}
|
||||||
|
D_i = v_i - u_i + D_{i-1} \\
|
||||||
|
D_0 = 0
|
||||||
|
\end{cases}
|
||||||
|
$$
|
||||||
|
|
||||||
|
In fact:
|
||||||
|
|
||||||
|
\begin{align*}
|
||||||
|
D (u, v) &= \sum_i |D_i| = |D_0| + |D_1| + |D_2| + |D_3| + \dots \\
|
||||||
|
&= 0 + |v_1 - u_1 + D_0| +
|
||||||
|
|v_2 - u_2 + D_1| +
|
||||||
|
|v_3 - u_3 + D_2| + \dots \\
|
||||||
|
&= |v_1 - u_1| +
|
||||||
|
|v_1 - u_1 + v_2 - u_2| +
|
||||||
|
|v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots \\
|
||||||
|
&= |v_1 - u_1| +
|
||||||
|
|v_1 + v_2 - (u_1 + u_2)| +
|
||||||
|
|v_1 + v_2 + v_3 - (u_1 + u_2 + u_3)| + \dots \\
|
||||||
|
&= |V_1 - U_1| + |V_2 - U_2| + |V_3 - U_3| + \dots \\
|
||||||
|
&= \sum_i |U_i - V_i|
|
||||||
|
\end{align*}
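
The equivalence derived above can also be checked numerically. The sketch below is a minimal Python illustration, not the code used for the report: the function names and the example histograms are hypothetical, and both histograms are assumed to have the same number of bins and the same integral.

```python
from itertools import accumulate


def emd_cumulative(u, v):
    """EMD as the sum of |U_i - V_i| over the cumulative vectors."""
    if len(u) != len(v):
        raise ValueError("histograms must have the same number of bins")
    U = accumulate(u)
    V = accumulate(v)
    return sum(abs(a - b) for a, b in zip(U, V))


def emd_recursive(u, v):
    """EMD via the recursion D_i = v_i - u_i + D_{i-1}, with D_0 = 0."""
    if len(u) != len(v):
        raise ValueError("histograms must have the same number of bins")
    total = 0
    d = 0                      # D_0
    for ui, vi in zip(u, v):
        d = vi - ui + d        # D_i
        total += abs(d)
    return total


# Hypothetical histograms with the same integral (6 entries each).
u = [3, 1, 0, 2]
v = [1, 2, 2, 1]

by_cumulative = emd_cumulative(u, v)   # |3-1| + |4-3| + |4-5| + |6-6| = 4
by_recursion = emd_recursive(u, v)     # |-2| + |-1| + |1| + |0| = 4
```

The recursive form needs a single pass and no auxiliary vectors, while the cumulative form mirrors the formula more directly. The result is expressed in units of bin widths.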
|
||||||
|
|
||||||
|
|
||||||
|
\textcolor{red}{EMD}
|
||||||
|
|
||||||
These distances were used to build their empirical cumulative distribution.
|
These distances were used to build their empirical cumulative distribution.
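
The resampling procedure can be sketched in Python as below. This is a sketch under several assumptions that the text does not specify: the binning (`n_bins`, `lo`, `hi`), the number of resamples `n_boot`, and the use of raw bin counts are all hypothetical choices, and every sample value is assumed to lie in `[lo, hi]`. The toy sample is uniform only to keep the example short.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def histogram(sample, n_bins, lo, hi):
    """Bin counts on [lo, hi]; values equal to hi go in the last bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in sample:
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts


def emd(u, v):
    """EMD between same-length histograms via their cumulative vectors."""
    return sum(abs(a - b) for a, b in zip(accumulate(u), accumulate(v)))


def bootstrap_emd(sample, n_boot, n_bins, lo, hi, rng):
    """Sorted EMDs between the original histogram and bootstrap resamples."""
    reference = histogram(sample, n_bins, lo, hi)
    distances = []
    for _ in range(n_boot):
        # Same size as the original, drawn with replacement.
        resample = rng.choices(sample, k=len(sample))
        distances.append(emd(reference, histogram(resample, n_bins, lo, hi)))
    return sorted(distances)


def ecdf(sorted_distances, x):
    """Empirical cumulative distribution: fraction of distances <= x."""
    return bisect_right(sorted_distances, x) / len(sorted_distances)


rng = random.Random(0)                              # fixed seed
sample = [rng.uniform(0.0, 1.0) for _ in range(200)]  # toy data
dists = bootstrap_emd(sample, 50, 10, 0.0, 1.0, rng)
```

Because every resample has the same size as the original, the two histograms always have the same integral, which is the condition for the EMD to be valid.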
|
||||||
|
|
||||||
|