ex-6: EMD section completed
parent
ee1d2242eb
commit
000cc827a2
@ -38,8 +38,8 @@ int show_help(char **argv) {
|
|||||||
|
|
||||||
/* Performs an experiment consisting in
|
/* Performs an experiment consisting in
|
||||||
*
|
*
|
||||||
* 1. Measuring the distribution I(θ) by reverse
|
* 1. Measuring the distribution I(θ) by sampling from
|
||||||
* sampling from an RNG;
|
* an RNG;
|
||||||
* 2. Convolving the I(θ) sample with a kernel
|
* 2. Convolving the I(θ) sample with a kernel
|
||||||
* to simulate the instrumentation response;
|
* to simulate the instrumentation response;
|
||||||
* 3. Applying a gaussian noise with σ=opts.noise
|
* 3. Applying a gaussian noise with σ=opts.noise
|
||||||
|
@ -423,36 +423,22 @@ deconvolved outcome with the original signal was quantified using the earth
|
|||||||
mover's distance.
|
mover's distance.
|
||||||
|
|
||||||
In statistics, the earth mover's distance (EMD) is the measure of distance
|
In statistics, the earth mover's distance (EMD) is the measure of distance
|
||||||
between two probability distributions [@cock41]. Informally, the distributions
|
between two distributions [@cock41]. Informally, if one imagines the two
|
||||||
are interpreted as two different ways of piling up a certain amount of dirt over
|
distributions as two piles of different amounts of dirt in their respective
|
||||||
a region and the EMD is the minimum cost of turning one pile into the other,
|
regions, the EMD is the minimum cost of turning one pile into the other,
|
||||||
where the cost is the amount of dirt moved times the distance by which it is
|
making the first one as similar as possible to the second one, where the
|
||||||
moved. It is valid only if the two distributions have the same integral, that
|
cost is the amount of dirt moved times the distance by which it is moved.
|
||||||
is if the two piles have the same amount of dirt.
|
|
||||||
Computing the EMD is based on a solution to the transportation problem, which
|
Computing the EMD is based on a solution to the transportation problem, which
|
||||||
can be formalized as follows.
|
can be formalized as follows.
|
||||||
|
|
||||||
Consider two vectors $P$ and $Q$ which represent the two probability
|
Consider two vectors $P$ and $Q$ which represent the two distributions whose
|
||||||
distributions whose EMD has to be measured:
|
EMD has to be measured:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
P = \{ (p_1, w_{p1}) \dots (p_m, w_{pm}) \} \et
|
P = \{ (p_1, w_{p1}) \dots (p_m, w_{pm}) \} \et
|
||||||
Q = \{ (q_1, w_{q1}) \dots (q_n, w_{qn}) \}
|
Q = \{ (q_1, w_{q1}) \dots (q_n, w_{qn}) \}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
The histogram P must be dismantled so as to obtain the histogram Q,
|
|
||||||
which is empty at the start, although I know I will want w_qj in each bin at
|
|
||||||
position qj.
|
|
||||||
- I move only from P to Q
|
|
||||||
- I move no more than each entry of P
|
|
||||||
- I receive no more than each entry of Q
|
|
||||||
- I move everything I can: either I obtain all of Q or I have used up P
|
|
||||||
|
|
||||||
and so they need not come out equal!
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
where $p_i$ and $q_i$ are the 'values' (that is, the location of the dirt) and
|
where $p_i$ and $q_i$ are the 'values' (that is, the location of the dirt) and
|
||||||
$w_{pi}$ and $w_{qi}$ are the 'weights' (that is, the quantity of dirt). A
|
$w_{pi}$ and $w_{qi}$ are the 'weights' (that is, the quantity of dirt). A
|
||||||
ground distance matrix $D_{ij}$ is defined such that its entries $d_{ij}$ are the
|
ground distance matrix $D_{ij}$ is defined such that its entries $d_{ij}$ are the
|
||||||
@ -464,28 +450,40 @@ $$
|
|||||||
W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
|
W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
with the constraints:
|
The $Q$ region is to be considered empty at the beginning: the
|
||||||
|
'dirt' present in $P$ must be moved to $Q$ so as to reproduce the target
|
||||||
|
distribution as closely as possible. Namely, the following constraints must be
|
||||||
|
satisfied:
|
||||||
|
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
&f_{ij} \ge 0 \hspace{15pt} &1 \le i \le m \wedge 1 \le j \le n \\
|
&\text{1.} \hspace{20pt} f_{ij} \ge 0 \hspace{15pt}
|
||||||
&\sum_{j = 1}^n f_{ij} \le w_{pi} &1 \le i \le m \\
|
&1 \le i \le m \wedge 1 \le j \le n
|
||||||
&\sum_{i = 1}^m f_{ij} \le w_{qj} &1 \le j \le n
|
\\
|
||||||
\end{align*}
|
&\text{2.} \hspace{20pt} \sum_{j = 1}^n f_{ij} \le w_{pi}
|
||||||
$$
|
&1 \le i \le m
|
||||||
\sum_{i = 1}^m \sum_{j = 1}^n f_{ij}
|
\\
|
||||||
|
&\text{3.} \hspace{20pt} \sum_{i = 1}^m f_{ij} \le w_{qj}
|
||||||
|
&1 \le j \le n
|
||||||
|
\\
|
||||||
|
&\text{4.} \hspace{20pt} \sum_{i = 1}^m \sum_{j = 1}^n f_{ij}
|
||||||
= \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
|
= \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
|
||||||
$$
|
\end{align*}
|
||||||
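The four constraints can also be read as a feasibility test on a candidate flow. The following C sketch is only illustrative: the function name `flow_is_feasible`, the row-major storage of $f_{ij}$ and the tolerance `tol` are assumptions made here, not part of the code in the commit. It checks non-negativity, the per-source and per-target limits, and that the total flow equals the smaller of the two total weights.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks constraints 1-4 for a candidate flow.
 * f is an m x n matrix stored row by row, wp[m] and wq[n] are the
 * weights of P and Q, tol absorbs floating point rounding. */
bool flow_is_feasible(const double *f, const double *wp, const double *wq,
                      size_t m, size_t n, double tol) {
  double total = 0.0, sum_p = 0.0, sum_q = 0.0;

  for (size_t i = 0; i < m; i++) {
    double sent = 0.0;
    for (size_t j = 0; j < n; j++) {
      if (f[i * n + j] < 0.0) return false;   /* 1. only from P to Q */
      sent += f[i * n + j];
    }
    if (sent > wp[i] + tol) return false;     /* 2. no more than w_pi leaves p_i */
    total += sent;
    sum_p += wp[i];
  }

  for (size_t j = 0; j < n; j++) {
    double received = 0.0;
    for (size_t i = 0; i < m; i++) received += f[i * n + j];
    if (received > wq[j] + tol) return false; /* 3. no more than w_qj reaches q_j */
    sum_q += wq[j];
  }

  /* 4. the total flow is the largest possible one */
  double max_flow = sum_p < sum_q ? sum_p : sum_q;
  return fabs(total - max_flow) <= tol;
}
```

With $w_p = (1, 0)$ and $w_q = (0, 1)$, for instance, the flow with $f_{12} = 1$ and all other entries zero passes the check, while moving only half of the dirt violates the fourth constraint.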
|
|
||||||
The first constraint allows moving 'dirt' from $P$ to $Q$ and not vice versa.
|
The first constraint allows moving dirt from $P$ to $Q$ and not vice versa; the
|
||||||
The next two constraints limit the amount of supplies that can be sent by the
|
second limits the amount of dirt moved from each position in $P$ so as not to
|
||||||
values in $P$ to their weights, and the values in $Q$ to receive no more
|
exceed the available quantity; the third sets a limit to the dirt moved to each
|
||||||
supplies than their weights; the last constraint forces the maximum possible
|
position in $Q$ so as not to exceed the required quantity and the last one
|
||||||
amount of supplies to be moved. The total moved amount is the total flow. Once the
|
forces the maximum possible amount of dirt to be moved: either all the dirt
|
||||||
transportation problem is solved, and the optimal flow is found, the earth
|
present in $P$ has been moved, or the $Q$ distribution is obtained.
|
||||||
mover's distance $D$ is defined as the work normalized by the total flow:
|
The total moved amount is the total flow. If the two distributions have the
|
||||||
|
same amount of dirt, then all the dirt present in $P$ is necessarily moved to
|
||||||
|
$Q$ and the flow equals the total amount of available dirt.
|
||||||
|
|
||||||
|
Once the transportation problem is solved and the optimal flow is found, the
|
||||||
|
EMD is defined as the work normalized by the total flow:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
D (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
|
\text{EMD} (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
|
||||||
{\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
|
{\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
|
||||||
$$
|
$$
|
||||||
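As an illustration of the definition, the following C sketch evaluates the work and the normalized distance for a flow that is already known. It does not solve the transportation problem. The function name `emd_from_flow` and the row-major layout of the two matrices are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: EMD for a given flow.
 * f and d are m x n matrices stored row by row: f holds the flows
 * f_ij, d the ground distances d_ij. Returns work / total flow,
 * or 0 when nothing is moved. */
double emd_from_flow(const double *f, const double *d, size_t m, size_t n) {
  double work = 0.0;  /* sum of f_ij * d_ij */
  double flow = 0.0;  /* sum of f_ij */

  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < n; j++) {
      work += f[i * n + j] * d[i * n + j];
      flow += f[i * n + j];
    }
  }
  return flow > 0.0 ? work / flow : 0.0;
}
```

For two positions at unit distance, moving one unit of dirt from the first to the second gives a work of 1 and a total flow of 1, hence a distance of 1.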
|
|
||||||
@ -494,28 +492,29 @@ procedure simplifies a lot. By representing both histograms with two vectors $u$
|
|||||||
and $v$, the equation above boils down to [@ramdas17]:
|
and $v$, the equation above boils down to [@ramdas17]:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
D (u, v) = \sum_i |U_i - V_i|
|
\text{EMD} (u, v) = \sum_i |U_i - V_i|
|
||||||
$$
|
$$
|
||||||
|
|
||||||
where the sum runs over the entries of the vectors $U$ and $V$, which are the
|
where the sum runs over the entries of the vectors $U$ and $V$, which are the
|
||||||
cumulative vectors of the histograms.
|
cumulative vectors of the histograms. In the code, the following equivalent
|
||||||
In the code, the following equivalent recursive routine was implemented.
|
recursive routine was implemented.
|
||||||
|
|
||||||
$$
|
$$
|
||||||
D (u, v) = \sum_i |D_i| \with
|
\text{EMD} (u, v) = \sum_i |\text{EMD}_i| \with
|
||||||
\begin{cases}
|
\begin{cases}
|
||||||
D_i = v_i - u_i + D_{i-1} \\
|
\text{EMD}_i = v_i - u_i + \text{EMD}_{i-1} \\
|
||||||
D_0 = 0
|
\text{EMD}_0 = 0
|
||||||
\end{cases}
|
\end{cases}
|
||||||
$$
|
$$
|
||||||
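A minimal C sketch of this recursion is given below. It assumes that both histograms have the same number of bins of unit width and the same total content. The name `emd_1d` is hypothetical and the actual routine in the code base may differ.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sketch of the recursive routine: u and v are two
 * histograms with the same n bins and the same total content. */
double emd_1d(const double *u, const double *v, size_t n) {
  double carry = 0.0;  /* EMD_{i-1}: dirt still to be carried over */
  double total = 0.0;  /* running sum of |EMD_i| */

  for (size_t i = 0; i < n; i++) {
    carry = v[i] - u[i] + carry;  /* EMD_i = v_i - u_i + EMD_{i-1} */
    total += fabs(carry);
  }
  return total;
}
```

A single pass over the bins is enough, and neither the flow matrix nor the cumulative vectors need to be stored.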
|
|
||||||
In fact:
|
In fact:
|
||||||
|
|
||||||
\begin{align*}
|
\begin{align*}
|
||||||
D (u, v) &= \sum_i |D_i| = |D_0| + |D_1| + |D_2| + |D_3| + \dots \\
|
\text{EMD} (u, v) &= \sum_i |\text{EMD}_i| = |\text{EMD}_0| + |\text{EMD}_1|
|
||||||
&= 0 + |v_1 - u_1 + D_0| +
|
+ |\text{EMD}_2| + |\text{EMD}_3| + \dots \\
|
||||||
|v_2 - u_2 + D_1| +
|
&= 0 + |v_1 - u_1 + \text{EMD}_0| +
|
||||||
|v_3 - u_3 + D_2| + \dots \\
|
|v_2 - u_2 + \text{EMD}_1| +
|
||||||
|
|v_3 - u_3 + \text{EMD}_2| + \dots \\
|
||||||
&= |v_1 - u_1| +
|
&= |v_1 - u_1| +
|
||||||
|v_1 - u_1 + v_2 - u_2| +
|
|v_1 - u_1 + v_2 - u_2| +
|
||||||
|v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots \\
|
|v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots \\
|
||||||
@ -526,19 +525,8 @@ In fact:
|
|||||||
&= \sum_i |U_i - V_i|
|
&= \sum_i |U_i - V_i|
|
||||||
\end{align*}
|
\end{align*}
|
||||||
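As a quick check of the equivalence, consider the illustrative histograms $u = (1, 0, 0)$ and $v = (0, 0, 1)$, chosen here only as an example. The recursion gives:

\begin{align*}
  \text{EMD}_1 &= v_1 - u_1 + \text{EMD}_0 = 0 - 1 + 0 = -1 \\
  \text{EMD}_2 &= v_2 - u_2 + \text{EMD}_1 = 0 - 0 - 1 = -1 \\
  \text{EMD}_3 &= v_3 - u_3 + \text{EMD}_2 = 1 - 0 - 1 = 0
\end{align*}

so that $\text{EMD} (u, v) = 1 + 1 + 0 = 2$. The cumulative vectors are $U = (1, 1, 1)$ and $V = (0, 0, 1)$, hence $\sum_i |U_i - V_i| = 1 + 1 + 0 = 2$ as well: one unit of dirt is moved by two bins.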
|
|
||||||
|
This simple formula enabled comparisons to be made between a great number of
|
||||||
\textcolor{red}{EMD}
|
results.
|
||||||
|
|
||||||
These distances were used to build their empirical cumulative distribution.
|
|
||||||
|
|
||||||
\textcolor{red}{empirical distribution}
|
|
||||||
|
|
||||||
At 95% confidence level, the compatibility of the deconvolved signal with
|
|
||||||
the original one cannot be disproved if its distance from the original signal
|
|
||||||
is greater than \textcolor{red}{value}.
|
|
||||||
|
|
||||||
\textcolor{red}{counts}
|
|
||||||
|
|
||||||
|
|
||||||
## Results comparison {#sec:conv_results}
|
## Results comparison {#sec:conv_results}
|
||||||
|
|
||||||
|