ex-6: EMD section completed
parent
ee1d2242eb
commit
000cc827a2
@ -38,8 +38,8 @@ int show_help(char **argv) {
|
||||
|
||||
/* Performs an experiment consisting in
|
||||
*
|
||||
* 1. Measuring the distribution I(θ) by reverse
|
||||
* sampling from an RNG;
|
||||
 *   1. Measuring the distribution I(θ) by sampling from
|
||||
* an RNG;
|
||||
* 2. Convolving the I(θ) sample with a kernel
|
||||
* to simulate the instrumentation response;
|
||||
* 3. Applying a gaussian noise with σ=opts.noise
|
||||
|
@ -423,36 +423,22 @@ deconvolved outcome with the original signal was quantified using the earth
|
||||
mover's distance.
|
||||
|
||||
In statistics, the earth mover's distance (EMD) is a measure of distance
|
||||
between two probability distributions [@cock41]. Informally, the distributions
|
||||
are interpreted as two different ways of piling up a certain amount of dirt over
|
||||
a region and the EMD is the minimum cost of turning one pile into the other,
|
||||
where the cost is the amount of dirt moved times the distance by which it is
|
||||
moved. It is valid only if the two distributions have the same integral, that
|
||||
is if the two piles have the same amount of dirt.
|
||||
between two distributions [@cock41]. Informally, if one imagines the two
distributions as two piles of dirt, possibly of different total amounts, each
spread over its own region, the EMD is the minimum cost of making the first
pile as similar as possible to the second one, where the cost is the amount of
dirt moved times the distance by which it is moved.
|
||||
Computing the EMD is based on a solution to the transportation problem, which
|
||||
can be formalized as follows.
|
||||
|
||||
Consider two vectors $P$ and $Q$ which represent the two probability
|
||||
distributions whose EMD has to be measured:
|
||||
Consider two vectors $P$ and $Q$ which represent the two distributions whose
|
||||
EMD has to be measured:
|
||||
|
||||
$$
|
||||
P = \{ (p_1, w_{p1}) \dots (p_m, w_{pm}) \} \et
|
||||
Q = \{ (q_1, w_{q1}) \dots (q_n, w_{qn}) \}
|
||||
$$
|
||||
|
||||
The histogram P must be dismantled so as to obtain the histogram Q, which is
empty at the start, although I know I will want w_qj in each bin located at
position qj.
  - I only move from P to Q
  - I move no more than each entry of P
  - I obtain no more than each entry of Q
  - I move everything I can: either I obtain all of Q or I have run out of P

so the two need not come out equal!
|
||||
|
||||
|
||||
|
||||
|
||||
where $p_i$ and $q_i$ are the 'values' (that is, the location of the dirt) and
|
||||
$w_{pi}$ and $w_{qi}$ are the 'weights' (that is, the quantity of dirt). A
|
||||
ground distance matrix $D_{ij}$ is defined such that its entries $d_{ij}$ are the
|
||||
@ -464,28 +450,40 @@ $$
|
||||
W (P, Q, F) = \sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}
|
||||
$$
|
||||
|
||||
with the constraints:
|
||||
The $Q$ region is to be considered empty at the beginning: the 'dirt' present
in $P$ must be moved to $Q$ so as to reproduce the target distribution as
closely as possible. Namely, the following constraints must be satisfied:
|
||||
|
||||
\begin{align*}
|
||||
&f_{ij} \ge 0 \hspace{15pt} &1 \le i \le m \wedge 1 \le j \le n \\
|
||||
&\sum_{j = 1}^n f_{ij} \le w_{pi} &1 \le i \le m \\
|
||||
&\sum_{i = 1}^m f_{ij} \le w_{qj} &1 \le j \le n
|
||||
\end{align*}
|
||||
$$
|
||||
  \sum_{i = 1}^m \sum_{j = 1}^n f_{ij}
|
||||
&\text{1.} \hspace{20pt} f_{ij} \ge 0 \hspace{15pt}
|
||||
&1 \le i \le m \wedge 1 \le j \le n
|
||||
\\
|
||||
&\text{2.} \hspace{20pt} \sum_{j = 1}^n f_{ij} \le w_{pi}
|
||||
&1 \le i \le m
|
||||
\\
|
||||
&\text{3.} \hspace{20pt} \sum_{i = 1}^m f_{ij} \le w_{qj}
|
||||
&1 \le j \le n
|
||||
\\
|
||||
  &\text{4.} \hspace{20pt} \sum_{i = 1}^m \sum_{j = 1}^n f_{ij}
|
||||
= \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
|
||||
$$
|
||||
\end{align*}
|
||||
|
||||
The first constraint allows moving 'dirt' from $P$ to $Q$ and not vice versa.
|
||||
The next two constraints limit the amount of supplies that can be sent by the
values in $P$ to their weights, and the values in $Q$ to receive no more
supplies than their weights; the last constraint forces the maximum possible
amount of supplies to be moved. The total moved amount is the total flow. Once the
|
||||
transportation problem is solved, and the optimal flow is found, the earth
|
||||
mover's distance $D$ is defined as the work normalized by the total flow:
|
||||
The first constraint allows moving dirt from $P$ to $Q$ and not vice versa; the
second limits the amount of dirt moved from each position in $P$ so as not to
exceed the available quantity; the third limits the amount of dirt moved to each
position in $Q$ so as not to exceed the required quantity; and the last one
forces the maximum possible amount of dirt to be moved: either all the dirt
present in $P$ is moved, or the $Q$ distribution is fully obtained.
The total moved amount is the total flow. If the two distributions have the
same amount of dirt, all the dirt present in $P$ is necessarily moved to
$Q$ and the flow equals the total amount of available dirt.
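
As a small illustration of the four constraints, with numbers invented for
this example and assuming the ground distance $d_{ij} = |p_i - q_j|$, consider

$$
  P = \{ (0, 3) \} \et Q = \{ (1, 1), (2, 1) \}
$$

Here $P$ holds 3 units of dirt while $Q$ requires only 2, so constraint 4 fixes
the total flow to $\text{min} (3, 2) = 2$. Constraint 3 caps both $f_{11}$ and
$f_{12}$ at 1, hence $f_{11} = f_{12} = 1$ is the only feasible flow and one
unit of dirt is left in $P$:

$$
  W = 1 \cdot |0 - 1| + 1 \cdot |0 - 2| = 3
  \hspace{20pt} \Longrightarrow \hspace{20pt}
  \frac{W}{\sum_{i, j} f_{ij}} = \frac{3}{2}
$$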
|
||||
|
||||
Once the transportation problem is solved and the optimal flow is found, the
|
||||
EMD is defined as the work normalized by the total flow:
|
||||
|
||||
$$
|
||||
D (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
|
||||
\text{EMD} (P, Q) = \frac{\sum_{i = 1}^m \sum_{j = 1}^n f_{ij} d_{ij}}
|
||||
{\sum_{i = 1}^m \sum_{j=1}^n f_{ij}}
|
||||
$$
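
The normalization above can be sketched in C as follows. This is only an
illustration, not the code of the exercise: the function name `emd_from_flow`
and the row-major storage of the matrices are assumptions made here, and the
flow passed in is taken to be already the optimal one (the transportation
problem itself is not solved by this snippet).

```c
#include <stddef.h>

/* Hypothetical helper: given an m×n flow matrix and the matching m×n
 * ground distance matrix (both stored row-major), return the work
 * divided by the total flow. The flow is assumed to be optimal and to
 * satisfy the constraints; this function does not check either. */
double emd_from_flow(const double *flow, const double *dist,
                     size_t m, size_t n) {
  double work = 0.0;   /* sum of f_ij * d_ij */
  double total = 0.0;  /* sum of f_ij        */

  for (size_t i = 0; i < m; i++) {
    for (size_t j = 0; j < n; j++) {
      work  += flow[i * n + j] * dist[i * n + j];
      total += flow[i * n + j];
    }
  }

  /* With no dirt moved the ratio is undefined: return 0 by convention. */
  return total > 0.0 ? work / total : 0.0;
}
```

Returning 0 for an empty flow is a choice made for this sketch; a caller may
prefer to treat that case as an error.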
|
||||
|
||||
@ -494,28 +492,29 @@ procedure simplifies a lot. By representing both histograms with two vectors $u$
|
||||
and $v$, the equation above boils down to [@ramdas17]:
|
||||
|
||||
$$
|
||||
D (u, v) = \sum_i |U_i - V_i|
|
||||
\text{EMD} (u, v) = \sum_i |U_i - V_i|
|
||||
$$
|
||||
|
||||
where the sum runs over the entries of the vectors $U$ and $V$, which are the
|
||||
cumulative vectors of the histograms.
|
||||
In the code, the following equivalent recursive routine was implemented.
|
||||
cumulative vectors of the histograms. In the code, the following equivalent
|
||||
recursive routine was implemented.
|
||||
|
||||
$$
|
||||
D (u, v) = \sum_i |D_i| \with
|
||||
\text{EMD} (u, v) = \sum_i |\text{EMD}_i| \with
|
||||
\begin{cases}
|
||||
D_i = v_i - u_i + D_{i-1} \\
|
||||
D_0 = 0
|
||||
\text{EMD}_i = v_i - u_i + \text{EMD}_{i-1} \\
|
||||
\text{EMD}_0 = 0
|
||||
\end{cases}
|
||||
$$
|
||||
|
||||
In fact:
|
||||
|
||||
\begin{align*}
|
||||
D (u, v) &= \sum_i |D_i| = |D_0| + |D_1| + |D_2| + |D_3| + \dots \\
|
||||
&= 0 + |v_1 - u_1 + D_0| +
|
||||
|v_2 - u_2 + D_1| +
|
||||
|v_3 - u_3 + D_2| + \dots \\
|
||||
\text{EMD} (u, v) &= \sum_i |\text{EMD}_i| = |\text{EMD}_0| + |\text{EMD}_1|
|
||||
+ |\text{EMD}_2| + |\text{EMD}_3| + \dots \\
|
||||
&= 0 + |v_1 - u_1 + \text{EMD}_0| +
|
||||
|v_2 - u_2 + \text{EMD}_1| +
|
||||
|v_3 - u_3 + \text{EMD}_2| + \dots \\
|
||||
&= |v_1 - u_1| +
|
||||
|v_1 - u_1 + v_2 - u_2| +
|
||||
|v_1 - u_1 + v_2 - u_2 + v_3 - u_3| + \dots \\
|
||||
@ -526,19 +525,8 @@ In fact:
|
||||
&= \sum_i |U_i - V_i|
|
||||
\end{align*}
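
A minimal sketch of the routine derived above is given below. It is not the
actual code of the exercise: the name `emd_histograms` is hypothetical, the two
histograms are assumed to share the same $n$ bins of unit width, and the
recurrence is written as a plain loop rather than as a recursive call.

```c
#include <stddef.h>

/* Hypothetical sketch: EMD between two histograms u and v defined on
 * the same n bins (unit bin width assumed), accumulated through
 * EMD_i = v_i - u_i + EMD_{i-1} with EMD_0 = 0. */
double emd_histograms(const double *u, const double *v, size_t n) {
  double running = 0.0;  /* EMD_{i-1}: difference of the cumulative sums */
  double sum = 0.0;      /* sum over i of |EMD_i|                        */

  for (size_t i = 0; i < n; i++) {
    running += v[i] - u[i];
    sum += running < 0.0 ? -running : running;
  }
  return sum;
}
```

For instance, with `u = {0.5, 0.5, 0}` and `v = {0, 0.5, 0.5}` the partial
terms are $-0.5$, $-0.5$ and $0$, so the routine returns 1: half of the content
has to be moved by two bins.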
|
||||
|
||||
|
||||
\textcolor{red}{EMD}
|
||||
|
||||
These distances were used to build their empirical cumulative distribution.
|
||||
|
||||
\textcolor{red}{empirical distribution}
|
||||
|
||||
At 95% confidence level, the compatibility of the deconvolved signal with
the original one cannot be disproved if its distance from the original signal
is greater than \textcolor{red}{value}.
|
||||
|
||||
\textcolor{red}{counts}
|
||||
|
||||
This simple formula enabled comparisons to be made between a great number of
|
||||
results.
|
||||
|
||||
## Results comparison {#sec:conv_results}
|
||||
|
||||
|