ex-7: improve efficiency/purity section
parent b7e1857862 · commit 747f2f4335
@@ -391,24 +391,28 @@ was generated and the points were classified applying both methods. To avoid
storing large datasets in memory, at each iteration, false positives and
negatives were recorded using a running statistics method implemented in the
`gsl_rstat` library. For each sample, the numbers $N_{fn}$ and $N_{fp}$ of
false negatives and false positives were obtained in this way: for every signal
point $x_s$, the threshold function $f(x_s)$ was computed, then:

- if $f(x_s) = 1 \thus$ $N_{fn} \to N_{fn}$
- if $f(x_s) = 0 \thus$ $N_{fn} \to N_{fn} + 1$

and similarly, for the noise points:
- if $f(x_n) = 1 \thus$ $N_{fp} \to N_{fp} + 1$
- if $f(x_n) = 0 \thus$ $N_{fp} \to N_{fp}$
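
In code, this per-sample counting maps directly onto `gsl_rstat`. A minimal
sketch of one iteration, assuming the fitted threshold function is available
as a C function `f()` returning 1 for points classified as signal and 0
otherwise; the function and variable names here are illustrative, not the
ones actually used in the exercise:

```c
#include <stddef.h>
#include <gsl/gsl_rstat.h>

/* Hypothetical threshold function: returns 1 if the point is classified
 * as signal, 0 if classified as noise. */
int f(const double *x);

/* One iteration: count false negatives among the signal points and false
 * positives among the noise points, then push both counts into running
 * accumulators (created earlier with gsl_rstat_alloc()), so no
 * per-iteration results have to be kept in memory. */
void update_counts(const double *signal[], size_t N_s,
                   const double *noise[],  size_t N_n,
                   gsl_rstat_workspace *w_fn,
                   gsl_rstat_workspace *w_fp)
{
  size_t N_fn = 0, N_fp = 0;
  for (size_t i = 0; i < N_s; i++)
    if (f(signal[i]) == 0) N_fn++;  /* signal rejected -> false negative */
  for (size_t i = 0; i < N_n; i++)
    if (f(noise[i]) == 1) N_fp++;   /* noise accepted  -> false positive */
  gsl_rstat_add((double)N_fn, w_fn);
  gsl_rstat_add((double)N_fp, w_fp);
}
```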
Finally, the mean and standard deviation were computed from $N_{fn}$ and
$N_{fp}$ for every sample and used to estimate the false-negative rate $\alpha$
and the false-positive rate $\beta$ of the classification:
$$
\alpha = \frac{\text{mean}(N_{fn})}{N_s} \et
\beta = \frac{\text{mean}(N_{fp})}{N_n}
$$
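
Once all samples have been processed, these estimates come straight off the
accumulators. A sketch continuing the hypothetical names above:

```c
/* After the loop over all N_t samples (same hypothetical names as above): */
double alpha    = gsl_rstat_mean(w_fn) / N_s;  /* false-negative rate */
double beta     = gsl_rstat_mean(w_fp) / N_n;  /* false-positive rate */
double sigma_fn = gsl_rstat_sd(w_fn);          /* spread of N_fn across samples */
double sigma_fp = gsl_rstat_sd(w_fp);          /* spread of N_fp across samples */

gsl_rstat_free(w_fn);
gsl_rstat_free(w_fp);
```
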
Results for $N_t = 500$ are shown in @tbl:res_comp. As can be seen, the Fisher
discriminant gives a nearly perfect classification with a symmetric distribution
of false negatives and false positives, whereas the perceptron shows slightly
more false positives than false negatives and is also more variable from dataset to
dataset.

A possible explanation of this fact is that, for linearly separable and normally
distributed points, the Fisher linear discriminant is an exact analytical
@@ -416,13 +420,13 @@ solution, the most powerful one, according to the Neyman-Pearson lemma, whereas
the perceptron is only expected to converge to the solution and is therefore
more subject to random fluctuations.

-------------------------------------------------------
             $1-α$      $σ_{1-α}$  $1-β$      $σ_{1-β}$
----------- ---------- ---------- ---------- ----------
Fisher       0.9999     0.33       0.9999     0.33

Perceptron   0.9999     0.28       0.9995     0.64
-------------------------------------------------------

Table: Results for the Fisher and perceptron methods. $\sigma_{\alpha}$ and
$\sigma_{\beta}$ stand for the standard deviations of the false negatives
and false positives, respectively. {#tbl:res_comp}