ex-7: improve efficiency/purity section

Michele Guerini Rocco 2020-07-05 21:23:20 +02:00
parent b7e1857862
commit 747f2f4335
Signed by: rnhmjoj
GPG Key ID: BFBAF4C975F76450


@@ -391,24 +391,28 @@
was generated and the points were classified applying both methods. To avoid
storing large datasets in memory, at each iteration, false positives and
false negatives were recorded using a running statistics method implemented in
the `gsl_rstat` library. For each sample, the numbers $N_{fn}$ and $N_{fp}$ of
false negatives and false positives were obtained in this way: for every signal
point $x_s$, the threshold function $f(x_s)$ was computed, then:
- if $f(x_s) = 1 \thus$ $N_{fn} \to N_{fn}$
- if $f(x_s) = 0 \thus$ $N_{fn} \to N_{fn} + 1$
and similarly, for the noise points:
- if $f(x_n) = 1 \thus$ $N_{fp} \to N_{fp} + 1$
- if $f(x_n) = 0 \thus$ $N_{fp} \to N_{fp}$
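This bookkeeping can be sketched in C; the following is a minimal
illustration, not the repository's actual code: the names `count_errors`,
`classify`, `stat_fn` and `stat_fp` are hypothetical, and points are taken as
scalars for brevity.

```c
#include <stddef.h>
#include <gsl/gsl_rstat.h>

/* Hypothetical sketch: count the classification errors of one sample
 * and feed them to the running statistics. `classify` stands for the
 * threshold function f: it returns 1 for points classified as signal,
 * 0 otherwise. */
void count_errors(const double *signal, size_t N_s,
                  const double *noise,  size_t N_n,
                  int (*classify)(double),
                  gsl_rstat_workspace *stat_fn,
                  gsl_rstat_workspace *stat_fp) {
  size_t n_fn = 0, n_fp = 0;

  /* signal points classified as noise are false negatives */
  for (size_t i = 0; i < N_s; i++)
    if (classify(signal[i]) == 0) n_fn++;

  /* noise points classified as signal are false positives */
  for (size_t i = 0; i < N_n; i++)
    if (classify(noise[i]) == 1) n_fp++;

  /* accumulate the counts without storing the whole dataset */
  gsl_rstat_add((double)n_fn, stat_fn);
  gsl_rstat_add((double)n_fp, stat_fp);
}
```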
Finally, the mean and standard deviation were computed from $N_{fn}$ and
$N_{fp}$ for every sample and used to estimate the significance $\alpha$
and false-positive rate $\beta$ of the classification:
$$
\alpha = \frac{\text{mean}(N_{fn})}{N_s} \et
\beta = \frac{\text{mean}(N_{fp})}{N_n}
$$
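Concretely, once all the samples have been processed, the estimates follow
directly from the accumulated statistics. A sketch continuing the hypothetical
workspaces above (`report`, `stat_fn` and `stat_fp` are again illustrative
names):

```c
#include <stdio.h>
#include <gsl/gsl_rstat.h>

/* Estimate the error rates and their spread from the running
 * statistics accumulated over all the samples. */
void report(gsl_rstat_workspace *stat_fn,
            gsl_rstat_workspace *stat_fp,
            double N_s, double N_n) {
  double alpha = gsl_rstat_mean(stat_fn) / N_s;  /* false-negative rate */
  double beta  = gsl_rstat_mean(stat_fp) / N_n;  /* false-positive rate */
  printf("1-α = %.4f  σ = %.2f\n", 1 - alpha, gsl_rstat_sd(stat_fn));
  printf("1-β = %.4f  σ = %.2f\n", 1 - beta,  gsl_rstat_sd(stat_fp));
}
```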
Results for $N_t = 500$ are shown in @tbl:res_comp. As can be seen, the Fisher
discriminant gives a nearly perfect classification with a symmetric distribution
of false negatives and false positives, whereas the perceptron shows slightly
more false positives than false negatives and is also more variable from dataset to
dataset.
A possible explanation of this fact is that, for linearly separable and normally
distributed points, the Fisher linear discriminant is an exact analytical
@@ -416,13 +420,13 @@
solution, the most powerful one according to the Neyman-Pearson lemma, whereas
the perceptron is only expected to converge to a solution and is therefore
more subject to random fluctuations.
-------------------------------------------------------
              $1-α$    $σ_{1-α}$      $1-β$    $σ_{1-β}$
----------- ---------- ---------- ---------- ----------
Fisher        0.9999      0.33      0.9999      0.33

Perceptron    0.9999      0.28      0.9995      0.64
-------------------------------------------------------
Table: Results for the Fisher and perceptron methods. $\sigma_{1-\alpha}$ and
$\sigma_{1-\beta}$ stand for the standard deviation of the false