diff --git a/notes/sections/7.md b/notes/sections/7.md
index 27ee1bd..446392e 100644
--- a/notes/sections/7.md
+++ b/notes/sections/7.md
@@ -366,6 +366,10 @@ $$
 
 Similarly for the case with $e = 1$ and $f(x) = 0$.
 
+![Weight vector and threshold value obtained with the perceptron method as a
+  function of the number of iterations. Both level off at the third
+  iteration.](images/7-iterations.pdf){#fig:iterations}
+
 As far as convergence is concerned, the perceptron will never get to the state
 with all the input points classified correctly if the training set is not
 linearly separable, meaning that the signal cannot be separated from the noise
@@ -406,7 +410,7 @@ samples was generated and the points were divided into noise and signal
 applying both methods. To avoid storing large datasets in memory, at each
 iteration, false positives and negatives were recorded using a running
 statistics method implemented in the `gsl_rstat` library. For each sample, the
-numbers $N_{fn}$ and $N_{fp}$ of false positive and false negative were obtained
+numbers $N_{fn}$ and $N_{fp}$ of false negatives and false positives were obtained
 this way: for every noise point $x_n$, the threshold function $f(x_n)$ was
 computed, then:
 
@@ -430,9 +434,9 @@ false-positive than false-negative, being also more variable from dataset to
 dataset.
 A possible explanation of this fact is that, for linearly separable and normally
 distributed points, the Fisher linear discriminant is an exact analytical
-solution, whereas the perceptron is only expected to converge to the solution
-and is therefore more subject to random fluctuations.
-
+solution, the most powerful one according to the Neyman-Pearson lemma, whereas
+the perceptron is only expected to converge to the solution and is therefore
+more subject to random fluctuations.
 -------------------------------------------------------------------------------------------
                  $\alpha$     $\sigma_{\alpha}$     $\beta$       $\sigma_{\beta}$
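The per-sample bookkeeping described in the second hunk (counting $N_{fp}$ and $N_{fn}$ with running statistics instead of storing the datasets) can be sketched in C with the `gsl_rstat` interface the text cites. The fragment below is only an illustration of that idea, not the code this patch belongs to: the weight vector, threshold, sample sizes, counting rule and Gaussian point generator are placeholder assumptions, while the `gsl_rstat` and `gsl_rng` calls are the actual GSL API.

```c
/* Illustrative sketch: accumulate per-sample false-positive and false-negative
 * counts with GSL running statistics, so only the running mean and standard
 * deviation are ever kept in memory. All numerical values are placeholders. */
#include <stdio.h>
#include <stddef.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rstat.h>

/* Linear threshold function: nonzero means the point is labelled "signal". */
static int classify(double x, double y, const double w[2], double tcut) {
  return w[0] * x + w[1] * y > tcut;
}

int main(void) {
  gsl_rng *rng = gsl_rng_alloc(gsl_rng_taus2);
  gsl_rstat_workspace *fp_stat = gsl_rstat_alloc();  /* N_fp over samples */
  gsl_rstat_workspace *fn_stat = gsl_rstat_alloc();  /* N_fn over samples */

  const size_t nsamples = 1000, npoints = 500;
  const double w[2] = {0.6, 0.8};   /* placeholder weight vector   */
  const double tcut = 1.0;          /* placeholder threshold value */

  for (size_t s = 0; s < nsamples; s++) {
    size_t nfp = 0, nfn = 0;
    for (size_t i = 0; i < npoints; i++) {
      /* toy data: Gaussian noise at the origin, signal displaced to (1.5, 1.5) */
      double xn = gsl_ran_gaussian(rng, 0.6);
      double yn = gsl_ran_gaussian(rng, 0.6);
      double xs = 1.5 + gsl_ran_gaussian(rng, 0.6);
      double ys = 1.5 + gsl_ran_gaussian(rng, 0.6);
      if (classify(xn, yn, w, tcut))  nfp++;   /* noise labelled as signal */
      if (!classify(xs, ys, w, tcut)) nfn++;   /* signal labelled as noise */
    }
    /* one entry per sample: no dataset is ever stored */
    gsl_rstat_add((double)nfp, fp_stat);
    gsl_rstat_add((double)nfn, fn_stat);
  }

  printf("N_fp per sample: %.1f +- %.1f\n",
         gsl_rstat_mean(fp_stat), gsl_rstat_sd(fp_stat));
  printf("N_fn per sample: %.1f +- %.1f\n",
         gsl_rstat_mean(fn_stat), gsl_rstat_sd(fn_stat));

  gsl_rstat_free(fp_stat);
  gsl_rstat_free(fn_stat);
  gsl_rng_free(rng);
  return 0;
}
```

Building such a fragment only needs the usual GSL link flags, e.g. `-lgsl -lgslcblas -lm`.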