diff --git a/notes/sections/7.md b/notes/sections/7.md
index 27ee1bd..446392e 100644
--- a/notes/sections/7.md
+++ b/notes/sections/7.md
@@ -366,6 +366,10 @@ $$
 
 Similarly for the case with $e = 1$ and $f(x) = 0$.
 
+![Weight vector and threshold value obtained with the perceptron method as a
+  function of the number of iterations. Both level off at the third
+  iteration.](images/7-iterations.pdf){#fig:iterations}
+
 As far as convergence is concerned, the perceptron will never get to the state
 with all the input points classified correctly if the training set is not
 linearly separable, meaning that the signal cannot be separated from the noise
@@ -406,7 +410,7 @@ samples was generated and the points were divided into noise and signal
 applying both methods. To avoid storing large datasets in memory, at each
 iteration, false positives and negatives were recorded using a running
 statistics method implemented in the `gsl_rstat` library. For each sample, the
-numbers $N_{fn}$ and $N_{fp}$ of false positive and false negative were obtained
+numbers $N_{fn}$ and $N_{fp}$ of false negatives and false positives were obtained
 this way: for every noise point $x_n$, the threshold function $f(x_n)$ was
 computed, then:
 
@@ -430,9 +434,9 @@ false-positive than false-negative, being also more variable from dataset to
 dataset.
 A possible explanation of this fact is that, for linearly separable and normally
 distributed points, the Fisher linear discriminant is an exact analytical
-solution, whereas the perceptron is only expected to converge to the solution
-and is therefore more subject to random fluctuations.
-
+solution, the most powerful one according to the Neyman-Pearson lemma, whereas
+the perceptron is only expected to converge to the solution and is therefore
+more subject to random fluctuations.
 -------------------------------------------------------------------------------------------
                  $\alpha$     $\sigma_{\alpha}$     $\beta$       $\sigma_{\beta}$
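The per-sample bookkeeping described in the second hunk (counting $N_{fp}$ and $N_{fn}$ with running statistics instead of storing the datasets) can be sketched in C with the `gsl_rstat` interface the text cites. The fragment below is only an illustration of that idea, not the code this patch belongs to: the weight vector, threshold, sample sizes, counting rule and Gaussian point generator are placeholder assumptions, while the `gsl_rstat` and `gsl_rng` calls are the actual GSL API.

```c
/* Illustrative sketch: accumulate per-sample false-positive and false-negative
 * counts with GSL running statistics, so only the running mean and standard
 * deviation are ever kept in memory. All numerical values are placeholders. */
#include <stdio.h>
#include <stddef.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rstat.h>

/* Linear threshold function: nonzero means the point is labelled "signal". */
static int classify(double x, double y, const double w[2], double tcut) {
  return w[0] * x + w[1] * y > tcut;
}

int main(void) {
  gsl_rng *rng = gsl_rng_alloc(gsl_rng_taus2);
  gsl_rstat_workspace *fp_stat = gsl_rstat_alloc();  /* N_fp over samples */
  gsl_rstat_workspace *fn_stat = gsl_rstat_alloc();  /* N_fn over samples */

  const size_t nsamples = 1000, npoints = 500;
  const double w[2] = {0.6, 0.8};   /* placeholder weight vector   */
  const double tcut = 1.0;          /* placeholder threshold value */

  for (size_t s = 0; s < nsamples; s++) {
    size_t nfp = 0, nfn = 0;
    for (size_t i = 0; i < npoints; i++) {
      /* toy data: Gaussian noise at the origin, signal displaced to (1.5, 1.5) */
      double xn = gsl_ran_gaussian(rng, 0.6);
      double yn = gsl_ran_gaussian(rng, 0.6);
      double xs = 1.5 + gsl_ran_gaussian(rng, 0.6);
      double ys = 1.5 + gsl_ran_gaussian(rng, 0.6);
      if (classify(xn, yn, w, tcut))  nfp++;   /* noise labelled as signal */
      if (!classify(xs, ys, w, tcut)) nfn++;   /* signal labelled as noise */
    }
    /* one entry per sample: no dataset is ever stored */
    gsl_rstat_add((double)nfp, fp_stat);
    gsl_rstat_add((double)nfn, fn_stat);
  }

  printf("N_fp per sample: %.1f +- %.1f\n",
         gsl_rstat_mean(fp_stat), gsl_rstat_sd(fp_stat));
  printf("N_fn per sample: %.1f +- %.1f\n",
         gsl_rstat_mean(fn_stat), gsl_rstat_sd(fn_stat));

  gsl_rstat_free(fp_stat);
  gsl_rstat_free(fn_stat);
  gsl_rng_free(rng);
  return 0;
}
```

Building such a fragment only needs the usual GSL link flags, e.g. `-lgsl -lgslcblas -lm`.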