diff --git a/notes/sections/1.md b/notes/sections/1.md
index 6386413..3110004 100644
--- a/notes/sections/1.md
+++ b/notes/sections/1.md
@@ -149,11 +149,12 @@ To obtain a better estimate of the mode and its error, the above procedure was
 bootstrapped. The original sample was treated as a population and used to build
 100 other samples of the same size, by *sampling with replacements*. For each one
 of the new samples, the above statistic was computed. By simply taking the
-mean of these statistics the following estimate was obtained:
+mean and standard deviation of these statistics, the following estimate was
+obtained:
 $$
   \text{observed mode: } m_o = \num{-0.29 \pm 0.19}
 $$
-In order to compare the values $m_e$ and $m_0$, the following compatibility
+In order to compare the values $m_e$ and $m_o$, the following compatibility
 $t$-test was applied:
 $$
   p = 1 - \text{erf}\left(\frac{t}{\sqrt{2}}\right)\ \with
@@ -184,7 +185,7 @@ middle elements otherwise.
 
 The expected median was derived from the quantile function (QDF) of the Landau
 distribution[^1].
-Once this is know, the median is simply given by $\text{QDF}(1/2)$. Since both
+Once this is known, the median is simply given by $\text{QDF}(1/2)$. Since both
 the CDF and QDF have no known closed form, they must be computed numerically.
 The cumulative probability was computed by quadrature-based numerical
 integration of the PDF (`gsl_integration_qagiu()` function in GSL). The function
@@ -210,13 +211,13 @@ where the absolute and relative tolerances $\varepsilon_\text{abs}$ and
 $\varepsilon_\text{rel}$ were set to \num{1e-10} and \num{1e-6},
 respectively.  
 As for the QDF, this was implemented by numerically inverting the CDF. This was
-done by solving the equation;
+done by solving the equation for $x$:
 $$
   p(x) = p_0
 $$
-for x, given a probability value $p_0$, where $p(x)$ is the CDF. The (unique)
-root of this equation was found by a root-finding routine
-(`gsl_root_fsolver_brent` in GSL) based on the Brent-Dekker method.
+given a probability value $p_0$, where $p(x)$ is the CDF. The (unique) root of
+this equation was found by a root-finding routine (`gsl_root_fsolver_brent` in
+GSL) based on the Brent-Dekker method.
 The following condition was checked for convergence:
 $$
   |a - b| < \varepsilon_\text{abs} + \varepsilon_\text{rel} \min(|a|, |b|)
diff --git a/notes/sections/2.md b/notes/sections/2.md
index 6f7deb1..bb679d0 100644
--- a/notes/sections/2.md
+++ b/notes/sections/2.md
@@ -10,7 +10,7 @@ $$
     \sum_{k=1}^{n} \frac{1}{k}
   - \ln(n) \right)
 $$ {#eq:gamma}
-and represents the limiting blue area in @fig:gamma. The first 30 digits of
+and represents the limiting red area in @fig:gamma. The first 30 digits of
 $\gamma$ are:
 $$
   \gamma = 0.57721\ 56649\ 01532\ 86060\ 65120\ 90082 \dots
@@ -52,7 +52,7 @@ efficiency of the methods lies on how quickly they converge to their limit.
   \draw (7.0,-0.05) -- (7.0,0.05); \node [below, scale=0.7] at (7.0,-0.05) {7};
 \end{tikzpicture}
 \caption{The area of the red region converges to the Euler–Mascheroni
-         constant..}\label{fig:gamma}
+         constant.}\label{fig:gamma}
 }
 \end{figure}
 
@@ -109,10 +109,8 @@ sign, 8 for the exponent and 55 for the mantissa, hence:
 $$
   2^{55} = 10^{d} \thus d = 55 \cdot \log(2) \sim 16.6
 $$
-Only 10 digits were correctly computed: this means that when the terms of the
-series start being smaller than the smallest representable double, the sum of
-all the remaining terms gives a number $\propto 10^{-11}$.  The best result is
-shown in @tbl:naive-res.
+However, only 10 digits were correctly computed. The best result is shown in
+@tbl:naive-res.
 
 ------- --------------------
 exact	  0.57721 56649 01533
diff --git a/notes/sections/3.md b/notes/sections/3.md
index 150a8b0..fd772ab 100644
--- a/notes/sections/3.md
+++ b/notes/sections/3.md
@@ -13,7 +13,7 @@ distribution function $F$:
 \end{align*}
 where $\theta$ and $\phi$ are, respectively, the polar and azimuthal angles, and
 $$
-  \alpha_0 = 0.65 \et \beta_0 = 0.06 \et \gamma_0 = -0.18
+  \alpha = 0.65 \et \beta = 0.06 \et \gamma = -0.18
 $$
 To generate the points, a *hit-miss* method was employed:
 
diff --git a/notes/sections/5.md b/notes/sections/5.md
index 15c9122..d12a298 100644
--- a/notes/sections/5.md
+++ b/notes/sections/5.md
@@ -49,9 +49,9 @@ approximate $I$ as:
 $$
   I \approx I_N = \frac{V}{N} \sum_{i=1}^N f(x_i) = V \cdot \avg{f}
 $$
-If $x_i$ are uniformly distributed $I_N \rightarrow I$ for $N \rightarrow +
-\infty$ by the law of large numbers, whereas the integral variance can be
-estimated as:
+If $x_i$ are uniformly distributed, $I_N \rightarrow I$ for $N \rightarrow +
+\infty$ by the law of large numbers, whereas the integral variance $\sigma^2_I$
+can be estimated as:
 $$
   \sigma^2_f = \frac{1}{N - 1}
     \sum_{i = 1}^N \left( f(x_i) - \avg{f} \right)^2
diff --git a/notes/sections/6.md b/notes/sections/6.md
index 8ce07dc..66e1df9 100644
--- a/notes/sections/6.md
+++ b/notes/sections/6.md
@@ -123,7 +123,7 @@ where:
   - $(\cdot, \cdot)$ is an inner product.
 
 Given a signal $s$ of $n$ elements and a kernel $k$ of $m$ elements,
-their convolution is a vector of $n + m + 1$ elements computed
+their convolution $c$ is a vector of $n + m - 1$ elements computed
 by flipping $s$ ($R$ operator) and shifting its indices ($T_i$ operator):
 $$
   c_i = (s, T_i \, R \, k)
@@ -446,8 +446,8 @@ close as possible. Formally, the following constraints must be satisfied:
   &\text{3.} \hspace{20pt} \sum_{i = 1}^m f_{ij} \le w_{qj}
   &1 \le j \le n
   \\
-  &\text{4.} \hspace{20pt} \sum_{j = 1}^n f_{ij} \sum_{j = 1}^m f_{ij} \le w_{qj}
-  = \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
+  &\text{4.} \hspace{20pt} \sum_{j = 1}^n \sum_{i = 1}^m f_{ij} \le
+  \text{min} \left( \sum_{i = 1}^m w_{pi}, \sum_{j = 1}^n w_{qj} \right)
 \end{align*}
 The first constraint allows moving dirt from $P$ to $Q$ and not vice versa; the
 second limits the amount of dirt moved by each position in $P$ in order to not
@@ -549,9 +549,9 @@ a large kernel, the convergence is very slow, even if the best results are
 close to the one found for $\sigma = 0.5$.
 The following $r$s were chosen as the most fitting:
 \begin{align*}
-  \sigma = 0.1 \, \Delta \theta &\thus n^{\text{best}} = 2    \\
-  \sigma = 0.5 \, \Delta \theta &\thus n^{\text{best}} = 10   \\
-  \sigma = 1   \, \Delta \theta &\thus n^{\text{best}} = \num{5e3}
+  \sigma = 0.1 \, \Delta \theta &\thus r^{\text{best}} = 2    \\
+  \sigma = 0.5 \, \Delta \theta &\thus r^{\text{best}} = 10   \\
+  \sigma = 1   \, \Delta \theta &\thus r^{\text{best}} = \num{5e3}
 \end{align*}
 
 Note the difference between  @fig:rless-0.1 and the plots resulting from $\sigma =
diff --git a/notes/sections/7.md b/notes/sections/7.md
index 6ad4e9d..7cb6ef6 100644
--- a/notes/sections/7.md
+++ b/notes/sections/7.md
@@ -86,8 +86,8 @@ $$
   \tilde{\mu}_2 − \tilde{\mu}_1 = w^T (\mu_2 − \mu_1)
 $$
 This expression can be made arbitrarily large simply by increasing the
-magnitude of $w$, fortunately the problem is easily solved by requiring $w$
-to be normalised: $| w^2 | = 1$. Using a Lagrange multiplier to perform the
+magnitude of $w$ but, fortunately, the problem is easily solved by requiring
+$w$ to be normalised: $| w |^2 = 1$. Using a Lagrange multiplier to perform the
 constrained maximization, it can be found that $w \propto (\mu_2 − \mu_1)$,
 meaning that the line onto the points must be projected is the one joining the
 class means.  
@@ -334,21 +334,21 @@ To see how it works, consider the four possible situations:
     \quad f(x) = 0  \quad \Longrightarrow \quad \Delta = 0$  
     the current estimations work properly: $b$ and $w$ do not need to be updated;
   - $e = 1 \quad \wedge \quad f(x) = 0 \quad \Longrightarrow \quad
-    \Delta = 1$  
+    \Delta \propto 1$  
     the current $b$ and $w$ underestimate the correct output: they must be
     increased;
   - $e = 0 \quad \wedge \quad f(x) = 1 \quad \Longrightarrow \quad
-    \Delta = -1$  
+    \Delta \propto -1$  
     the current $b$ and $w$ overestimate the correct output: they must be
     decreased.
 
 Whilst the $b$ updating is obvious, as regards $w$ the following consideration
 may help clarify. Consider the case with $e = 0 \quad \wedge \quad f(x) = 1
-\quad \Longrightarrow \quad \Delta = -1$:
+\quad \Longrightarrow \quad \Delta = -r$:
 $$
   w^T \cdot x \to (w^T + \Delta x^T) \cdot x
               = w^T \cdot x + \Delta |x|^2
-              = w^T \cdot x - |x|^2 \leq w^T \cdot x
+              = w^T \cdot x - r|x|^2 \leq w^T \cdot x
 $$
 Similarly for the case with $e = 1$ and $f(x) = 0$.
 
@@ -399,8 +399,8 @@ $x_n$, the threshold function $f(x_n)$ was computed, then:
 
 and similarly for the positive points.  
 Finally, the mean and standard deviation were computed from $N_{fn}$ and
-$N_{fp}$ for every sample and used to estimate the purity $\alpha$ and
-efficiency $\beta$ of the classification:
+$N_{fp}$ for every sample and used to estimate the significance $\alpha$
+and not-purity $\beta$ of the classification:
 $$
   \alpha = 1 - \frac{\text{mean}(N_{fn})}{N_s} \et
   \beta = 1 - \frac{\text{mean}(N_{fp})}{N_n}