# Exercise 7

## Generating points according to Gaussian distributions

The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate Gaussian distributions with parameters:

$$
\text{signal} \quad
\begin{cases}
\mu = (0, 0) \\
\sigma_x = \sigma_y = 0.3 \\
\rho = 0.5
\end{cases}
\et
\text{noise} \quad
\begin{cases}
\mu = (4, 4) \\
\sigma_x = \sigma_y = 1 \\
\rho = 0.4
\end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard
deviations in the $x$ and $y$ directions respectively and $\rho$ is the
correlation.
In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise; both can be changed from the command line.
Each sample is handled as a matrix of dimension $n \times 2$, where $n$ is the
number of points in the sample. The `gsl_matrix` type provided by GSL was
employed for this purpose and the function `gsl_ran_bivariate_gaussian()` was
used to generate the points.
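As a reference, a minimal sketch of how each sample could be filled with these
GSL routines is shown below; the function name `generate_sample` and the way
the means are added are illustrative assumptions, not the exact code of the
exercise.

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian
 * with means (mu_x, mu_y), standard deviations (sigma_x, sigma_y)
 * and correlation rho. */
gsl_matrix* generate_sample(gsl_rng *r, size_t n,
                            double mu_x, double mu_y,
                            double sigma_x, double sigma_y, double rho) {
  gsl_matrix *sample = gsl_matrix_alloc(n, 2);
  for (size_t i = 0; i < n; i++) {
    double dx, dy;
    /* gsl_ran_bivariate_gaussian() returns a zero-mean pair,
     * hence the means are added by hand. */
    gsl_ran_bivariate_gaussian(r, sigma_x, sigma_y, rho, &dx, &dy);
    gsl_matrix_set(sample, i, 0, mu_x + dx);
    gsl_matrix_set(sample, i, 1, mu_y + dy);
  }
  return sample;
}
```

With a previously allocated `gsl_rng *r`, the signal sample would then be
obtained as `generate_sample(r, 800, 0, 0, 0.3, 0.3, 0.5)` and the noise
sample as `generate_sample(r, 1000, 4, 4, 1, 1, 0.4)`.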
Then, a classification model must be implemented in order to assign each point
to the class (signal or noise) to which it 'most probably' belongs. The point
is how 'most probably' is to be interpreted and implemented.

## Fisher linear discriminant

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction. It allows this 2D classification problem to be
reduced to a one-dimensional one, where the decision surface becomes a simple
threshold.

Consider the case of two classes (here, the signal and the noise): the
simplest representation of a linear discriminant is obtained by taking a
linear function of a sampled 2D point $x$, so that:

$$
  \hat{x} = w^T x + w_0
$$

where $w$ is called the 'weight vector' and $w_0$ is a bias. The negative of
the bias is called the 'threshold'. An input point $x$ is assigned to the
first class if $\hat{x} \geqslant 0$ and to the second one otherwise.
In general, the projection onto one dimension leads to a considerable loss of
information and classes that are well separated in the original 2D space may
strongly overlap in one dimension. However, by adjusting the components of the
weight vector, a projection that maximizes the class separation can be
selected.
To begin with, consider a two-class problem in which there are $N_1$ points of
class $C_1$ and $N_2$ points of class $C_2$, so that the means $m_1$ and $m_2$
of the two classes are given by:

$$
  m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
  \et
  m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n
$$

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
  \hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$

![The plot on the left shows samples from two classes along with the
histograms resulting from the projection onto the line joining the class
means: note that there is considerable overlap in the projected space. The
right plot shows the corresponding projection based on the Fisher linear
discriminant, with a greatly improved class
separation.](images/fisher.png){#fig:overlap}

This expression can be made arbitrarily large simply by increasing the
magnitude of $w$. To solve this problem, $w$ can be constrained to have unit
length, so that $|w|^2 = 1$. Using a Lagrange multiplier to perform the
constrained maximization, it can be found that $w \propto (m_2 - m_1)$.
There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The idea is therefore to maximize a function that gives a large separation
between the projected class means while also giving a small variance within
each class, thereby minimizing the class overlap.
The within-class variance of the projected data of class $k$ is given by:

$$
  s_k^2 = \sum_{n \in C_k} (\hat{x}_n - \hat{m}_k)^2
$$

The total within-class variance for the whole data set can simply be defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined as the
ratio of the squared distance between the projected class means to the total
within-class variance:

$$
  J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
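The quantities defined above can be computed directly from the two samples.
The sketch below evaluates the projected means, the within-class sums of
squares and $J(w)$ for a given weight vector; the helper names are
hypothetical and the $n \times 2$ `gsl_matrix` layout described earlier is
assumed.

```c
#include <gsl/gsl_matrix.h>

/* Project an n x 2 sample onto the direction w = (w[0], w[1]) and
 * return the mean of the projected values; if sum2 is not NULL, it
 * is filled with the within-class sum of squares s_k^2. */
double projected_stats(const gsl_matrix *sample, const double w[2],
                       double *sum2) {
  size_t n = sample->size1;
  double mean = 0.0;
  for (size_t i = 0; i < n; i++)
    mean += w[0] * gsl_matrix_get(sample, i, 0)
          + w[1] * gsl_matrix_get(sample, i, 1);
  mean /= (double)n;
  if (sum2 != NULL) {
    *sum2 = 0.0;
    for (size_t i = 0; i < n; i++) {
      double x_hat = w[0] * gsl_matrix_get(sample, i, 0)
                   + w[1] * gsl_matrix_get(sample, i, 1);
      *sum2 += (x_hat - mean) * (x_hat - mean);
    }
  }
  return mean;
}

/* Fisher criterion J(w) = (m2_hat - m1_hat)^2 / (s1^2 + s2^2). */
double fisher_criterion(const gsl_matrix *signal, const gsl_matrix *noise,
                        const double w[2]) {
  double s1, s2;
  double m1_hat = projected_stats(signal, w, &s1);
  double m2_hat = projected_stats(noise, w, &s2);
  double diff = m2_hat - m1_hat;
  return (diff * diff) / (s1 + s2);
}
```

The weight vector that maximizes $J(w)$ then defines the direction onto which
the points are projected before applying the threshold.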