# Exercise 7

## Generating points according to Gaussian distributions

The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate Gaussian distributions with parameters:

$$
\text{signal} \quad
\begin{cases}
\mu = (0, 0) \\
\sigma_x = \sigma_y = 0.3 \\
\rho = 0.5
\end{cases}
\quad \text{and} \quad
\text{noise} \quad
\begin{cases}
\mu = (4, 4) \\
\sigma_x = \sigma_y = 1 \\
\rho = 0.4
\end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ stand for the
standard deviations in the $x$ and $y$ directions respectively and $\rho$ is
the correlation.
In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the command
line. Each sample was handled as a matrix of dimension $n \times 2$, where $n$
is the number of points in the sample. The `gsl_matrix` type provided by GSL
was employed for this purpose and the function `gsl_ran_bivariate_gaussian()`
was used to generate the points.
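
The following is a minimal sketch of this step, assuming GSL is available; the
helper name `generate_sample()` and the overall structure are illustrative,
not taken verbatim from the exercise's code.

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian centred
 * in (mu_x, mu_y). gsl_ran_bivariate_gaussian() samples a zero-mean pair,
 * so the means are added afterwards. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mu_x, double mu_y,
                            double sigma_x, double sigma_y, double rho) {
    gsl_matrix *sample = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double x, y;
        gsl_ran_bivariate_gaussian(r, sigma_x, sigma_y, rho, &x, &y);
        gsl_matrix_set(sample, i, 0, mu_x + x);
        gsl_matrix_set(sample, i, 1, mu_y + y);
    }
    return sample;
}

int main(void) {
    gsl_rng_env_setup();
    gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

    /* Default sample sizes and the parameters given above. */
    gsl_matrix *signal = generate_sample(r, 800,  0, 0, 0.3, 0.3, 0.5);
    gsl_matrix *noise  = generate_sample(r, 1000, 4, 4, 1.0, 1.0, 0.4);

    gsl_matrix_free(signal);
    gsl_matrix_free(noise);
    gsl_rng_free(r);
    return 0;
}
```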

Then, a classification model must be implemented in order to assign each point
to the class (signal or noise) to which it 'most probably' belongs. The
question is how 'most probably' should be interpreted and implemented.

## Fisher linear discriminant

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction: it reduces this 2D classification problem to a
decision in one dimension.

Consider the case of two classes (here, the signal and the noise): the
simplest representation of a linear discriminant is obtained by taking a
linear function of a sampled 2D point $x$, so that:

$$
\hat{x} = w x + w_0
$$

where $w$ is called the 'weight vector' and $w_0$ is a bias. The negative of
the bias is called the 'threshold'. An input point $x$ is assigned to the
first class if $\hat{x} \geqslant 0$ and to the second one otherwise.
In general, the projection onto one dimension leads to a considerable loss of
information: classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.
To begin with, consider a two-class problem in which there are $N_1$ points of
class $C_1$ and $N_2$ points of class $C_2$, so that the means $m_1$ and $m_2$
of the two classes are given by:

$$
m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
\quad \text{and} \quad
m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n
$$
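
As a minimal sketch of how these means could be computed from the $n \times 2$
matrices described above (the helper name is an assumption, not the exercise's
actual function):

```c
#include <gsl/gsl_matrix.h>

/* 2D mean of an n x 2 sample matrix: m[0] is the mean x, m[1] the mean y. */
void sample_mean(const gsl_matrix *sample, double m[2]) {
    m[0] = 0;
    m[1] = 0;
    for (size_t i = 0; i < sample->size1; i++) {
        m[0] += gsl_matrix_get(sample, i, 0);
        m[1] += gsl_matrix_get(sample, i, 1);
    }
    m[0] /= sample->size1;
    m[1] /= sample->size1;
}
```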

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w (m_2 - m_1)
$$

![Two classes that are well separated in the original 2D space but overlap
considerably when projected onto the line joining their means.](){#fig:overlap}

This expression can be made arbitrarily large simply by increasing the
magnitude of $w$. To solve this problem, $w$ can be constrained to have unit
length, so that $\| w \|^2 = 1$. Using a Lagrange multiplier to perform the
constrained maximization, it can be found that $w \propto (m_2 - m_1)$.
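
Explicitly, enforcing the unit-length constraint with a Lagrange multiplier
$\lambda$:

$$
L(w, \lambda) = w (m_2 - m_1) + \lambda \, (1 - \| w \|^2),
\qquad
\frac{\partial L}{\partial w} = (m_2 - m_1) - 2 \lambda w = 0
\quad \Rightarrow \quad
w \propto (m_2 - m_1)
$$
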
There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The idea for solving this is to maximize a function that gives a large
separation between the projected class means while also giving a small
variance within each class, thereby minimizing the class overlap.
The within-class variance of the transformed data of each class $k$ is given
by:

$$
s_k^2 = \sum_{n \in C_k} (\hat{x}_n - \hat{m}_k)^2
$$

The total within-class variance for the whole data set can simply be defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined as the
ratio of the between-class distance to the within-class variance and is given
by:

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
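
As a rough sketch of how $J(w)$ could be evaluated for a candidate weight
vector, reusing the $n \times 2$ sample matrices introduced earlier (the
function names `project_stats()` and `fisher_criterion()` are illustrative
assumptions, not the exercise's actual API):

```c
#include <gsl/gsl_matrix.h>

/* Project an n x 2 sample onto w and return the projected mean and the
 * within-class variance s_k^2 = sum_n (x_hat_n - m_hat_k)^2. The bias w_0
 * cancels in both the mean difference and the variances, so it is omitted. */
static void project_stats(const gsl_matrix *sample, const double w[2],
                          double *mean_hat, double *s2) {
    size_t n = sample->size1;
    double sum = 0.0, sum2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x_hat = w[0] * gsl_matrix_get(sample, i, 0)
                     + w[1] * gsl_matrix_get(sample, i, 1);
        sum  += x_hat;
        sum2 += x_hat * x_hat;
    }
    *mean_hat = sum / n;
    *s2 = sum2 - n * (*mean_hat) * (*mean_hat);  /* = sum (x_hat - m_hat)^2 */
}

/* Fisher criterion J(w) = (m_hat_2 - m_hat_1)^2 / (s_1^2 + s_2^2). */
double fisher_criterion(const gsl_matrix *signal, const gsl_matrix *noise,
                        const double w[2]) {
    double m1, s1, m2, s2;
    project_stats(signal, w, &m1, &s1);
    project_stats(noise,  w, &m2, &s2);
    return (m2 - m1) * (m2 - m1) / (s1 + s2);
}
```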