Exercise 7
Generating points according to Gaussian distributions
The first task of exercise 7 is to generate two sets of 2D points $(x, y)$ according to two bivariate Gaussian distributions with parameters:
$$
\text{signal} \quad
\begin{cases}
\mu = (0, 0) \\
\sigma_x = \sigma_y = 0.3 \\
\rho = 0.5
\end{cases}
\et
\text{noise} \quad
\begin{cases}
\mu = (4, 4) \\
\sigma_x = \sigma_y = 1 \\
\rho = 0.4
\end{cases}
$$
where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard deviations in the $x$ and $y$ directions respectively and $\rho$ is the bivariate correlation, hence:
$$
\sigma_{xy} = \rho \sigma_x \sigma_y
$$
where $\sigma_{xy}$ is the covariance of $x$ and $y$.
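For instance, with the signal parameters above this gives $\sigma_{xy} = 0.5 \cdot 0.3 \cdot 0.3 = 0.045$, while for the noise $\sigma_{xy} = 0.4 \cdot 1 \cdot 1 = 0.4$.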
In the code, the default settings are $N_s = 800$ points for the signal and $N_n = 1000$ points for the noise, but they can be changed from the command line. Both samples were handled as matrices of dimension $n \times 2$, where $n$ is the number of points in the sample. The `gsl_matrix` structure provided by GSL was employed for this purpose and the function `gsl_ran_bivariate_gaussian()` was used for generating the points.
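A minimal sketch of how one of the samples could be filled is reported below; the helper name `generate_sample()` and its interface are assumptions, while the GSL calls are the ones mentioned above.

```c
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_matrix.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian with
 * mean (mu_x, mu_y), standard deviations sigma_x, sigma_y and correlation
 * rho. The helper name and signature are illustrative. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mu_x, double mu_y,
                            double sigma_x, double sigma_y, double rho)
{
    gsl_matrix *sample = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double dx, dy;
        /* gsl_ran_bivariate_gaussian() generates a pair centred on (0, 0) */
        gsl_ran_bivariate_gaussian(r, sigma_x, sigma_y, rho, &dx, &dy);
        gsl_matrix_set(sample, i, 0, mu_x + dx);
        gsl_matrix_set(sample, i, 1, mu_y + dy);
    }
    return sample;
}
```

With the default settings, the signal sample would then correspond to a call like `generate_sample(r, 800, 0, 0, 0.3, 0.3, 0.5)` and the noise sample to `generate_sample(r, 1000, 4, 4, 1, 1, 0.4)`.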
Pretending not to know how the points were generated, a classification model must then be implemented in order to assign each point to the class (signal or noise) to which it 'most probably' belongs. The question is how 'most probably' can be interpreted and implemented.
Fisher linear discriminant
The theory
The Fisher linear discriminant (FLD) is a linear classification model based on dimensionality reduction: it allows this 2D classification problem to be reduced to a decision on a one-dimensional projection of the data.
Consider the case of two classes (in this case the signal and the noise): the simplest representation of a linear discriminant is obtained by taking a linear function of a sampled 2D point $x$ so that:
$$
\hat{x} = w^T x
$$
where $w$ is the so-called 'weight vector'. An input point $x$ is commonly assigned to the first class if $\hat{x} \geqslant w_{th}$ and to the second one otherwise, where $w_{th}$ is a suitably chosen threshold.
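In code, such a decision rule reduces to a dot product and a comparison; a minimal sketch is given below (the function name `classify()`, its arguments and the return convention are assumptions):

```c
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>

/* Project a 2D point x onto the weight vector w and compare the result
 * with a threshold w_th. Names and conventions are illustrative. */
int classify(const gsl_vector *w, const gsl_vector *x, double w_th)
{
    double x_hat;
    gsl_blas_ddot(w, x, &x_hat);   /* x_hat = w^T x */
    return x_hat >= w_th;          /* 1 = first class, 0 = second class */
}
```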
In general, the projection onto one dimension leads to a considerable loss of
information and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.
To begin with, consider a two-class problem in which there are $N_1$ points of class $C_1$ and $N_2$ points of class $C_2$, so that the means $m_1$ and $m_2$ of the two classes are given by:
$$
m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
\et
m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n
$$
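For the $n \times 2$ samples described earlier, these means can be computed column by column; a possible sketch using the GSL statistics routines (the helper name `mean_2d()` is an assumption):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_statistics.h>

/* Mean vector (mu_x, mu_y) of an n x 2 sample stored one point per row.
 * The helper name is illustrative. */
gsl_vector *mean_2d(const gsl_matrix *sample)
{
    gsl_vector_const_view x = gsl_matrix_const_column(sample, 0);
    gsl_vector_const_view y = gsl_matrix_const_column(sample, 1);
    size_t n = sample->size1;

    gsl_vector *mu = gsl_vector_alloc(2);
    gsl_vector_set(mu, 0, gsl_stats_mean(x.vector.data, x.vector.stride, n));
    gsl_vector_set(mu, 1, gsl_stats_mean(y.vector.data, y.vector.stride, n));
    return mu;
}
```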
The simplest measure of the separation of the classes is the separation of the projected class means. This suggests choosing $w$ so as to maximize:
$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$
This expression can be made arbitrarily large simply by increasing the magnitude of $w$. To solve this problem, $w$ can be constrained to have unit length, so that $|w|^2 = 1$. Using a Lagrange multiplier to perform the constrained maximization, it can be found that $w \propto (m_2 - m_1)$.
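One way to see this step explicitly: introducing a Lagrange multiplier $\lambda$ for the constraint $w^T w = 1$ and setting the gradient of the Lagrangian to zero gives
$$
L(w, \lambda) = w^T (m_2 - m_1) + \lambda \, (1 - w^T w),
\qquad
\frac{\partial L}{\partial w} = (m_2 - m_1) - 2 \lambda w = 0
\;\Longrightarrow\;
w = \frac{1}{2\lambda} (m_2 - m_1) \propto (m_2 - m_1)
$$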
There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The idea is therefore to maximize a function that gives a large separation between the projected class means while also giving a small variance within each class, thereby minimizing the class overlap.
The within-class variance of the transformed data of each class $k$ is given by:
$$
s_k^2 = \sum_{n \in C_k} (\hat{x}_n - \hat{m}_k)^2
$$
The total within-class variance for the whole data set can be simply defined as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined as the ratio of the between-class distance to the within-class variance and is given by:
$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
Differentiating $J(w)$ with respect to $w$, it can be found that it is maximized when:
$$
w = S_b^{-1} (m_2 - m_1)
$$
where $S_b$ is the within-class covariance matrix, given by:
$$
S_b = S_1 + S_2
$$
where $S_1$ and $S_2$ are the covariance matrices of the two classes, namely:
$$
\begin{pmatrix}
\sigma_x^2 & \sigma_{xy} \\
\sigma_{xy} & \sigma_y^2
\end{pmatrix}
$$
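A possible way to estimate such a matrix from an $n \times 2$ sample with the GSL statistics routines is sketched below (the helper name `covariance_2d()` is an assumption):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_statistics.h>

/* Build the 2x2 covariance matrix of an n x 2 sample stored one point
 * per row. The helper name is illustrative. */
gsl_matrix *covariance_2d(const gsl_matrix *sample)
{
    gsl_vector_const_view x = gsl_matrix_const_column(sample, 0);
    gsl_vector_const_view y = gsl_matrix_const_column(sample, 1);
    size_t n = sample->size1;

    double var_x  = gsl_stats_variance(x.vector.data, x.vector.stride, n);
    double var_y  = gsl_stats_variance(y.vector.data, y.vector.stride, n);
    double cov_xy = gsl_stats_covariance(x.vector.data, x.vector.stride,
                                         y.vector.data, y.vector.stride, n);

    gsl_matrix *cov = gsl_matrix_alloc(2, 2);
    gsl_matrix_set(cov, 0, 0, var_x);
    gsl_matrix_set(cov, 1, 1, var_y);
    gsl_matrix_set(cov, 0, 1, cov_xy);
    gsl_matrix_set(cov, 1, 0, cov_xy);
    return cov;
}
```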
This is not truly a discriminant but rather a specific choice of direction for projection of the data down to one dimension: the projected data can then be used to construct a discriminant by choosing a threshold for the classification.
The code
As stated above, the projection vector is given by
$$
w = S_b^{-1} (\mu_1 - \mu_2)
$$
where $\mu_1$ and $\mu_2$ are the means of the two classes (taking $\mu_1 - \mu_2$ instead of $\mu_2 - \mu_1$ only flips the sign of the projection). The code proceeds through the following steps:

- the ratio $r = N_s / N_n$ between the number of signal and noise points is computed;
- the means $\mu_1 = (\mu_{1x}, \mu_{1y})$ and $\mu_2 = (\mu_{2x}, \mu_{2y})$ and the covariance matrices $S_1$ and $S_2$ of the two samples are computed;
- $S_b = S_1 + S_2$ is computed;
- the matrix $S_b$ is inverted with the Cholesky method, since it is symmetric and positive-definite;
- the difference $\text{diff} = \mu_1 - \mu_2$ is computed;
- the product $S_b^{-1} \, \text{diff}$ is computed with the `gsl_blas_dgemv()` function provided by GSL;
- the result is normalised with GSL functions, as sketched in the code below.
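A minimal sketch of these steps, assuming the means and $S_b$ have already been computed, is the following (the helper name `fisher_direction()` is an assumption; the GSL routines are the ones named above):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>

/* Compute the Fisher projection vector w = S_b^{-1} (mu_1 - mu_2).
 * S_b is overwritten by its Cholesky decomposition and then its inverse. */
gsl_vector *fisher_direction(gsl_matrix *s_b,
                             const gsl_vector *mu_1,
                             const gsl_vector *mu_2)
{
    /* diff = mu_1 - mu_2 */
    gsl_vector *diff = gsl_vector_alloc(2);
    gsl_vector_memcpy(diff, mu_1);
    gsl_vector_sub(diff, mu_2);

    /* Invert S_b through its Cholesky decomposition (S_b is symmetric
     * and positive-definite). */
    gsl_linalg_cholesky_decomp(s_b);
    gsl_linalg_cholesky_invert(s_b);

    /* w = S_b^{-1} * diff */
    gsl_vector *w = gsl_vector_alloc(2);
    gsl_blas_dgemv(CblasNoTrans, 1.0, s_b, diff, 0.0, w);

    /* Normalise w to unit length. */
    gsl_vector_scale(w, 1.0 / gsl_blas_dnrm2(w));

    gsl_vector_free(diff);
    return w;
}
```

An alternative would be to avoid the explicit inversion and solve the linear system directly with `gsl_linalg_cholesky_solve()`; the sketch above follows the invert-then-multiply route described in the steps.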