ex-7: continued writing the FLD section
This commit is contained in: parent ceedd61f00 · commit a1b84f022d
@ -1,9 +1,9 @@

# Exercise 7

## Generating points according to Gaussian distributions {#sec:sampling}

The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate Gaussian distributions with parameters:

$$
\text{signal} \quad
@ -44,15 +44,15 @@ must then be implemented in order to assign each point to the right class

## Fisher linear discriminant

### The projection direction

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction: it reduces this 2D classification problem to a
one-dimensional decision surface.

Consider the case of two classes (in this case the signal and the noise): the
simplest representation of a linear discriminant is obtained by taking a linear
function of a sampled 2D point $x$ so that:

$$
\hat{x} = w^T x
@ -60,15 +60,14 @@ $$

where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$ and to the second one
otherwise, where $w_{th}$ is a threshold value defined in some suitable way.
In general, the projection onto one dimension leads to a considerable loss of
information: classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.
To begin with, consider $N_1$ points of class $C_1$ and $N_2$ points of class
$C_2$, so that the means $m_1$ and $m_2$ of the two classes are given by:

$$
m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
@ -77,29 +76,30 @@ $$

$$

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$

This expression can be made arbitrarily large simply by increasing the
magnitude of $w$. To solve this problem, $w$ can be constrained to have unit
length, so that $| w |^2 = 1$. Using a Lagrange multiplier to perform the
constrained maximization, it can be found that $w \propto (m_2 - m_1)$.

![The plot on the left shows samples from two classes along with the histograms
resulting from projection onto the line joining the class means: note that
there is considerable overlap in the projected space. The right plot shows the
corresponding projection based on the Fisher linear discriminant, showing the
greatly improved class separation.](images/fisher.png){#fig:overlap}

There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The idea for solving it is to maximize a function that gives a large
separation between the projected class means while also giving a small
variance within each class, thereby minimizing the class overlap.
The within-classes variance of the transformed data of each class $k$ is given
by:

$$
@ -107,9 +107,9 @@ $$

$$

The total within-classes variance for the whole data set can be simply defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined to be the
ratio of the between-classes distance to the within-classes variance and is
given by:

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
@ -122,7 +122,7 @@ $$
w = S_b^{-1} (m_2 - m_1)
$$

where $S_b$ is the total within-classes covariance matrix, given by:

$$
S_b = S_1 + S_2
@ -142,35 +142,51 @@ projection of the data down to one dimension: the projected data can then be

used to construct a discriminant by choosing a threshold for the
classification.

### The code

When implemented, the parameters given in @sec:sampling were used to compute
the covariance matrices $S_1$ and $S_2$ of the two classes and their sum $S$.
Then $S$, being a symmetrical and positive-definite matrix, was inverted with
the Cholesky method, already discussed in @sec:MLM.
Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
function provided by GSL.

### The threshold

The cut was fixed by requiring the conditional probabilities of the two
classes to be equal:

$$
t_{\text{cut}} = x \, | \hspace{20pt}
\frac{P(c_1 | x)}{P(c_2 | x)} =
\frac{p(x | c_1) \, p(c_1)}{p(x | c_2) \, p(c_2)} = 1
$$

where $p(x | c_k)$ is the probability for point $x$ along the Fisher projection
line of belonging to the class $k$. If the classes are bivariate Gaussian, as
in the present case, then $p(x | c_k)$ is simply given by its projected normal
distribution $\mathscr{G} (\hat{\mu}, \hat{S})$. With a bit of math, the
solution is then:

$$
t = \frac{b}{a} + \sqrt{\left( \frac{b}{a} \right)^2 - \frac{c}{a}}
$$

where:

- $a = \hat{S}_1^2 - \hat{S}_2^2$
- $b = \hat{m}_2 \, \hat{S}_1^2 - \hat{m}_1 \, \hat{S}_2^2$
- $c = \hat{m}_2^2 \, \hat{S}_1^2 - \hat{m}_1^2 \, \hat{S}_2^2
  - 2 \, \hat{S}_1^2 \, \hat{S}_2^2 \, \ln(\alpha)$
- $\alpha = p(c_1) / p(c_2)$

The ratio of the prior probabilities $\alpha$ was computed as:

$$
\alpha = \frac{N_s}{N_n}
$$

The projection of the points was accomplished with the function
`gsl_blas_ddot`, which computes the dot product between two vectors: in this
case, the weight vector and the position of the point to be projected.