ex-7: went on writing the FLD
This commit is contained in: parent ceedd61f00, commit a1b84f022d
@@ -1,9 +1,9 @@
# Exercise 7

## Generating points according to Gaussian distributions {#sec:sampling}

The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate Gaussian distributions with parameters:

$$
\text{signal} \quad
@@ -44,15 +44,15 @@ must then be implemented in order to assign each point to the right class

## Fisher linear discriminant

### The projection direction

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction. It allows this 2D classification problem to be
reduced to a one-dimensional decision problem.

Consider the case of two classes (in this case the signal and the noise): the
simplest representation of a linear discriminant is obtained by taking a linear
function of a sampled 2D point $x$ so that:

$$
\hat{x} = w^T x
@@ -60,15 +60,14 @@ $$

where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$ and to the second one
otherwise, where $w_{th}$ is a suitably chosen threshold value.
In general, the projection onto one dimension leads to a considerable loss of
information and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.
To begin with, consider $N_1$ points of class $C_1$ and $N_2$ points of class
$C_2$, so that the means $m_1$ and $m_2$ of the two classes are given by:

$$
m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
@@ -77,29 +76,30 @@ $$
$$

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$

This expression can be made arbitrarily large simply by increasing the magnitude
of $w$. To solve this problem, $w$ can be constrained to have unit length, so
that $|w|^2 = 1$. Using a Lagrange multiplier to perform the constrained
maximization, it can be found that $w \propto (m_2 - m_1)$.
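
Explicitly, maximizing $w^T (m_2 - m_1)$ subject to $w^T w = 1$ through the
Lagrangian $L(w, \lambda) = w^T (m_2 - m_1) + \lambda \, (1 - w^T w)$ gives:

$$
\frac{\partial L}{\partial w} = (m_2 - m_1) - 2 \lambda w = 0
\quad \Longrightarrow \quad
w = \frac{m_2 - m_1}{2 \lambda} \propto (m_2 - m_1)
$$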

![The plot on the left shows samples from two classes along with the histograms
resulting from projection onto the line joining the class means: note that
there is considerable overlap in the projected space. The right plot shows the
corresponding projection based on the Fisher linear discriminant, showing the
greatly improved class separation.](images/fisher.png){#fig:overlap}

There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The idea is to solve this by maximizing a function that gives a large separation
between the projected class means while also giving a small variance within
each class, thereby minimizing the class overlap.
The within-classes variance of the transformed data of each class $k$ is given
by:

$$
@@ -107,9 +107,9 @@ $$
$$

The total within-classes variance for the whole data set can be simply defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined as the
ratio of the squared between-classes distance to the within-classes variance
and is given by:

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
@@ -122,7 +122,7 @@ $$
w = S_b^{-1} (m_2 - m_1)
$$

where $S_b$ is the total within-classes covariance matrix, given by:

$$
S_b = S_1 + S_2
@@ -142,35 +142,51 @@ projection of the data down to one dimension: the projected data can then be
used to construct a discriminant by choosing a threshold for the
classification.

When implemented, the parameters given in @sec:sampling were used to compute
the covariance matrices $S_1$ and $S_2$ of the two classes and their sum $S$.
Then $S$, being a symmetric and positive-definite matrix, was inverted with
the Cholesky method, already discussed in @sec:MLM.
Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
function provided by GSL.

### The threshold

The cut was fixed by requiring the conditional probabilities of the two classes
to be equal:

$$
t_{\text{cut}} = x \, | \hspace{20pt}
\frac{P(c_1 | x)}{P(c_2 | x)} =
\frac{p(x | c_1) \, p(c_1)}{p(x | c_2) \, p(c_2)} = 1
$$

where $p(x | c_k)$ is the probability for point $x$ along the Fisher projection
line of belonging to the class $k$. If the classes are bivariate Gaussian, as
in the present case, then $p(x | c_k)$ is simply given by its projected normal
distribution $\mathscr{G} (\hat{m}_k, \hat{S}_k)$. With a bit of math, the
solution is then:

$$
t_{\text{cut}} = \frac{b}{a} + \sqrt{\left( \frac{b}{a} \right)^2 - \frac{c}{a}}
$$

where:

- $a = \hat{S}_1^2 - \hat{S}_2^2$
- $b = \hat{m}_2 \, \hat{S}_1^2 - \hat{m}_1 \, \hat{S}_2^2$
- $c = \hat{m}_2^2 \, \hat{S}_1^2 - \hat{m}_1^2 \, \hat{S}_2^2 - 2 \, \hat{S}_1^2 \, \hat{S}_2^2 \, \ln(\alpha)$
- $\alpha = p(c_1) / p(c_2)$

The ratio of the prior probabilities $\alpha$ was computed as:

$$
\alpha = \frac{N_s}{N_n}
$$
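
A direct transcription of this formula might be the following sketch; the names
(`fisher_threshold`, `m1`, `m2`, `s1`, `s2`) are illustrative assumptions, not
part of the original code.

```c
#include <math.h>

/* Sketch: threshold along the projected axis from the projected means
 * (m1, m2), projected standard deviations (s1, s2) and the prior ratio
 * alpha = N_s/N_n. Assumes s1 != s2, otherwise the coefficient a vanishes. */
double fisher_threshold(double m1, double m2,
                        double s1, double s2, double alpha)
{
    double a = s1*s1 - s2*s2;
    double b = m2*s1*s1 - m1*s2*s2;
    double c = m2*m2*s1*s1 - m1*m1*s2*s2
             - 2.0 * s1*s1 * s2*s2 * log(alpha);
    return b/a + sqrt((b/a)*(b/a) - c/a);
}
```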

The projection of the points was accomplished with the function
`gsl_blas_ddot()`, which computes the dot product between two vectors: in this
case, the weight vector and the position of the point to be projected.
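
For a single point, the projection and the comparison with the threshold might
be sketched as follows; the names `classify` and `t_cut` are illustrative, and
the sign convention depends on the orientation of $w$.

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_vector.h>

/* Sketch: project a 2D point onto the Fisher axis and compare the result
 * with the threshold t_cut. Returns 1 for signal and 0 for noise, assuming
 * w points from the noise class towards the signal class. */
int classify(const gsl_vector *w, const gsl_vector *point, double t_cut)
{
    double proj;
    gsl_blas_ddot(w, point, &proj);   /* proj = w . point */
    return proj > t_cut;
}
```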