ex-7: started writing about the Fisher discriminant
This commit is contained in:
parent
4301409842
commit
e19ebc7fd1
107
notes/sections/7.md
Normal file
@@ -0,0 +1,107 @@
# Exercise 7

## Generating points according to gaussian distributions

The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate gaussian distributions with the following
parameters:

$$
\text{signal} \quad
\begin{cases}
\mu = (0, 0) \\
\sigma_x = \sigma_y = 0.3 \\
\rho = 0.5
\end{cases}
\et
\text{noise} \quad
\begin{cases}
\mu = (4, 4) \\
\sigma_x = \sigma_y = 1 \\
\rho = 0.4
\end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ stand for the
standard deviations in the $x$ and $y$ directions respectively and $\rho$ is
the correlation.
In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the
command-line. Each sample was handled as a matrix of dimension $n \times 2$,
where $n$ is the number of points in the sample. The `gsl_matrix` structure
provided by GSL was employed for this purpose and the function
`gsl_ran_bivariate_gaussian()` was used to generate the points.

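As a minimal sketch of this step (the helper `generate_sample()` is an
illustrative name, not necessarily the one used in the actual code), the
sampling can be done as follows:

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

/* Draw n points from a bivariate gaussian with mean (mux, muy), standard
 * deviations sx and sy and correlation rho; store them as an n x 2 matrix. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mux, double muy,
                            double sx, double sy, double rho)
{
    gsl_matrix *data = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double dx, dy;
        /* gsl_ran_bivariate_gaussian() samples around the origin,
         * so the mean is added afterwards. */
        gsl_ran_bivariate_gaussian(r, sx, sy, rho, &dx, &dy);
        gsl_matrix_set(data, i, 0, mux + dx);
        gsl_matrix_set(data, i, 1, muy + dy);
    }
    return data;
}
```

With the default settings, the two samples would then be obtained with
something like:

```c
gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
gsl_matrix *signal = generate_sample(r, 800,  0, 0, 0.3, 0.3, 0.5);
gsl_matrix *noise  = generate_sample(r, 1000, 4, 4, 1.0, 1.0, 0.4);
```
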
Then, a classification model must be implemented in order to assign each
point to the class (signal or noise) to which it 'most probably' belongs.
The question is how 'most probably' is to be interpreted and implemented.

## Fisher linear discriminant

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction: it allows this 2D classification problem to be
reduced to a one-dimensional decision problem.

Consider the case of two classes (in this case the signal and the noise): the
simplest representation of a linear discriminant is obtained by taking a linear
function of a sampled 2D point $x$, so that:

$$
\hat{x} = w x + w_0
$$

where $w$ is called the 'weight vector' and $w_0$ is a bias. The negative of
the bias is called the 'threshold'. An input point $x$ is assigned to the
first class if $\hat{x} \geqslant 0$ and to the second one otherwise.

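As a concrete illustration (the helper names below are assumptions, not
functions from the exercise code), the projection and the class assignment
amount to:

```c
/* Project a 2D point (x, y) onto the discriminant direction:
 * x_hat = w[0]*x + w[1]*y + w0. */
double project(const double w[2], double w0, double x, double y)
{
    return w[0] * x + w[1] * y + w0;
}

/* Assign the point to the first class (1) if x_hat >= 0,
 * to the second class (0) otherwise. */
int classify(const double w[2], double w0, double x, double y)
{
    return project(w, w0, x, y) >= 0;
}
```
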
In general, the projection onto one dimension leads to a considerable loss of
information, and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.

To begin with, consider a two-class problem in which there are $N_1$ points of
class $C_1$ and $N_2$ points of class $C_2$, so that the means $m_1$ and $m_2$
of the two classes are given by:

$$
m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
\et
m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n
$$

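For instance, with the samples stored as `gsl_matrix` objects as described
above, a class mean can be computed with a short helper such as the following
sketch (`sample_mean()` is an assumed name):

```c
#include <gsl/gsl_matrix.h>

/* Mean of an n x 2 sample, one component per column. */
void sample_mean(const gsl_matrix *data, double mean[2])
{
    size_t n = data->size1;
    mean[0] = mean[1] = 0;
    for (size_t i = 0; i < n; i++) {
        mean[0] += gsl_matrix_get(data, i, 0);
        mean[1] += gsl_matrix_get(data, i, 1);
    }
    mean[0] /= n;
    mean[1] /= n;
}
```
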
The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w (m_2 - m_1)
$$

![The plot on the left shows samples from two classes along with the histograms
resulting from projection onto the line joining the class means: note that
there is considerable overlap in the projected space. The right plot shows the
corresponding projection based on the Fisher linear discriminant, showing the
greatly improved class separation.](images/fisher.png){#fig:overlap}

This expression can be made arbitrarily large simply by increasing the
magnitude of $w$. To solve this problem, $w$ can be constrained to have unit
length, so that $\|w\|^2 = 1$. Using a Lagrange multiplier to perform the
constrained maximization, it can be found that $w \propto (m_2 - m_1)$.

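Filling in that step: maximizing $w (m_2 - m_1)$ subject to the constraint
$\|w\|^2 = 1$ with a Lagrange multiplier $\lambda$ gives

$$
\frac{\partial}{\partial w}
\left[ w (m_2 - m_1) + \lambda \left( 1 - \|w\|^2 \right) \right]
= (m_2 - m_1) - 2 \lambda w = 0
\quad \Longrightarrow \quad
w = \frac{m_2 - m_1}{2 \lambda} \propto (m_2 - m_1)
$$
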
There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.
The idea, then, is to maximize a function that gives a large separation
between the projected class means while also giving a small variance within
each class, thereby minimizing the class overlap.
The within-class variance of the transformed data for each class $C_k$ is
given by:

$$
s_k^2 = \sum_{n \in C_k} (\hat{x}_n - \hat{m}_k)^2
$$

The total within-class variance for the whole data set can simply be defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined as the
ratio of the squared distance between the projected class means to the total
within-class variance, and is given by:

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
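
As an illustration only (the function names and the explicit loop are
assumptions, not the exercise's actual code), $J(w)$ can be evaluated for a
candidate direction $w$ directly from the two samples:

```c
#include <gsl/gsl_matrix.h>

/* Projected mean m_hat and within-class variance s2 (sum of squared
 * deviations of the projected points) of one n x 2 sample. */
void projected_stats(const gsl_matrix *data, const double w[2], double w0,
                     double *m_hat, double *s2)
{
    size_t n = data->size1;
    double sum = 0, sum2 = 0;
    for (size_t i = 0; i < n; i++) {
        double p = w[0] * gsl_matrix_get(data, i, 0)
                 + w[1] * gsl_matrix_get(data, i, 1) + w0;
        sum  += p;
        sum2 += p * p;
    }
    *m_hat = sum / n;
    *s2 = sum2 - n * (*m_hat) * (*m_hat);
}

/* Fisher criterion J(w) for the two samples. */
double fisher_criterion(const gsl_matrix *signal, const gsl_matrix *noise,
                        const double w[2], double w0)
{
    double m1, m2, s1, s2;
    projected_stats(signal, w, w0, &m1, &s1);
    projected_stats(noise,  w, w0, &m2, &s2);
    return (m2 - m1) * (m2 - m1) / (s1 + s2);
}
```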