ex-7: started writing about the Fisher discriminant
notes/sections/7.md

# Exercise 7

## Generating points according to Gaussian distributions

The first task of exercise 7 is to generate two sets of 2D points $(x, y)$
according to two bivariate Gaussian distributions with parameters:

$$
  \text{signal} \quad
  \begin{cases}
    \mu = (0, 0) \\
    \sigma_x = \sigma_y = 0.3 \\
    \rho = 0.5
  \end{cases}
  \et
  \text{noise} \quad
  \begin{cases}
    \mu = (4, 4) \\
    \sigma_x = \sigma_y = 1 \\
    \rho = 0.4
  \end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ stand for the
standard deviations in the $x$ and $y$ directions respectively, and $\rho$ is
the correlation.
In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the
command line. Both samples were handled as matrices of dimension $n \times 2$,
where $n$ is the number of points in the sample. The `gsl_matrix` type provided
by GSL was employed for this purpose and the function
`gsl_ran_bivariate_gaussian()` was used for generating the points.

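As an illustration of the sampling step, here is a minimal sketch in Python
using only the standard library. It is not the GSL-based code described above:
the function name `bivariate_gaussian` and the seed are assumptions made for
this example. Two independent standard normal deviates are mixed so that the
resulting coordinates have the requested correlation $\rho$.

```python
import math
import random


def bivariate_gaussian(n, mu, sigma_x, sigma_y, rho, rng):
    """Return n points (x, y) drawn from a bivariate Gaussian.

    Hypothetical helper for illustration; it does not reproduce
    the internals of gsl_ran_bivariate_gaussian().
    """
    points = []
    for _ in range(n):
        # Two independent standard normal deviates.
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        # Mix them so that the correlation between x and y is rho.
        x = mu[0] + sigma_x * u
        y = mu[1] + sigma_y * (rho * u + math.sqrt(1.0 - rho**2) * v)
        points.append((x, y))
    return points


rng = random.Random(42)  # arbitrary seed, chosen only for reproducibility
signal = bivariate_gaussian(800, (0.0, 0.0), 0.3, 0.3, 0.5, rng)
noise = bivariate_gaussian(1000, (4.0, 4.0), 1.0, 1.0, 0.4, rng)
```
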
Then, a classification model must be implemented in order to assign each point
to the class (signal or noise) to which it 'most probably' belongs. The question
is how 'most probably' should be interpreted and implemented.

## Fisher linear discriminant

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction. It reduces this 2D classification problem to a
one-dimensional one by projecting the points onto a line.

Consider the case of two classes (here the signal and the noise): the simplest
representation of a linear discriminant is obtained by taking a linear function
of a sampled 2D point $x$, so that:

$$
  \hat{x} = w x + w_0
$$

where $w$ is called the 'weight vector', the product $w x$ is understood as a
scalar product and $w_0$ is a bias. The negative of the bias is called the
'threshold'. An input point $x$ is assigned to the first class if
$\hat{x} \geqslant 0$ and to the second one otherwise.

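The decision rule can be sketched in a few lines of Python. The weight vector
and bias below are hypothetical values chosen only to make the example
concrete; they are not the ones obtained in the exercise.

```python
def project(w, x, w0):
    """Linear function of a 2D point: scalar product w x plus the bias w0."""
    return w[0] * x[0] + w[1] * x[1] + w0


def classify(w, x, w0):
    """Return 1 if the projection is non-negative, 2 otherwise."""
    return 1 if project(w, x, w0) >= 0 else 2


# Hypothetical parameters: points with x + y <= 4 fall in the first class.
w = (-1.0, -1.0)
w0 = 4.0

class_near_origin = classify(w, (0.0, 0.0), w0)
class_far_away = classify(w, (4.0, 4.0), w0)
```

With these illustrative parameters, a point at the origin is assigned to the
first class and a point at $(4, 4)$ to the second.
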
In general, the projection onto one dimension leads to a considerable loss of
information, and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.

To begin with, consider a two-class problem in which there are $N_1$ points of
class $C_1$ and $N_2$ points of class $C_2$, so that the means $m_1$ and $m_2$
of the two classes are given by:

$$
  m_1 = \frac{1}{N_1} \sum_{n \in C_1} x_n
  \et
  m_2 = \frac{1}{N_2} \sum_{n \in C_2} x_n
$$

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
  \hat{m}_2 - \hat{m}_1 = w (m_2 - m_1)
$$

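The class means and the separation of their projections can be computed as in
the following sketch. The two small samples and the weight vector are made-up
toy values, and the helper names `mean` and `dot` are assumptions of this
example.

```python
def mean(points):
    """Component-wise mean of a list of 2D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def dot(a, b):
    """Scalar product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]


# Toy samples, not the ones generated in the exercise.
c1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
c2 = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]

m1 = mean(c1)
m2 = mean(c2)

w = (1.0, 0.0)  # arbitrary unit weight vector
# Separation of the projected means, computed as w (m2 - m1).
separation = dot(w, (m2[0] - m1[0], m2[1] - m1[1]))
```
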
![The plot on the left shows samples from two classes along with the histograms
resulting from projection onto the line joining the class means: note that
there is considerable overlap in the projected space. The right plot shows the
corresponding projection based on the Fisher linear discriminant, showing the
greatly improved class separation.](images/fisher.png){#fig:overlap}

The expression $\hat{m}_2 - \hat{m}_1$ can be made arbitrarily large simply by
increasing the magnitude of $w$. To solve this problem, $w$ can be constrained
to have unit length, so that $|w|^2 = 1$. Using a Lagrange multiplier to perform
the constrained maximization, it can be found that $w \propto (m_2 - m_1)$.

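The constrained maximization can be spelled out as follows. This is a sketch of
the standard argument, where $\lambda$ denotes the Lagrange multiplier (a symbol
introduced here). The function to be made stationary is:

$$
  L(w, \lambda) = w (m_2 - m_1) + \lambda \left( 1 - |w|^2 \right)
$$

Setting its gradient with respect to $w$ to zero gives:

$$
  (m_2 - m_1) - 2 \lambda w = 0
  \quad \Longrightarrow \quad
  w = \frac{m_2 - m_1}{2 \lambda}
$$

Imposing $|w|^2 = 1$ fixes $2 \lambda = |m_2 - m_1|$ for the maximum, so that
$w = (m_2 - m_1) / |m_2 - m_1|$. For instance, taking the signal as class $C_1$
and using the true means $(0, 0)$ and $(4, 4)$ in place of the sample means,
this gives $w = (1, 1) / \sqrt{2}$.
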
There is still a problem with this approach, however, as illustrated in
@fig:overlap: the two classes are well separated in the original 2D space but
have considerable overlap when projected onto the line joining their means.

The idea to solve it is to maximize a function that gives a large separation
between the projected class means while also giving a small variance within
each class, thereby minimizing the class overlap.
The within-class variance of the transformed data of class $C_k$ is given by:

$$
  s_k^2 = \sum_{n \in C_k} (\hat{x}_n - \hat{m}_k)^2
$$

The total within-class variance for the whole data set can be simply defined
as $s^2 = s_1^2 + s_2^2$. The Fisher criterion is therefore defined to be the
ratio of the squared separation of the projected class means (the between-class
variance) to the total within-class variance and is given by:

$$
  J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
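
To make the criterion concrete, the sketch below evaluates $J(w)$ for two
candidate directions on a made-up toy data set: two classes elongated along the
diagonal and shifted along $x$. The data, the function name `fisher_criterion`
and the second direction are assumptions of this example; the second direction
is simply a better choice than the line joining the means, not necessarily the
optimal one.

```python
def fisher_criterion(w, c1, c2):
    """Ratio of squared projected mean separation to total within-class scatter."""
    # Project every point onto w (the bias cancels out in J, so it is omitted).
    p1 = [w[0] * x + w[1] * y for x, y in c1]
    p2 = [w[0] * x + w[1] * y for x, y in c2]
    m1 = sum(p1) / len(p1)
    m2 = sum(p2) / len(p2)
    # s^2 = s_1^2 + s_2^2, with s_k^2 the sum of squared deviations in class k.
    s2 = sum((p - m1) ** 2 for p in p1) + sum((p - m2) ** 2 for p in p2)
    return (m2 - m1) ** 2 / s2


# Toy classes, elongated along the diagonal; their means differ by (2, 0).
c1 = [(0.0, 0.2), (1.0, 0.8), (2.0, 2.2), (3.0, 2.8)]
c2 = [(2.0, 0.2), (3.0, 0.8), (4.0, 2.2), (5.0, 2.8)]

j_means = fisher_criterion((1.0, 0.0), c1, c2)      # line joining the means
j_diagonal = fisher_criterion((1.0, -1.0), c1, c2)  # across the elongation
```

On this toy data the direction joining the means scores much lower than the
one perpendicular to the elongation. Since both the numerator and the
denominator scale with the square of the magnitude of $w$, rescaling $w$ leaves
$J(w)$ unchanged.
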