ex-7: went on writing the FLD

$$
\end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard
deviations in the $x$ and $y$ directions respectively, and $\rho$ is the
bivariate correlation, hence:

$$
\sigma_{xy} = \rho \sigma_x \sigma_y
$$

where $\sigma_{xy}$ is the covariance of $x$ and $y$.

In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the command
line. Both samples were handled as matrices of dimension $n \times 2$, where
$n$ is the number of points in the sample. The `gsl_matrix` library provided
by GSL was employed for this purpose, and the function
`gsl_ran_bivariate_gaussian()` was used to generate the points.
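
As a minimal sketch of this step (the means, standard deviations and
correlations below are illustrative placeholders, not the exercise's actual
defaults), the generation could look like:

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian
 * centred in (mux, muy). gsl_ran_bivariate_gaussian() returns a
 * zero-mean pair, so the means are added by hand. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mux, double muy,
                            double sx, double sy, double rho)
{
    gsl_matrix *sample = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double dx, dy;
        gsl_ran_bivariate_gaussian(r, sx, sy, rho, &dx, &dy);
        gsl_matrix_set(sample, i, 0, mux + dx);
        gsl_matrix_set(sample, i, 1, muy + dy);
    }
    return sample;
}

int main(void)
{
    gsl_rng_env_setup();
    gsl_rng *r = gsl_rng_alloc(gsl_rng_default);

    /* N_s = 800 signal and N_n = 1000 noise points, as in the text;
     * all distribution parameters here are made up for the sketch. */
    gsl_matrix *signal = generate_sample(r, 800, 0.0, 0.0, 0.3, 0.3, 0.5);
    gsl_matrix *noise  = generate_sample(r, 1000, 1.0, 1.0, 1.0, 1.0, 0.4);

    /* ... classification goes here ... */

    gsl_matrix_free(signal);
    gsl_matrix_free(noise);
    gsl_rng_free(r);
    return 0;
}
```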
Assuming that how the points were generated is not known, a classification
model must then be implemented in order to assign each point to the class
(signal or noise) to which it 'most probably' belongs. The point is how
'most probably' is to be interpreted and implemented.

## Fisher linear discriminant
### The theory

The Fisher linear discriminant (FLD) is a linear classification model based on
dimensionality reduction: it allows this 2D classification problem to be
reduced to a decision on a one-dimensional projection.

The simplest representation of a linear discriminant is obtained by taking a
linear function of a sampled 2D point $x$ so that:

$$
\hat{x} = w^T x
$$

where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$ and to the second one
otherwise, where $w_{th}$ is a threshold to be defined in some way.
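
As a small illustrative sketch (the function name and the way the threshold is
chosen are assumptions, not the repository's actual code), the decision rule
can be written as:

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_vector.h>

/* Project the 2D point x onto w and compare with the threshold w_th:
 * returns 1 for the first class (signal), 0 for the second (noise). */
int fld_classify(const gsl_vector *w, const gsl_vector *x, double w_th)
{
    double xhat;
    gsl_blas_ddot(w, x, &xhat);   /* \hat{x} = w^T x */
    return xhat >= w_th;
}
```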

In general, the projection onto one dimension leads to a considerable loss of
information, and classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the

The simplest measure of the separation of the classes is the separation of the
projected class means, which suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$

![The plot on the left shows samples from two classes along with the histograms

by:

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$

Differentiating $J(w)$ with respect to $w$, it can be found that it is
maximized when:

$$
w = S_b^{-1} (m_2 - m_1)
$$

where $S_b$ is the within-classes covariance matrix, given by:

$$
S_b = S_1 + S_2
$$

where $S_1$ and $S_2$ are the covariance matrices of the two classes, each of
the form:

$$
S_i =
\begin{pmatrix}
\sigma_x^2 & \sigma_{xy} \\
\sigma_{xy} & \sigma_y^2
\end{pmatrix}
$$
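
To make the last step explicit (a standard derivation, following e.g. Bishop's
treatment, and assuming $s^2$ denotes the total within-class variance of the
projected data), both parts of $J(w)$ can be written in terms of $w$, since
$\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)$ and $s^2 = w^T S_b \, w$:

$$
J(w) = \frac{\left( w^T (m_2 - m_1) \right)^2}{w^T S_b \, w}
$$

Setting the gradient of this ratio with respect to $w$ to zero and dropping
the scalar factors (only the direction of $w$ matters) leaves $S_b \, w
\propto (m_2 - m_1)$, which is the result above.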

This is not truly a discriminant but rather a specific choice of direction for
the projection of the data down to one dimension: the projected data can then
be used to construct a discriminant by choosing a threshold for the
classification.

### The code

As stated above, the projection vector is given by:

$$
w = S_b^{-1} (\mu_1 - \mu_2)
$$

where $\mu_1$ and $\mu_2$ are the means of the two classes.

The ratio between the sizes of the two samples is:

$$
r = \frac{N_s}{N_n}
$$

Then $S_b$ is computed as $S_b = S_1 + S_2$, and the means of the two classes
are computed as 2D vectors:

$$
\mu_1 = (\mu_{1x}, \mu_{1y}) \qquad \mu_2 = (\mu_{2x}, \mu_{2y})
$$

The matrix $S_b$ is then inverted with the Cholesky method, since it is
symmetric and positive-definite.

The difference between the means,

$$
\mathrm{diff} = \mu_1 - \mu_2,
$$

is then computed, and the product $S_b^{-1} \, \mathrm{diff}$ is evaluated
with the `gsl_blas_dgemv()` function provided by GSL. The result is finally
normalised using the GSL vector functions.
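
A condensed sketch of these steps (function names and the estimation of the
covariances from the samples are assumptions about code not shown here) could
read:

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_statistics_double.h>
#include <gsl/gsl_vector.h>

/* Mean vector and covariance matrix of an n x 2 sample. */
static void mean_cov(const gsl_matrix *sample, gsl_vector *mu, gsl_matrix *S)
{
    const double *x = gsl_matrix_const_ptr(sample, 0, 0);  /* column x */
    const double *y = gsl_matrix_const_ptr(sample, 0, 1);  /* column y */
    size_t n  = sample->size1;
    size_t st = sample->tda;          /* stride between consecutive rows */

    gsl_vector_set(mu, 0, gsl_stats_mean(x, st, n));
    gsl_vector_set(mu, 1, gsl_stats_mean(y, st, n));
    gsl_matrix_set(S, 0, 0, gsl_stats_variance(x, st, n));
    gsl_matrix_set(S, 1, 1, gsl_stats_variance(y, st, n));
    double sxy = gsl_stats_covariance(x, st, y, st, n);
    gsl_matrix_set(S, 0, 1, sxy);
    gsl_matrix_set(S, 1, 0, sxy);
}

/* w <- normalised S_b^{-1} (mu_1 - mu_2). */
void fld_direction(const gsl_matrix *signal, const gsl_matrix *noise,
                   gsl_vector *w)
{
    gsl_vector *mu1 = gsl_vector_alloc(2);
    gsl_vector *mu2 = gsl_vector_alloc(2);
    gsl_matrix *S1  = gsl_matrix_alloc(2, 2);
    gsl_matrix *S2  = gsl_matrix_alloc(2, 2);

    mean_cov(signal, mu1, S1);
    mean_cov(noise,  mu2, S2);

    gsl_matrix_add(S1, S2);           /* S1 <- S_b = S_1 + S_2         */
    gsl_linalg_cholesky_decomp(S1);   /* S_b is sym. positive-definite */
    gsl_linalg_cholesky_invert(S1);   /* S1 <- S_b^{-1}                */

    gsl_vector_sub(mu1, mu2);         /* mu1 <- diff = mu_1 - mu_2     */
    gsl_blas_dgemv(CblasNoTrans, 1.0, S1, mu1, 0.0, w);
    gsl_vector_scale(w, 1.0 / gsl_blas_dnrm2(w));   /* normalise w */

    gsl_vector_free(mu1); gsl_vector_free(mu2);
    gsl_matrix_free(S1);  gsl_matrix_free(S2);
}
```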