ex-7: went on writing the FLD

\end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard
deviations in the $x$ and $y$ directions respectively, and $\rho$ is the
bivariate correlation, hence:

$$
\sigma_{xy} = \rho \sigma_x \sigma_y
$$

where $\sigma_{xy}$ is the covariance of $x$ and $y$.

In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the command
line. The samples were handled as matrices of dimension $n \times 2$, where
$n$ is the number of points in the sample. The `gsl_matrix` type provided by
GSL was employed for this purpose, and the function
`gsl_ran_bivariate_gaussian()` was used for generating the points.
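
The following is a minimal sketch of how such a sample could be built with
GSL; the function name and parameter layout are illustrative assumptions, not
the exercise's verbatim code. Note that `gsl_ran_bivariate_gaussian()`
samples around the origin, so the means are added by hand:

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian
 * with means (mu_x, mu_y), standard deviations (sigma_x, sigma_y)
 * and correlation rho. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mu_x, double mu_y,
                            double sigma_x, double sigma_y, double rho)
{
    gsl_matrix *m = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double dx, dy;
        gsl_ran_bivariate_gaussian(r, sigma_x, sigma_y, rho, &dx, &dy);
        gsl_matrix_set(m, i, 0, mu_x + dx);   /* x coordinate */
        gsl_matrix_set(m, i, 1, mu_y + dy);   /* y coordinate */
    }
    return m;
}
```

With the defaults above, the signal sample would then be obtained with a call
such as `generate_sample(r, 800, ...)` and the noise with `n = 1000`.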

Assuming the way the points were generated is unknown, a classification model
must then be implemented in order to assign each point to the class (signal
or noise) to which it 'most probably' belongs. The question is how 'most
probably' is to be interpreted and implemented.

## Fisher linear discriminant

### The theory

The Fisher linear discriminant (FLD) is a linear classification model based
on dimensionality reduction: it reduces this 2D classification problem to a
one-dimensional one, in which the decision surface is a single threshold.

The simplest representation of a linear discriminant is obtained by taking a
linear function of a sampled 2D point $x$ so that:

$$
\hat{x} = w^T x
$$

where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$, where $w_{th}$ is a
suitably chosen threshold, and to the second class otherwise.
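
In code, this rule amounts to a dot product followed by a comparison; a
minimal sketch (the function name and argument layout are assumptions, not
taken from the exercise's code):

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_vector.h>

/* Project a 2D point x onto the weight vector w and compare the
 * projection with the threshold w_th: returns 1 for the first
 * class, 0 for the second. */
int classify(const gsl_vector *w, const gsl_vector *x, double w_th)
{
    double x_hat;
    gsl_blas_ddot(w, x, &x_hat);   /* x_hat = w^T x */
    return x_hat >= w_th;
}
```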

In general, the projection onto one dimension leads to a considerable loss of
information: classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$

![The plot on the left shows samples from two classes along with the histograms

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
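
Written explicitly in terms of $w$ (a standard intermediate step, following
textbook treatments of the FLD, with $m_1$ and $m_2$ the class means and
$S_w$ the within-class covariance matrix defined below), the criterion reads:

$$
J(w) = \frac{\left( w^T (m_2 - m_1) \right)^2}{w^T S_w w}
$$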

Differentiating $J(w)$ with respect to $w$, it can be found that $J$ is
maximized when:

$$
w = S_w^{-1} (m_2 - m_1)
$$

where $S_w$ is the within-class covariance matrix, given by:

$$
S_w = S_1 + S_2
$$

where $S_1$ and $S_2$ are the covariance matrices of the two classes, each of
the form:

$$
\begin{pmatrix}
  \sigma_x^2  & \sigma_{xy} \\
  \sigma_{xy} & \sigma_y^2
\end{pmatrix}
$$

This is not truly a discriminant but rather a specific choice of direction
for the projection of the data down to one dimension: the projected data can
then be used to construct a discriminant by choosing a threshold for the
classification.

### The code

As stated above, the projection vector is given by:

$$
w = S_w^{-1} (\mu_1 - \mu_2)
$$

where $\mu_1 = (\mu_{1x}, \mu_{1y})$ and $\mu_2 = (\mu_{2x}, \mu_{2y})$ are
the means of the two classes.

The code first computes the ratio between the sizes of the two samples,

$$
r = \frac{N_s}{N_n}
$$

and the within-class matrix $S_w = S_1 + S_2$, where $S_1$ and $S_2$ are
estimated from the generated samples.
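
A sketch of how one of the two per-class matrices could be estimated from an
$n \times 2$ sample with GSL's statistics routines (the helper name is an
assumption, not the exercise's actual code):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_statistics_double.h>

/* Fill S (2x2) with the sample covariance matrix of an n x 2
 * data matrix whose columns hold the x and y coordinates. */
void class_covariance(const gsl_matrix *data, gsl_matrix *S)
{
    gsl_vector_const_view x = gsl_matrix_const_column(data, 0);
    gsl_vector_const_view y = gsl_matrix_const_column(data, 1);

    double sx2 = gsl_stats_variance(x.vector.data, x.vector.stride,
                                    x.vector.size);
    double sy2 = gsl_stats_variance(y.vector.data, y.vector.stride,
                                    y.vector.size);
    double sxy = gsl_stats_covariance(x.vector.data, x.vector.stride,
                                      y.vector.data, y.vector.stride,
                                      x.vector.size);

    gsl_matrix_set(S, 0, 0, sx2);   /* sigma_x^2 */
    gsl_matrix_set(S, 0, 1, sxy);   /* sigma_xy  */
    gsl_matrix_set(S, 1, 0, sxy);
    gsl_matrix_set(S, 1, 1, sy2);   /* sigma_y^2 */
}
```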

The matrix $S_w$ is then inverted with the Cholesky method, since it is
symmetrical and positive-definite. The difference of the means,

$$
\text{diff} = \mu_1 - \mu_2
$$

is multiplied by $S_w^{-1}$ through the matrix-vector product function
`gsl_blas_dgemv()` provided by GSL, and the result is normalised with GSL
functions.
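
Putting the steps together, a minimal sketch of the computation (the function
name, memory handling and exact call sequence are assumptions; only the
Cholesky inversion, `gsl_blas_dgemv()` and the final normalisation are named
in the text above):

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>

/* Compute the normalised FLD projection vector w = S^{-1}(mu1 - mu2),
 * where S is the 2x2 within-class covariance matrix (overwritten). */
void fisher_vector(gsl_matrix *S,
                   const gsl_vector *mu1, const gsl_vector *mu2,
                   gsl_vector *w)
{
    /* diff = mu1 - mu2 */
    gsl_vector *diff = gsl_vector_alloc(2);
    gsl_vector_memcpy(diff, mu1);
    gsl_vector_sub(diff, mu2);

    /* Invert S in place via its Cholesky decomposition: this is
     * valid because S is symmetric and positive-definite. */
    gsl_linalg_cholesky_decomp(S);
    gsl_linalg_cholesky_invert(S);

    /* w = S^{-1} diff, then rescale w to unit length. */
    gsl_blas_dgemv(CblasNoTrans, 1.0, S, diff, 0.0, w);
    gsl_vector_scale(w, 1.0 / gsl_blas_dnrm2(w));

    gsl_vector_free(diff);
}
```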