ex-7: went on writing the FLD

\end{cases}
$$

where $\mu$ stands for the mean, $\sigma_x$ and $\sigma_y$ are the standard
deviations in the $x$ and $y$ directions respectively, and $\rho$ is the
bivariate correlation, hence:

$$
\sigma_{xy} = \rho \sigma_x \sigma_y
$$

where $\sigma_{xy}$ is the covariance of $x$ and $y$.

In the code, the default settings are $N_s = 800$ points for the signal and
$N_n = 1000$ points for the noise, but both can be changed from the command
line. The samples were handled as matrices of dimension $n \times 2$, where
$n$ is the number of points in the sample. The `gsl_matrix` type provided by
GSL was employed for this purpose, and the function
`gsl_ran_bivariate_gaussian()` was used for generating the points.
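
The following is a minimal sketch of how such a sample could be built with
GSL; the function name and parameter layout are illustrative assumptions, not
the exercise's verbatim code. Note that `gsl_ran_bivariate_gaussian()`
samples around the origin, so the means are added by hand:

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_rng.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian
 * with means (mu_x, mu_y), standard deviations (sigma_x, sigma_y)
 * and correlation rho. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mu_x, double mu_y,
                            double sigma_x, double sigma_y, double rho)
{
    gsl_matrix *m = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double dx, dy;
        gsl_ran_bivariate_gaussian(r, sigma_x, sigma_y, rho, &dx, &dy);
        gsl_matrix_set(m, i, 0, mu_x + dx);   /* x coordinate */
        gsl_matrix_set(m, i, 1, mu_y + dy);   /* y coordinate */
    }
    return m;
}
```

With the defaults above, the signal sample would then be obtained with a call
such as `generate_sample(r, 800, ...)` and the noise with `n = 1000`.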

Assuming the way the points were generated is unknown, a classification model
must then be implemented in order to assign each point to the class (signal
or noise) to which it 'most probably' belongs. The question is how 'most
probably' is to be interpreted and implemented.

## Fisher linear discriminant

### The theory

The Fisher linear discriminant (FLD) is a linear classification model based
on dimensionality reduction: it reduces this 2D classification problem to a
one-dimensional one, in which the decision surface is a single threshold.

The simplest representation of a linear discriminant is obtained by taking a
linear function of a sampled 2D point $x$ so that:

$$
\hat{x} = w^T x
$$

where $w$ is the so-called 'weight vector'. An input point $x$ is commonly
assigned to the first class if $\hat{x} \geqslant w_{th}$, where $w_{th}$ is a
suitably chosen threshold, and to the second class otherwise.
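
In code, this rule amounts to a dot product followed by a comparison; a
minimal sketch (the function name and argument layout are assumptions, not
taken from the exercise's code):

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_vector.h>

/* Project a 2D point x onto the weight vector w and compare the
 * projection with the threshold w_th: returns 1 for the first
 * class, 0 for the second. */
int classify(const gsl_vector *w, const gsl_vector *x, double w_th)
{
    double x_hat;
    gsl_blas_ddot(w, x, &x_hat);   /* x_hat = w^T x */
    return x_hat >= w_th;
}
```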

In general, the projection onto one dimension leads to a considerable loss of
information: classes that are well separated in the original 2D space may
become strongly overlapping in one dimension. However, by adjusting the
components of the weight vector, a projection that maximizes the class
separation can be selected.

The simplest measure of the separation of the classes is the separation of the
projected class means. This suggests choosing $w$ so as to maximize:

$$
\hat{m}_2 - \hat{m}_1 = w^T (m_2 - m_1)
$$

![The plot on the left shows samples from two classes along with the histograms

$$
J(w) = \frac{(\hat{m}_2 - \hat{m}_1)^2}{s^2}
$$
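
Written explicitly in terms of $w$ (a standard intermediate step, following
textbook treatments of the FLD, with $m_1$ and $m_2$ the class means and
$S_w$ the within-class covariance matrix defined below), the criterion reads:

$$
J(w) = \frac{\left( w^T (m_2 - m_1) \right)^2}{w^T S_w w}
$$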

Differentiating $J(w)$ with respect to $w$, it can be found that $J$ is
maximized when:

$$
w = S_w^{-1} (m_2 - m_1)
$$

where $S_w$ is the within-class covariance matrix, given by:

$$
S_w = S_1 + S_2
$$

where $S_1$ and $S_2$ are the covariance matrices of the two classes, each of
the form:

$$
\begin{pmatrix}
  \sigma_x^2  & \sigma_{xy} \\
  \sigma_{xy} & \sigma_y^2
\end{pmatrix}
$$

This is not truly a discriminant but rather a specific choice of direction
for the projection of the data down to one dimension: the projected data can
then be used to construct a discriminant by choosing a threshold for the
classification.

### The code

As stated above, the projection vector is given by:

$$
w = S_w^{-1} (\mu_1 - \mu_2)
$$

where $\mu_1 = (\mu_{1x}, \mu_{1y})$ and $\mu_2 = (\mu_{2x}, \mu_{2y})$ are
the means of the two classes.

The code first computes the ratio between the sizes of the two samples,

$$
r = \frac{N_s}{N_n}
$$

and the within-class matrix $S_w = S_1 + S_2$, where $S_1$ and $S_2$ are
estimated from the generated samples.
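
A sketch of how one of the two per-class matrices could be estimated from an
$n \times 2$ sample with GSL's statistics routines (the helper name is an
assumption, not the exercise's actual code):

```c
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_statistics_double.h>

/* Fill S (2x2) with the sample covariance matrix of an n x 2
 * data matrix whose columns hold the x and y coordinates. */
void class_covariance(const gsl_matrix *data, gsl_matrix *S)
{
    gsl_vector_const_view x = gsl_matrix_const_column(data, 0);
    gsl_vector_const_view y = gsl_matrix_const_column(data, 1);

    double sx2 = gsl_stats_variance(x.vector.data, x.vector.stride,
                                    x.vector.size);
    double sy2 = gsl_stats_variance(y.vector.data, y.vector.stride,
                                    y.vector.size);
    double sxy = gsl_stats_covariance(x.vector.data, x.vector.stride,
                                      y.vector.data, y.vector.stride,
                                      x.vector.size);

    gsl_matrix_set(S, 0, 0, sx2);   /* sigma_x^2 */
    gsl_matrix_set(S, 0, 1, sxy);   /* sigma_xy  */
    gsl_matrix_set(S, 1, 0, sxy);
    gsl_matrix_set(S, 1, 1, sy2);   /* sigma_y^2 */
}
```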

The matrix $S_w$ is then inverted with the Cholesky method, since it is
symmetrical and positive-definite. The difference of the means,

$$
\text{diff} = \mu_1 - \mu_2
$$

is multiplied by $S_w^{-1}$ through the matrix-vector product function
`gsl_blas_dgemv()` provided by GSL, and the result is normalised with GSL
functions.
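
Putting the steps together, a minimal sketch of the computation (the function
name, memory handling and exact call sequence are assumptions; only the
Cholesky inversion, `gsl_blas_dgemv()` and the final normalisation are named
in the text above):

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>

/* Compute the normalised FLD projection vector w = S^{-1}(mu1 - mu2),
 * where S is the 2x2 within-class covariance matrix (overwritten). */
void fisher_vector(gsl_matrix *S,
                   const gsl_vector *mu1, const gsl_vector *mu2,
                   gsl_vector *w)
{
    /* diff = mu1 - mu2 */
    gsl_vector *diff = gsl_vector_alloc(2);
    gsl_vector_memcpy(diff, mu1);
    gsl_vector_sub(diff, mu2);

    /* Invert S in place via its Cholesky decomposition: this is
     * valid because S is symmetric and positive-definite. */
    gsl_linalg_cholesky_decomp(S);
    gsl_linalg_cholesky_invert(S);

    /* w = S^{-1} diff, then rescale w to unit length. */
    gsl_blas_dgemv(CblasNoTrans, 1.0, S, diff, 0.0, w);
    gsl_vector_scale(w, 1.0 / gsl_blas_dnrm2(w));

    gsl_vector_free(diff);
}
```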