ex-7: Finished writing about perceptron
@@ -36,11 +36,11 @@ samples were handled as matrices of dimension $n \times 2$, where $n$ is the number
of points in the sample. The library `gsl_matrix` provided by GSL was employed
for this purpose and the function `gsl_ran_bivariate_gaussian()` was used for
generating the points.
An example of the two samples is shown in @fig:fisher_points.

![Example of points sorted according to the two Gaussians with
the given parameters. Noise points in pink and signal points
in yellow.](images/fisher-points.pdf){#fig:fisher_points}

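A minimal sketch of how such a sample might be built with the GSL calls named above (the function name and parameters are illustrative, not taken from the report's code; the means are added by hand since `gsl_ran_bivariate_gaussian()` draws zero-mean pairs):

```c
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_matrix.h>

/* Fill an n x 2 matrix with points drawn from a bivariate Gaussian
 * centred at (mx, my): sx, sy are the standard deviations and rho
 * the correlation of the two coordinates. */
gsl_matrix *generate_sample(gsl_rng *r, size_t n,
                            double mx, double my,
                            double sx, double sy, double rho)
{
    gsl_matrix *sample = gsl_matrix_alloc(n, 2);
    for (size_t i = 0; i < n; i++) {
        double x, y;
        /* gsl_ran_bivariate_gaussian() generates a zero-mean pair,
         * so the means are added afterwards. */
        gsl_ran_bivariate_gaussian(r, sx, sy, rho, &x, &y);
        gsl_matrix_set(sample, i, 0, x + mx);
        gsl_matrix_set(sample, i, 1, y + my);
    }
    return sample;
}
```
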
Assuming no knowledge of how the points were generated, a classification model
must then be implemented in order to assign each point to the right class
@@ -154,6 +154,7 @@ the Cholesky method, already discussed in @sec:MLM.
Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
function provided by GSL.

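A sketch of the GSL calls involved, assuming, as in the standard Fisher discriminant, that $w$ solves $S w = \mu_2 - \mu_1$ (variable and function names are illustrative):

```c
#include <gsl/gsl_linalg.h>
#include <gsl/gsl_blas.h>

/* Solve S w = diff (with diff = mu2 - mu1) by the Cholesky method.
 * S must be symmetric positive definite; it is copied first because
 * gsl_linalg_cholesky_decomp() overwrites its argument. */
void solve_system(const gsl_matrix *S, const gsl_vector *diff, gsl_vector *w)
{
    gsl_matrix *chol = gsl_matrix_alloc(S->size1, S->size2);
    gsl_matrix_memcpy(chol, S);
    gsl_linalg_cholesky_decomp(chol);
    gsl_linalg_cholesky_solve(chol, diff, w);
    gsl_matrix_free(chol);
}

/* Matrix-vector product y = 1.0 * A x + 0.0 * y in a single call. */
void mat_vec(const gsl_matrix *A, const gsl_vector *x, gsl_vector *y)
{
    gsl_blas_dgemv(CblasNoTrans, 1.0, A, x, 0.0, y);
}
```
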
### The threshold

The cut was fixed by the condition of conditional probability being the same
@@ -198,7 +199,8 @@ this case were the weight vector and the position of the point to be projected.
![Gaussian of the samples on the projection
line.](images/fisher-proj.pdf){height=5.7cm}

Aerial and lateral views of the projection direction, in blue, and the cut, in
red.
</div>

Results obtained for the same sample in @fig:fisher_points are shown in
@@ -212,3 +214,78 @@ and $t_{\text{cut}}$ is 1.323 far from the origin of the axes. Hence, as can be
seen, the vector $w$ turned out to be parallel to the line joining the means of
the two classes (recall that they are $(0, 0)$ and $(4, 4)$), which means that the
total covariance matrix $S$ is isotropic, i.e. proportional to the unit matrix.


## Perceptron

In machine learning, the perceptron is an algorithm for supervised learning of
linear binary classifiers.
Supervised learning is the machine learning task of inferring a function $f$
that maps an input $x$ to an output $f(x)$ based on a set of training
input-output pairs. Each example is a pair consisting of an input object and an
output value. The inferred function can be used for mapping new examples: the
algorithm should generalize so as to correctly determine the class labels of
unseen instances.

The aim is to determine the threshold function $f(x)$ built from the dot product
between the (in this case 2D) point $x$ and the weight vector $w$:

$$
f(x) = x \cdot w + b
$$ {#eq:perc}

where $b$ is called the 'bias'. If $f(x) \geqslant 0$, then the point is
assigned to class $C_1$, and to $C_2$ otherwise.

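In code, @eq:perc and the assignment rule reduce to a dot product and a comparison; a minimal GSL sketch (the function names `f` and `classify` are illustrative, not taken from the report's code):

```c
#include <gsl/gsl_blas.h>

/* f(x) = x . w + b, as in @eq:perc. */
double f(const gsl_vector *x, const gsl_vector *w, double b)
{
    double dot;
    gsl_blas_ddot(x, w, &dot);
    return dot + b;
}

/* Assign the point to C_1 (returns 1) when f(x) >= 0,
 * to C_2 (returns 0) otherwise. */
int classify(const gsl_vector *x, const gsl_vector *w, double b)
{
    return f(x, w, b) >= 0 ? 1 : 0;
}
```
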
The training was performed as follows. The idea is that the function $f(x)$ must
return 0 when the point $x$ belongs to the noise and 1 when it belongs to the
signal. The initial values were set to $w = (0,0)$ and $b = 0$. From these, the
perceptron starts improving its estimates. The sample was passed point by
point through an iterative procedure a total of $N_c$ times: each time, the
projection $w \cdot x$ of the point was computed and then the variable $\Delta$
was defined as:

$$
\Delta = r \, (e - \theta(o))
$$

where:

- $r$ is the learning rate of the perceptron: it lies between 0 and 1. The
  larger $r$, the more volatile the weight changes. In the code, it was set
  to $r = 0.8$;
- $e$ is the expected value, namely 0 if $x$ is noise and 1 if it is signal;
- $\theta$ is the Heaviside theta function;
- $o$ is the observed value of $f(x)$ defined in @eq:perc.

Then $b$ and $w$ must be updated as:

$$
b \longrightarrow b + \Delta
\et
w \longrightarrow w + x \Delta
$$

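A sketch of a single training pass implementing this update rule, reusing the `f()` sketched above (the `expected` array of 0/1 labels and the function name are assumptions, not the report's actual code):

```c
#include <gsl/gsl_blas.h>
#include <gsl/gsl_matrix.h>

/* One pass over the n x 2 sample: for each point compute Delta and
 * update the bias b and the weights w in place; expected[i] holds the
 * label e (0 for noise, 1 for signal) and r is the learning rate. */
void train_pass(const gsl_matrix *sample, const int *expected,
                double r, gsl_vector *w, double *b)
{
    for (size_t i = 0; i < sample->size1; i++) {
        gsl_vector_const_view x = gsl_matrix_const_row(sample, i);
        int th = f(&x.vector, w, *b) >= 0 ? 1 : 0;  /* theta(o), o = f(x)  */
        double delta = r * (expected[i] - th);      /* Delta = r (e - theta(o)) */
        *b += delta;                                /* b -> b + Delta      */
        gsl_blas_daxpy(delta, &x.vector, w);        /* w -> w + x Delta    */
    }
}
```
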
<div id="fig:percep_proj">
![View from above of the samples.](images/percep-plane.pdf){height=5.7cm}
![Gaussian of the samples on the projection
line.](images/percep-proj.pdf){height=5.7cm}

Aerial and lateral views of the projection direction, in blue, and the cut, in
red.
</div>

It can be shown that this method converges to the desired function.
As stated in the previous section, the weight vector must finally be normalized.

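Such a normalization could be done with GSL in a couple of calls (a sketch; the function name is illustrative):

```c
#include <gsl/gsl_blas.h>

/* Rescale w to unit Euclidean norm. */
void normalize(gsl_vector *w)
{
    double norm = gsl_blas_dnrm2(w);   /* ||w|| */
    gsl_vector_scale(w, 1.0 / norm);
}
```
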
With $N_c = 5$, the values of $w$ and $t_{\text{cut}}$ level off up to the third
digit. The following results were obtained:

$$
w = (0.654, 0.756) \et t_{\text{cut}} = 1.213
$$

where, once again, $t_{\text{cut}}$ is measured from the origin of the axes. In
this case, the projection line does not lie along the line joining the means of
the two samples. Plots are shown in @fig:percep_proj.

## Efficiency test