diff --git a/notes/sections/7.md b/notes/sections/7.md
index b6ef97b..5ebb290 100644
--- a/notes/sections/7.md
+++ b/notes/sections/7.md
@@ -36,11 +36,11 @@
 samples were handled as matrices of dimension $n$ x 2, where $n$ is the number
 of points in the sample. The library `gsl_matrix` provided by GSL was employed
 for this purpose and the function `gsl_ran_bivariate_gaussian()` was used for
 generating the points.
-An example of the two samples is shown in @fig:fisher_points.
+An example of the two samples is shown in @fig:points.
 
-{#fig:fisher_points}
+in yellow.](images/points.pdf){#fig:points}
 
 Assuming not to know how the points were generated, a model of classification
 must then be implemented in order to assign each point to the right class
@@ -154,6 +154,7 @@
 the Cholesky method, already discussed in @sec:MLM.
 Lastly, the matrix-vector product was computed with the `gsl_blas_dgemv()`
 function provided by GSL.
+
 ### The threshold
 
 The cut was fixed by the condition of conditional probability being the same
@@ -198,7 +199,8 @@
 this case were the weight vector and the position of the point to be
 projected.
 
 {height=5.7cm}
-Aeral and lateral views of the projection direction, in blue, and the cut, in red.
+Aerial and lateral views of the projection direction, in blue, and the cut, in
+red.
 
 Results obtained for the same sample in @fig:fisher_points are shown in
@@ -212,3 +214,78 @@
 and $t_{\text{cut}}$ is 1.323 far from the origin of the axes. Hence, as can
 be seen, the vector $w$ turned out to be parallel to the line joining the
 means of the two classes (reminded to be $(0, 0)$ and $(4, 4)$) which means
 that the total covariance matrix $S$ is isotropic, proportional to the unit
 matrix.
+
+
+## Perceptron
+
+In machine learning, the perceptron is an algorithm for supervised learning of
+linear binary classifiers.
+Supervised learning is the machine learning task of inferring a function $f$
+that maps an input $x$ to an output $f(x)$ based on a set of training
+input-output pairs. Each example is a pair consisting of an input object and an
+output value. The inferred function can be used for mapping new examples: the
+algorithm is expected to generalize, namely to correctly determine the class
+labels of unseen instances.
+
+The aim is to determine the threshold function $f(x)$ for the dot product
+between the (in this case 2D) vector point $x$ and the weight vector $w$:
+
+$$
+  f(x) = x \cdot w + b
+$$ {#eq:perc}
+
+where $b$ is called 'bias'. If $f(x) \geqslant 0$, then the point is assigned
+to the class $C_1$, to $C_2$ otherwise.
+
+The training was performed as follows. The idea is that the function $f(x)$
+must return 0 when the point $x$ belongs to the noise and 1 when it belongs to
+the signal. Initial values were set as $w = (0,0)$ and $b = 0$. From these, the
+perceptron starts improving its estimates. The sample was passed point by
+point through an iterative procedure a total of $N_c$ times: each time, the
+projection $w \cdot x$ of the point was computed and then the variable $\Delta$
+was defined as:
+
+$$
+  \Delta = r \, (e - o)
+$$
+
+where:
+
+ - $r$ is the learning rate of the perceptron: it is between 0 and 1. The
+   larger $r$, the more volatile the weight changes. In the code, it was set
+   $r = 0.8$;
+ - $e$ is the expected value, namely 0 if $x$ is noise and 1 if it is signal;
+ - $o = \theta(f(x))$ is the observed value, where $\theta$ is the Heaviside
+   theta function and $f(x)$ is defined in @eq:perc.
+
+Then $b$ and $w$ must be updated as:
+
+$$
+  b \longrightarrow b + \Delta
+  \et
+  w \longrightarrow w + x \Delta
+$$