Data Science Asked by Continue2Learn on January 27, 2021
Please refer to section 4.1.3, "Least squares for classification", in Bishop's Pattern Recognition and Machine Learning:
In a two-class linear discriminant system, we classify a vector $\mathbf{x}$ as $\mathcal{C}_1$ if $y(\mathbf{x}) > 0$, and as $\mathcal{C}_2$ otherwise.
Generalizing in section 4.1.3, we define $K$ linear discriminant functions, one for each class:
$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0} \tag{4.13}$
Adding a leading 1 to the vector $\mathbf{x}$ yields $\tilde{\mathbf{x}}$.
The linear discriminant function for the $K$-class case is then given by $\mathbf{y}(\mathbf{x}) = \widetilde{\mathbf{W}}^T \tilde{\mathbf{x}}$. The author then presents the sum-of-squares error function as:
$E_D(\widetilde{\mathbf{W}}) = \frac{1}{2} \operatorname{Tr}\left\{ (\widetilde{\mathbf{X}}\widetilde{\mathbf{W}} - \mathbf{T})^T (\widetilde{\mathbf{X}}\widetilde{\mathbf{W}} - \mathbf{T}) \right\} \tag{4.15}$
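For concreteness, here is a minimal NumPy sketch of this setup as I understand it (my own toy numbers, not from the book), using one-of-$K$ target coding:

```python
import numpy as np

# Toy data: N = 4 observations, D = 2 features, K = 3 classes,
# with targets in one-of-K (one-hot) coding.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [0.5, 0.5],
              [3.0, 3.0]])
T = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Prepend a column of 1s to absorb the biases w_k0: X-tilde is N x (D+1).
X_tilde = np.hstack([np.ones((len(X), 1)), X])

# Closed-form least-squares solution: W-tilde = pinv(X-tilde) @ T, (D+1) x K.
W_tilde = np.linalg.pinv(X_tilde) @ T

# Sum-of-squares error, eq. (4.15).
E = X_tilde @ W_tilde - T                 # N x K residual matrix
E_D = 0.5 * np.trace(E.T @ E)

# A new point is assigned to the class with the largest discriminant y_k(x).
x_new = np.array([1.0, 2.0, 0.5])         # augmented: leading 1, then features
print(E_D, np.argmax(W_tilde.T @ x_new))
```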
My doubts relate to equation 4.15 above.
Consider a 3-class system with only one observation, $\mathbf{x} \in \mathcal{C}_2$. My understanding:

- Please refer to $\mathbf{Y}$ in the upper half of the diagram. Since $\mathbf{x} \in \mathcal{C}_2$, will only $\mathrm{val}(\mathcal{C}_2)$ be positive, i.e. $y_2(\mathbf{x}) > \mathit{threshold}(\mathcal{C}_2)$? Is the value $\mathrm{val}(\mathcal{C}_k)$ negative for the other classes' discriminant functions? If not, could you briefly explain the reason?
- The error matrix $\mathbf{E}$ is a $1 \times 3$ matrix, so $\mathbf{E}^T\mathbf{E}$ is a $3 \times 3$ matrix whose diagonal elements represent the squared error for each class. Does $\operatorname{Tr}$ in 4.15 stand for the trace, the sum of the diagonal elements? If so, why do we ignore the off-diagonal error values, i.e. why don't they matter? (See the sketch after this list.)
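To make the second point concrete, a small sketch of the shapes I have in mind (toy numbers of my own):

```python
import numpy as np

# One observation x in C_2, so the one-of-K target row is (0, 1, 0).
y = np.array([[0.3, 0.8, -0.1]])   # 1 x 3: discriminant values y_k(x) (made up)
t = np.array([[0.0, 1.0, 0.0]])    # 1 x 3: target row for class C_2
E = y - t                          # 1 x 3 error matrix
M = E.T @ E                        # 3 x 3: diagonal = per-class squared errors
print(np.diag(M))                  # [0.09 0.04 0.01]
print(np.trace(M))                 # 0.14
```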
P.S.: If my understanding is wrong or grossly wrong, I'd appreciate it if you pointed that out.
As Bishop points out throughout that section, least squares is ill-suited to this problem, so perhaps we shouldn't spend too much time on it. On the other hand, clearing up misconceptions here may help elsewhere.
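On the trace question specifically: yes, $\operatorname{Tr}$ denotes the trace, and $\operatorname{Tr}\{\mathbf{E}^T\mathbf{E}\} = \sum_{n,k} E_{nk}^2$ is the squared Frobenius norm of $\mathbf{E}$, i.e. the sum of every squared residual over all observations and classes. Nothing is being ignored: the off-diagonal entries of $\mathbf{E}^T\mathbf{E}$ are cross-products between residuals of different classes, and they simply never appear in the error. A quick numerical check (a sketch, not from the book):

```python
import numpy as np

# Tr(E^T E) equals the sum of all squared entries of E (squared Frobenius
# norm), so no residual information is lost by taking only the trace.
rng = np.random.default_rng(0)
E = rng.standard_normal((4, 3))    # toy 4 x 3 residual matrix
print(np.trace(E.T @ E))           # trace of the 3 x 3 Gram matrix
print(np.sum(E**2))                # identical: sum over all squared residuals
```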
Answered by Ben Reiniger on January 27, 2021