
How is the error calculated with multiple output neurons in a neural network?

Data Science Asked on May 10, 2021

Machine Learning books generally explain that the error calculated for a given sample $i$ is:

$e_i = y_i - \hat{y}_i$

Where $\hat{y}$ is the target output and $y$ is the actual output given by the network. So, a loss function $L$ is calculated:

$L = \frac{1}{2N}\sum^{N}_{i=1}(e_i)^2$
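A minimal NumPy sketch of this loss (array names are illustrative, not from the question):

```python
import numpy as np

y = np.array([2.0, 1.5, 3.0])      # actual network outputs, one per sample
y_hat = np.array([2.5, 1.0, 3.0])  # target outputs

e = y - y_hat                      # per-sample error e_i
L = np.sum(e ** 2) / (2 * len(y))  # L = 1/(2N) * sum(e_i^2)
print(L)                           # 0.08333...
```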

The above scenario is explained for a binary classification/regression problem. Now, let's assume an MLP network with $m$ neurons in the output layer for a multiclass classification problem (generally one neuron per class).

What changes in the equations above? Since we now have multiple outputs, shouldn't both $e_i$ and $y_i$ be vectors?
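To make the question concrete, here is a hedged sketch (shapes and names assumed) of what the same MSE looks like when each sample produces a vector of $m$ outputs: $e_i$ becomes a vector and its squared norm replaces $(e_i)^2$:

```python
import numpy as np

N, m = 4, 3                    # 4 samples, 3 output neurons
rng = np.random.default_rng(0)
y = rng.random((N, m))         # network outputs, shape (N, m)
y_hat = rng.random((N, m))     # targets, shape (N, m)

e = y - y_hat                  # error vectors e_i, shape (N, m)
L = np.sum(e ** 2) / (2 * N)   # squared norm of each e_i, summed over samples
print(L)
```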

One Answer

You are mixing various concepts:

  • $L = \frac{1}{2N}\sum^{N}_{i=1}(e_i)^2$ (MSE) is used only for regression problems, not for binary classification, because MSE fits well when your target distribution is normal
  • You can use the latter formula for binary classification, but it will work really badly because your target data distribution is a Bernoulli, not a normal. Remember that the choice of loss function implies a prior assumption about the target data distribution. For this reason the right formula is binary crossentropy (aka the negative log likelihood of a Bernoulli) $$ L = - \sum_i \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right] $$
  • For a multiclass classification problem there is a generalization of binary crossentropy called categorical crossentropy. If $\hat{y}_i$ is a vector of $C$ elements, one for each class, and the true class $y_i$ is encoded as an integer (e.g. 0, 1, 2, ...), then the loss is $$ L = - \sum_i \log(\hat{y}_i[y_i]) $$
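A hedged NumPy sketch of both losses from the bullets above (variable names are illustrative; a small epsilon guards against `log(0)`):

```python
import numpy as np

eps = 1e-12  # guard against log(0)

# Binary crossentropy: y in {0, 1}, y_hat = predicted probability of class 1
y = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.2, 0.7])
bce = -np.sum(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

# Categorical crossentropy: each row of y_hat_cat is a probability vector
# over C classes; y_true holds the integer class index of each sample
y_true = np.array([0, 2])
y_hat_cat = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])
cce = -np.sum(np.log(y_hat_cat[np.arange(len(y_true)), y_true] + eps))

print(bce, cce)  # bce ≈ 0.685, cce ≈ 0.867
```

Note that categorical crossentropy picks out, for each sample, only the predicted probability of the true class, which is exactly the $\hat{y}_i[y_i]$ term in the formula.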

Answered by Mikedev on May 10, 2021
