
Interpreting Categorical Crossentropy Loss

Data Science Asked by Michael Pulis on May 2, 2021

I would like to ask for clarification about the loss values output during training when using categorical cross-entropy as the loss function. If I have 11 categories and my loss is (for the sake of argument) 2, does this mean that my model is on average 2 categories off the correct category, or is the loss used purely for comparative purposes and cannot be interpreted the way I am suggesting?

One Answer

Cross-entropy is an information-theoretic measure defined on probability distributions, and it's measured in units determined by the base of the logarithm used in its computation (nats for the natural logarithm, bits for $\log_2$). There are already several posts about the intuitive understanding of cross-entropy and its relationship to KL divergence, which are worth reading.
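To make the units concrete, here is a minimal NumPy sketch (the 11-class setup and all probabilities are hypothetical, chosen for illustration) that computes categorical cross-entropy for a single one-hot example in both nats and bits:

```python
import numpy as np

# Hypothetical one-hot ground truth over 11 categories (class 3 is correct)
y_true = np.zeros(11)
y_true[3] = 1.0

# Hypothetical predicted distribution, e.g. from a softmax output
y_pred = np.full(11, 0.05)
y_pred[3] = 0.50
y_pred /= y_pred.sum()  # ensure the probabilities sum to 1

# Categorical cross-entropy: -sum(y_true * log(y_pred))
loss_nats = -np.sum(y_true * np.log(y_pred))   # natural log -> nats (the Keras default)
loss_bits = -np.sum(y_true * np.log2(y_pred))  # log base 2  -> bits

print(f"loss in nats: {loss_nats:.4f}")  # ~0.6931
print(f"loss in bits: {loss_bits:.4f}")  # ~1.0000
```

Note that for a one-hot target the sum collapses to $-\log(p_{\text{correct}})$, so a loss of 2 nats simply means the model assigned probability $e^{-2} \approx 0.135$ to the correct class.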

I think the more interesting part of your question is what "2 categories off" would mean for your problem. Generally, in classification optimized with categorical cross-entropy, the classes are pairwise orthogonal: the content of the image is a dog or a cat or a ship or whatever, so the ground-truth vector is always one-hot (1 at the index of the correct class, 0 elsewhere). Your question implicitly places an ordering over the classes: for something to be a number of "categories off", there must be some sense of similarity that makes categories $i$ and $j$ more alike than, say, $i$ and $k$. If that's the case, you could look into alternative loss functions (but you probably don't have to, depending on what you're doing).
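A small sketch (again with hypothetical numbers) makes this point directly: the loss carries no notion of being "categories off". Two predictions that assign the same probability to the true class get exactly the same loss, regardless of whether the remaining probability mass sits on a neighbouring class or a distant one:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy in nats for a single one-hot example.
    Probabilities are clipped to avoid log(0), as most frameworks do internally."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

# One-hot target: class 5 of 11 is correct
y_true = np.zeros(11)
y_true[5] = 1.0

p_correct = np.exp(-2.0)      # probability implied by a loss of 2 nats (~0.135)
remainder = 1.0 - p_correct

# Two hypothetical predictions with the same probability on the true class:
# one puts the leftover mass on an adjacent class, the other on a distant one.
near = np.zeros(11)
near[5] = p_correct
near[6] = remainder           # mass on the "neighbouring" class

far = np.zeros(11)
far[5] = p_correct
far[0] = remainder            # mass on a "distant" class

print(categorical_crossentropy(y_true, near))  # 2.0
print(categorical_crossentropy(y_true, far))   # 2.0 -- identical: the loss only
# sees the probability assigned to the true class, not where the rest lands
```

This is why a loss of 2 cannot be read as "2 categories off": the one-hot target zeroes out every term except the true class, so the loss is blind to which wrong classes received the remaining probability.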

Answered by Matthew on May 2, 2021
