Multi-Label Loss function and model training

Data Science Asked on July 19, 2021

I'm working on a multi-label problem, i.e. the output can contain one or more labels, and hence the training data also has multiple labels per sample.

I'm not able to work out how training works for such a model. Please help me with the questions below:

  1. How is the loss calculated for this type of problem?
  2. How does the ML model make predictions?

I found one old thread (Multi Label old query), but I'm still not able to understand the solution.

2 Answers

In a normal multi-class setup we use softmax over the last layer because the probabilities of the classes must sum to 1, since only one of these classes is the actual answer. After that, we pass the output to a suitable loss function like BCE (binary cross-entropy).

For multi-label, any subset of the classes can be the output, so the sum of the class probabilities will not necessarily be 1. However, the individual probability of each class still needs to be between 0 and 1 to be a valid probability. Therefore, simply apply a sigmoid on top of each class output and pass this to the BCE loss, which computes a separate binary cross-entropy for each class and then aggregates these losses across all the classes to get the final BCE.

For example, in PyTorch you just apply a sigmoid and pass the result to the BCE loss, such that the output passed to the loss has shape (batch_size, num_classes).

Look at the BCE loss section in the PyTorch docs, where the input to the BCE loss has shape (N, *), which matches the shape just described: TORCH.NN | PyTorch Docs.
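To make that concrete, here is a minimal sketch of what the answer describes (the toy model, sizes, and the 0.5 decision threshold are illustrative assumptions, not from the answer itself). It uses nn.BCEWithLogitsLoss, which fuses the sigmoid and the per-class BCE into one numerically stable call:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
batch_size, num_features, num_classes = 8, 16, 5

# Toy model whose last layer emits one raw score (logit) per class.
model = nn.Linear(num_features, num_classes)

# BCEWithLogitsLoss = sigmoid + per-class binary cross-entropy,
# averaged over classes and over the batch by default.
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(batch_size, num_features)
# Multi-label targets: each row can contain several 1s.
y = torch.randint(0, 2, (batch_size, num_classes)).float()

logits = model(x)            # shape: (batch_size, num_classes)
loss = criterion(logits, y)  # one mini BCE per class, aggregated
loss.backward()

# Prediction: threshold each class probability independently
# (0.5 is a common default, but it can be tuned per class).
probs = torch.sigmoid(logits)
preds = (probs > 0.5).int()
```

The thresholding step also answers the prediction part of the question: each class is decided independently, so a sample can end up with zero, one, or several predicted labels.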

Answered by user1825567 on July 19, 2021

For multi-class classification, after applying softmax over the last layer we generally use categorical cross-entropy as the loss function rather than BCE.
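For contrast with the multi-label sketch above, here is the usual multi-class setup in PyTorch (again a minimal sketch with assumed toy sizes): nn.CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits and the target is a single class index per sample rather than a multi-hot vector:

```python
import torch
import torch.nn as nn

batch_size, num_features, num_classes = 8, 16, 5
model = nn.Linear(num_features, num_classes)

# CrossEntropyLoss = log-softmax + negative log-likelihood.
criterion = nn.CrossEntropyLoss()

x = torch.randn(batch_size, num_features)
y = torch.randint(0, num_classes, (batch_size,))  # exactly one true class per sample

loss = criterion(model(x), y)
loss.backward()

# Prediction: the single most probable class per sample.
pred = model(x).argmax(dim=1)
```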

Answered by user105987 on July 19, 2021
