Data Science Asked by Ricky Sanjaya on October 2, 2020
In multi-class classification (mutually exclusive classes) using neural networks (ANNs), it is generally advised to encode the target labels as one-hot vectors, use a softmax output layer with as many nodes as there are classes, and train with categorical cross-entropy loss. However, I was wondering: is it possible to use sigmoid activation functions with binary cross-entropy as the loss function, AND fewer nodes at the output layer instead? Here's an example:
Suppose the problem has 4 classes. We use only 2 nodes at the output layer, with sigmoid activations. We then encode the target labels as (0,0) for class 1, (0,1) for class 2, (1,0) for class 3, and (1,1) for class 4 (like binary codes). So if the input is of class 3, we expect the network to output (1,0).
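To make the proposal concrete, here is a minimal PyTorch sketch of the setup I have in mind (the input size of 10 features and the hidden layer width are placeholders, not part of the original problem):

```python
import torch
import torch.nn as nn

# Binary codes for the 4 classes (0-indexed here):
# class 0 -> (0,0), class 1 -> (0,1), class 2 -> (1,0), class 3 -> (1,1)
CODES = torch.tensor([[0., 0.],
                      [0., 1.],
                      [1., 0.],
                      [1., 1.]])

model = nn.Sequential(
    nn.Linear(10, 32),   # assume 10 input features (placeholder)
    nn.ReLU(),
    nn.Linear(32, 2),    # only 2 output nodes instead of 4
)
# BCEWithLogitsLoss applies the sigmoid internally, which is numerically
# more stable than a Sigmoid layer followed by nn.BCELoss.
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())

def train_step(x, y):
    """x: (batch, 10) features; y: (batch,) integer labels in {0,1,2,3}."""
    targets = CODES[y]            # map class indices to their 2-bit codes
    loss = loss_fn(model(x), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def predict(x):
    """Threshold each sigmoid output at 0.5 and read the bits as a class index."""
    with torch.no_grad():
        bits = (torch.sigmoid(model(x)) > 0.5).float()
    return (bits[:, 0] * 2 + bits[:, 1]).long()
```

Training is the same as any binary cross-entropy setup; the only differences from the standard recipe are the target encoding and the decoding step at prediction time.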
I have read similar posts, such as How does Sigmoid activation work in multi-class classification problems and https://glassboxmedicine.com/2019/05/26/classification-sigmoid-vs-softmax/, but those discuss why we shouldn't use sigmoid with the number of output nodes equal to the number of classes, whereas what I'm proposing is to use fewer. Note that the above encoding also works if we have, say, 3 classes (we then simply use only 3 of the codes, e.g. (0,0), (1,0) and (1,1)).
Would the above implementation work as well as, or even better than, softmax? And are there reasons why it should or shouldn't work better?