Categorical cross-entropy not working with one-hot encoded features

Data Science Asked by Timofey on May 31, 2021

I'm struggling with a categorical_crossentropy problem on one-hot encoded data. The problem is that the output of the code presented below never changes during training:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(1190,), sparse=True)
    lay_1 = layers.Dense(1190, activation='relu')
    x = lay_1(inputs)
    x = layers.Dense(10, activation='relu')(x)
    out = layers.Dense(1, activation='sigmoid')(x)
    self.model = keras.Model(inputs, out, name='SimpleD2Dense')
    self.model.compile(
        optimizer=keras.optimizers.Adam(),
        loss=tf.losses.categorical_crossentropy,
        metrics=['accuracy']
    )
    Epoch 1/3
    1572/1572 - 6s - loss: 5.7709e-08 - accuracy: 0.5095 - val_loss: 7.0844e-08 - val_accuracy: 0.5543
    Epoch 2/3
    1572/1572 - 6s - loss: 5.7709e-08 - accuracy: 0.5095 - val_loss: 7.0844e-08 - val_accuracy: 0.5543
    Epoch 3/3
    1572/1572 - 7s - loss: 5.7709e-08 - accuracy: 0.5095 - val_loss: 7.0844e-08 - val_accuracy: 0.5543

A few words about the data: there are 1190 features in total (10 actual categorical features with 119 categories each, one-hot encoded into 10 × 119 = 1190 columns). Each input is a dataframe row with 1190 values per sample. The output is a binary value, 0 or 1.
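For illustration, a minimal sketch of how such a layout could be produced with pandas (the column and category names are hypothetical):

    import pandas as pd

    # Hypothetical raw frame: 10 categorical feature columns (names assumed).
    raw = pd.DataFrame({f'feat_{i}': ['cat_a', 'cat_b'] for i in range(10)})

    # One-hot encode every column. With 119 categories per feature this
    # expands to 10 * 119 = 1190 binary columns, one row per sample.
    encoded = pd.get_dummies(raw)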

Attempts made so far: binary_crossentropy was used with satisfying results; however, the number of samples is not enough to get good results on the validation data. Different activations and layer sizes were also tried.

The main question is why categorical_crossentropy is not working here and how to use it the right way.

A second concern is about the data representation: is it the right approach to feed each sample as one raw row of straightforward one-hot encoded data?

One Answer

For it to work, make three changes (sketched below):

  1. Change the output neuron count to 2.
  2. Change the output activation to softmax.
  3. Keep all the vectors of the OHE output, i.e. one-hot encode the targets as well.
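A minimal sketch of these three changes applied to the model from the question (layer sizes kept as in the question; the last step assumes integer 0/1 targets in a variable named y):

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(1190,), sparse=True)
    x = layers.Dense(1190, activation='relu')(inputs)
    x = layers.Dense(10, activation='relu')(x)
    # 1. two output neurons, 2. softmax so they form a probability distribution
    out = layers.Dense(2, activation='softmax')(x)
    model = keras.Model(inputs, out, name='SimpleD2Dense')
    model.compile(
        optimizer=keras.optimizers.Adam(),
        loss=tf.losses.categorical_crossentropy,
        metrics=['accuracy'],
    )

    # 3. one-hot encode the 0/1 targets so y_true has shape [batch_size, 2]
    # y_onehot = tf.one_hot(y, depth=2)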


This is how Keras is designed internally. The same is written on the official documentation page:

BinaryCrossentropy class
Use this cross-entropy loss when there are only two label classes (assumed to be 0 and 1). For each example, there should be a single floating-point value per prediction. In the snippet below, each of the four examples has only a single floating-point value, and both y_pred and y_true have the shape [batch_size].
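A sketch of such a snippet, with made-up values:

    import tensorflow as tf

    # Four examples, a single float per example: both tensors have
    # shape [batch_size] = [4].
    y_true = tf.constant([0., 1., 0., 0.])
    y_pred = tf.constant([0.1, 0.8, 0.3, 0.2])
    loss = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)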

CategoricalCrossentropy class
The shape of both y_pred and y_true is [batch_size, num_classes].
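The corresponding sketch for the categorical case, again with made-up values:

    import tensorflow as tf

    # One row of num_classes values per example: both tensors have
    # shape [batch_size, num_classes], here [2, 2].
    y_true = tf.constant([[1., 0.], [0., 1.]])
    y_pred = tf.constant([[0.9, 0.1], [0.4, 0.6]])
    loss = tf.keras.losses.CategoricalCrossentropy()(y_true, y_pred)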

And to keep the classification multi-class, the num_classes outputs need to be made relative to each other, which is exactly what softmax does.
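A quick illustration of that normalization:

    import tensorflow as tf

    logits = tf.constant([2.0, 0.5])
    probs = tf.nn.softmax(logits)  # exp(x_i) / sum_j exp(x_j)
    # probs ≈ [0.82, 0.18]: the values now sum to 1, so each class
    # score is expressed relative to the other.

This likely also explains the near-zero loss in the question: categorical cross-entropy normalizes the predictions to sum to 1 along the class axis, and with a single output unit that normalization turns every prediction into 1, so the loss collapses to roughly zero and nothing is learned.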

Ref
Keras official page
Similar SE thread
Similar SE thread

Correct answer by 10xAI on May 31, 2021
