
How to Implement a Custom Loss Function with Keras for a Sparse Dataset

Data Science Asked by Philippe Fanaro on June 24, 2021

My dataset comes from an otherwise idle system that, at certain time instants, receives requests. I’m trying to predict these instants from a clock signal. Since the requests are sparsely distributed (I’ve forced them to last for a while so they don’t get too sparse), I wanted to create a new loss function that penalizes the model if it just predicts zero for everything. My attempt is simply a penalty applied on top of the standard log loss:

from keras import backend as K

def sparse_penalty_logits(y_true, y_pred):
    # Weigh the binary cross-entropy more heavily when the target is nonzero,
    # so that all-zero predictions get penalized.
    penalty = 10
    if y_true != 0:
        loss = -penalty * K.sum(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    else:
        loss = -K.sum(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))

    return loss

Is it correct? (I have also tried it with TensorFlow directly.) Every time I run it I either get a lot of NaNs as the loss or predictions that are not binary at all. I wonder if I’m also doing something wrong when setting up the model, because binary_crossentropy isn’t working properly either. My model is something like this (the targets are a single column of 0s and 1s):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(train.shape[1],)))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss=sparse_penalty_logits)

If I run it, as I said, I get very strange results (boy, do I feel like I’ve messed up real bad…):

[Plot of the model's predictions: the outputs are not binary at all.]

One Answer

From the problems you describe, this looks like a case of exploding gradients. The exploding gradients problem can be identified by:

  • The model is unable to get traction on your training data (e.g. poor loss).
  • The model is unstable, resulting in large changes in loss from update to update.
  • The model loss goes to NaN during training.

More about the exploding gradient problem can be found in this article.

I would suggest applying a gradient clipping technique in your code; this should stop the NaNs from appearing during model training.
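As a minimal sketch (not from the original answer, and assuming the Adam optimizer already used in the question), Keras optimizers accept clipnorm and clipvalue arguments that clip the gradients before each update:

from keras.optimizers import Adam

# Clip the global gradient norm to 1.0; alternatively, clipvalue=0.5
# would clip each gradient element individually.
optimizer = Adam(clipnorm=1.0)

model.compile(optimizer=optimizer, loss=sparse_penalty_logits)

Whether to clip by norm or by value, and which threshold to use, is a judgment call; 1.0 is just a common starting point.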

Answered by thanatoz on June 24, 2021
