Data Science Asked by Philippe Fanaro on June 24, 2021
My dataset is composed of an idle system that, at some time instants, receives requests. I'm trying to predict these instants through a clock. Since the requests are sparsely distributed (I've forced them to last for a while so they don't get too sparse), I wanted to create a new loss function that penalizes the model if it just predicts zero for everything. My implementation attempt is simply a penalty applied on top of the standard log-loss:
from keras import backend as K

def sparse_penalty_logits(y_true, y_pred):
    # Weigh the cross-entropy more heavily whenever the target is nonzero
    penalty = 10
    if y_true != 0:
        loss = -penalty * K.sum(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    else:
        loss = -K.sum(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    return loss
Is it correct? (I have also tried it with tensorflow.) Every time I run it I either get a lot of NaN's as the loss or predictions that are not binary at all. I wonder if I'm also doing something wrong when setting up the model, because binary_crossentropy is not working properly either. My model is something like this (the targets are a single column of 0's and 1's):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(train.shape[1],)))
model.add(Dense(100, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss=sparse_penalty_logits)
If I run it, as I said, I get very strange results (boy, do I feel like I've messed up real bad…).
From the problems you mention, this looks like a case of exploding gradients. A typical sign of exploding gradients is the loss becoming NaN (or growing very large) during training, which matches what you are seeing.
More about the exploding gradient problem can be found in this article.
I would suggest using some gradient clipping technique in your code; this should remove the NaN values during model training.
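As a minimal sketch of what that could look like in your setup: the clipnorm and clipvalue arguments are standard Keras optimizer options, but the thresholds below (1.0 and 0.5) are only illustrative values, not tuned settings.

from keras.optimizers import Adam

# Rescale any gradient whose L2 norm exceeds 1.0, so a single oversized
# gradient cannot blow up the weights (and hence the loss) in one step.
clipped_adam = Adam(clipnorm=1.0)

# Alternatively, clip each gradient component element-wise to [-0.5, 0.5]:
# clipped_adam = Adam(clipvalue=0.5)

model.compile(optimizer=clipped_adam, loss=sparse_penalty_logits)

Passing a configured optimizer object instead of the 'adam' string is all that changes; the rest of the model definition stays the same.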
Answered by thanatoz on June 24, 2021