TransWikia.com

Interpretation of Loss and validation Loss in Keras

Data Science Asked on April 30, 2021

I am building a model to predict one label by taking one feature as an input. The two variables seems to be strongly correlated. I wanted to build a (sequential) Neural Network model with Keras in python. However I don’t have much experience in this topic.
For 20 epochs, this was the output:

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
history= model.fit(X_train, Y_train, epochs=20, batch_size=32, validation_split=0.15, validation_data=None, verbose=1  )

Epoch 1/20
556/563 [============================>.] - ETA: 0s - loss: 34669264.0000 - mse: 34669264.0000WARNING:tensorflow:Callbacks method `on_test_batch_begin` is slow compared to the batch time (batch time: 0.0000s vs `on_test_batch_begin` time: 0.0010s). Check your callbacks.
563/563 [==============================] - 1s 1ms/step - loss: 34285784.0000 - mse: 34285784.0000 - val_loss: 96.6166 - val_mse: 96.6166
Epoch 2/20
563/563 [==============================] - 1s 1ms/step - loss: 99.0922 - mse: 99.0922 - val_loss: 97.5675 - val_mse: 97.5675
Epoch 3/20
563/563 [==============================] - 1s 1ms/step - loss: 99.9443 - mse: 99.9443 - val_loss: 99.2140 - val_mse: 99.2140
Epoch 4/20
563/563 [==============================] - 1s 1ms/step - loss: 102.9865 - mse: 102.9865 - val_loss: 118.3417 - val_mse: 118.3417
Epoch 5/20
563/563 [==============================] - 1s 1ms/step - loss: 106.5720 - mse: 106.5720 - val_loss: 97.8411 - val_mse: 97.8411
Epoch 6/20
563/563 [==============================] - 1s 947us/step - loss: 105.5193 - mse: 105.5193 - val_loss: 102.9201 - val_mse: 102.9201
Epoch 7/20
563/563 [==============================] - 1s 956us/step - loss: 111.6952 - mse: 111.6952 - val_loss: 152.1037 - val_mse: 152.1037
Epoch 8/20
563/563 [==============================] - 1s 956us/step - loss: 108.9572 - mse: 108.9572 - val_loss: 97.3432 - val_mse: 97.3432
Epoch 9/20
563/563 [==============================] - 1s 1ms/step - loss: 116.4152 - mse: 116.4152 - val_loss: 281.0902 - val_mse: 281.0902
Epoch 10/20
563/563 [==============================] - 1s 1ms/step - loss: 152.9690 - mse: 152.9690 - val_loss: 489.1042 - val_mse: 489.1042
Epoch 11/20
563/563 [==============================] - 1s 1ms/step - loss: 190.2841 - mse: 190.2841 - val_loss: 117.8673 - val_mse: 117.8673
Epoch 12/20
563/563 [==============================] - 1s 1ms/step - loss: 337.4025 - mse: 337.4025 - val_loss: 1454.0408 - val_mse: 1454.0408
Epoch 13/20
563/563 [==============================] - 1s 1ms/step - loss: 5692.8813 - mse: 5692.8813 - val_loss: 4738.1577 - val_mse: 4738.1577
Epoch 14/20
563/563 [==============================] - 1s 1ms/step - loss: 8999.7559 - mse: 8999.7559 - val_loss: 1928.1060 - val_mse: 1928.1060
Epoch 15/20
563/563 [==============================] - 1s 1ms/step - loss: 8781.1357 - mse: 8781.1357 - val_loss: 100.8937 - val_mse: 100.8937
Epoch 16/20
563/563 [==============================] - 1s 1ms/step - loss: 9043.8174 - mse: 9043.8174 - val_loss: 734.2968 - val_mse: 734.2968
Epoch 17/20
563/563 [==============================] - 1s 1ms/step - loss: 8870.1094 - mse: 8870.1094 - val_loss: 604.0785 - val_mse: 604.0785
Epoch 18/20
563/563 [==============================] - 1s 1ms/step - loss: 7896.2520 - mse: 7896.2520 - val_loss: 9735.1504 - val_mse: 9735.1504
Epoch 19/20
563/563 [==============================] - 1s 1ms/step - loss: 37979.0586 - mse: 37979.0586 - val_loss: 315.8015 - val_mse: 315.8015
Epoch 20/20
563/563 [==============================] - 1s 1ms/step - loss: 282.4867 - mse: 282.4867 - val_loss: 350.4554 - val_mse: 350.4554


And these are the plots for the loss (in Blue) and validation loss (in Red):

enter image description here
enter image description here

The loss function (mse) is minimized after two epochs which I guess means that the model ‘learned’ at the point. I don’t understand however why the validation loss has huge fluctuations. I thought that it would have a similar distribution to the loss function.

Can anyone please help me interpret these two plots?

UPDATE: AFTER SCALING THE INPUT VARIABLES & CHOOSING A SMALER LEARNING RATE
enter image description here

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP