Data Science Asked by Islam Kh on July 8, 2021
I’m training a CNN to detect relations between entities in written texts.
I am suffering from overfitting: I get high accuracy and low loss during training, but my model can’t predict anything correctly at the test phase.
I know that my model is complex (the network contains an embedding lookup, a convolution layer, operations on the convolution output that implement piecewise max pooling, and a dense layer). In addition, I have little training data (13,396 sentences) for 98 classes. That may cause overfitting, but I train for only three epochs of about 150 steps each, and in each step 50 randomly picked sentences are passed to the network. So in total my model is trained on about 22,500 sentences.
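The training budget described above can be checked with a few lines of arithmetic. This is only a sketch using the numbers stated in the question; the variable names are illustrative and do not come from the training code.

```python
import math

num_sentences = 13396   # training sentences stated in the question
batch_size = 50         # sentences drawn at random per step
steps_per_epoch = 150   # approximate steps per "epoch" as described
num_epochs = 3

total_steps = num_epochs * steps_per_epoch
sentences_seen = total_steps * batch_size

# Steps that one full pass over the data would need at this batch size.
steps_for_full_pass = math.ceil(num_sentences / batch_size)
passes_over_data = sentences_seen / num_sentences

print(total_steps, sentences_seen, steps_for_full_pass, round(passes_over_data, 2))
# prints: 450 22500 268 1.68
```

With these numbers, 150 steps cover fewer sentences than the training set contains, and the whole run amounts to roughly 1.7 passes over the data.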
The thing is, is it possible to reach accuracy = 1 and loss = 0 after just 50 steps (2,500 sentences)?
Is it possible for my network to overfit that quickly, or could there be other reasons for what I am seeing?
Also, I know that even if my model is flexible I may get high training accuracy, but I still should not reach accuracy = 1: there is always the irreducible error, which should keep my model from reaching it.
So is it even possible to get accuracy = 1, even if my network overfits? And could the overfitting also be caused by a lot of noise in my data?
I should mention that I am training my model on Google Colab and have trained it several times. Each time I discover something wrong in the code after training finishes, I delete the files generated during training, restart the runtime, and train again.
Also, the network applies L2 regularization with lambda equal to 0.0001, as follows:
self.l2_loss = tf.contrib.layers.apply_regularization(
    regularizer=tf.contrib.layers.l2_regularizer(0.0001),
    weights_list=tf.trainable_variables()
)
It also applies dropout to the piecewise max-pooling output (context_output) with keep_prob = 0.5, as follows:
context_output = tf.nn.dropout(context_output, keep_prob)
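To make the two regularizers concrete, here is a minimal pure-Python sketch of what they compute, assuming the usual TensorFlow 1.x behaviour: `l2_regularizer(scale)` yields `scale * sum(w**2) / 2` per weight tensor, and `tf.nn.dropout` keeps each element with probability `keep_prob` and scales the survivors by `1 / keep_prob`. The helper names and toy values are hypothetical, not part of the network.

```python
import random

def l2_penalty(weights_list, scale=0.0001):
    """Sum of scale * sum(w**2) / 2 over all weight tensors (flat lists here)."""
    return sum(scale * sum(w * w for w in weights) / 2.0 for weights in weights_list)

def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob,
    scale kept values by 1 / keep_prob, zero out the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

# Two hypothetical flattened weight tensors, for illustration only.
toy_weights = [[1.0, -2.0, 3.0], [0.5, 0.5]]
penalty = l2_penalty(toy_weights)  # approximately 0.000725

rng = random.Random(0)
context_output = [1.0] * 10000     # stand-in for the pooled features
dropped = dropout(context_output, keep_prob=0.5, rng=rng)
mean_after = sum(dropped) / len(dropped)  # stays close to the original mean of 1.0

# With keep_prob = 1.0 nothing is dropped, which is the setting normally fed at test time.
unchanged = dropout(context_output, keep_prob=1.0, rng=rng)
```

With lambda = 0.0001 the penalty is small next to a typical softmax loss, so it is unlikely on its own to explain a loss of exactly 0. On the dropout side, one thing worth confirming is that keep_prob is fed as 1.0 during evaluation.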
In addition, I checked the network (a separate version of it) on another dataset, written in another language, and it worked fine. That original dataset has 556,596 training sentences. When I switched to my data, which goes through the same pre-processing, I ran into this problem. I can’t tell whether it happens because my data has too few sentences to train on. I could accept overfitting as the explanation if the training results were very high and the test results very low. But a training accuracy of exactly 1, while not even one sentence is predicted correctly at test time, makes me wonder whether the lack of data is really the problem.
Below are the results I get (it continues giving perfect results like this until step 450, which is my last step):
step50, softmax_loss 0, acc 1
step100, softmax_loss 0, acc 1
step150, softmax_loss 0, acc 1
step200, softmax_loss 0, acc 1
step250, softmax_loss 0, acc 1
Below is an example of the results I get from testing, where the number of sentences = 4473 ("100" followed by "0.0" is how many sentences the model guessed correctly out of a total of 100 sentences, and so on. Here the number of sentences given to the network is 510):
for all test data:
100
0.0
200
0.0
300
0.0
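For reference, output in this shape can be produced by a running-accuracy loop like the sketch below. It assumes the second number is the fraction of correct predictions so far; the function name and the toy predictions and labels are hypothetical, not taken from the actual evaluation code.

```python
def running_accuracy(predictions, labels, report_every=100):
    """Return (sentences_seen, fraction_correct_so_far) every report_every sentences."""
    reports = []
    correct = 0
    for i, (pred, gold) in enumerate(zip(predictions, labels), start=1):
        correct += int(pred == gold)
        if i % report_every == 0:
            reports.append((i, correct / i))
    return reports

# Hypothetical case: the model always predicts class 0, and no gold label is class 0.
preds = [0] * 300
golds = [1 + (i % 97) for i in range(300)]

reports = running_accuracy(preds, golds)
for seen, acc in reports:
    print(seen)
    print(acc)
```

In this toy case the loop prints 100, 0.0, 200, 0.0, 300, 0.0, the same pattern as above. A model that collapses onto a single class that never appears in the test labels is one way to get exactly zero correct predictions.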
Could someone explain to me what is happening in my case?