Training a CNN on a large dataset

Question

I am currently trying to build a CNN for around 100,000 images in 42 classes, using the default batch size of 32. This is what my model looks like:
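# Imports assumed; the original post did not show them
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

model = Sequential()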
model.add(Conv2D(filters = 32, kernel_size = (3, 3), activation = 'relu', input_shape = training_data.image_shape))
model.add(MaxPool2D(pool_size = (2, 2)))
model.add(Dropout(rate = 0.3))

model.add(Conv2D(filters = 64, kernel_size = (3, 3), activation = 'relu'))
model.add(MaxPool2D(pool_size = (2, 2)))
model.add(Dropout(rate = 0.2))

model.add(Conv2D(filters = 126, kernel_size = (3, 3), activation = 'relu'))
model.add(MaxPool2D(pool_size = (2, 2)))
model.add(Dropout(rate = 0.15))

model.add(Flatten())

model.add(Dense(units = 32, activation = 'relu'))
model.add(Dropout(rate = 0.15))

model.add(Dense(units = 64, activation = 'relu'))
model.add(Dropout(rate = 0.1))

model.add(Dense(units = 42, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

However, training is very slow: each epoch takes around 35 minutes to run. The accuracy is also very low and increases very slowly.
My JupyterLab session sometimes stops, and I have to refresh and start everything again. Is there a way to train in smaller batches, or some other way to improve the training speed? The dataset is very large. Any help is appreciated.
Epoch 1/15
2307/2307 [==============================] - 3999s 2s/step - loss: 3.5377 - accuracy: 0.0687 - val_loss: 3.3247 - val_accuracy: 0.1223
Epoch 2/15
2307/2307 [==============================] - 3764s 2s/step - loss: 3.2884 - accuracy: 0.1239 - val_loss: 3.1065 - val_accuracy: 0.1739
Epoch 3/15
2307/2307 [==============================] - 2204s 955ms/step - loss: 3.1435 - accuracy: 0.1562 - val_loss: 2.9825 - val_accuracy: 0.2069
Epoch 4/15
2307/2307 [==============================] - 2193s 951ms/step - loss: 3.0526 - accuracy: 0.1778 - val_loss: 2.9059 - val_accuracy: 0.2171

user1288043 · Answer

When you call Keras fit, pass a generator as x that yields each batch as an (x, y) tuple (for example with yield), so the whole dataset never has to be held in memory at once. You can also save checkpoints during training, so that a session that stops does not lose all progress.
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
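
A minimal sketch of the generator approach, assuming the images are files on disk and the labels are integer class ids in a list. The names batch_generator, train_with_checkpoints and load_image, as well as the checkpoint file name, are hypothetical and not from the question; load_image stands for whatever function reads one file into an array with the model's input shape.

```python
import math

import numpy as np


def batch_generator(paths, labels, load_image, batch_size=32,
                    num_classes=42, shuffle=True, seed=0):
    """Yield (x, y) batches forever, loading images only when they are needed."""
    rng = np.random.default_rng(seed)
    indices = np.arange(len(paths))
    while True:  # fit() expects a Python generator to keep producing batches
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            # Only this batch of images is held in memory at a time.
            x = np.stack([load_image(paths[i]) for i in batch_idx]).astype("float32")
            # One-hot labels, to match the categorical_crossentropy loss.
            y = np.zeros((len(batch_idx), num_classes), dtype="float32")
            y[np.arange(len(batch_idx)), [labels[i] for i in batch_idx]] = 1.0
            yield x, y


def train_with_checkpoints(model, paths, labels, load_image,
                           epochs=15, batch_size=32):
    """Train from the generator and save the weights after every epoch."""
    # Imported here so the generator above can be used without TensorFlow.
    from tensorflow.keras.callbacks import ModelCheckpoint

    # The generator never ends, so fit() must be told how long one epoch is.
    steps = math.ceil(len(paths) / batch_size)
    checkpoint = ModelCheckpoint(
        "cnn_epoch_{epoch:02d}.weights.h5",  # hypothetical file name
        save_weights_only=True,
    )
    return model.fit(
        batch_generator(paths, labels, load_image, batch_size=batch_size),
        steps_per_epoch=steps,
        epochs=epochs,
        callbacks=[checkpoint],
    )
```

If the notebook stops partway through, the most recent weights file can be loaded with model.load_weights before calling fit again, instead of starting from epoch 1.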

10xAI · Answer

Changing the batch size will not change the overall training time much, since with any batch size each epoch still passes almost 80K images through the network.
One approach (and probably the best) is to use transfer learning.
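
As a rough sketch of what transfer learning could look like here: a pretrained base is frozen and only a small 42-class head is trained. MobileNetV2, the 224x224 input size, the dropout rate and the function name build_transfer_model are assumptions chosen for illustration, not something taken from the question. The sketch also assumes the images are RGB with pixel values in the 0 to 255 range.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_transfer_model(input_shape=(224, 224, 3), num_classes=42,
                         weights="imagenet"):
    """Frozen pretrained base plus a small trainable classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False  # keep the pretrained features fixed

    inputs = layers.Input(shape=input_shape)
    # MobileNetV2 expects inputs in [-1, 1]; assumes raw pixels in [0, 255].
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)  # run batch norm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Only the final Dense layer is trained, so each epoch involves far fewer weight updates than training a whole network from scratch, although the forward pass through the base still costs time on a CPU.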
If you have a compelling reason to train from scratch, you will need more powerful hardware with a GPU. Google Colab is one option, and there are many others.
Before that, you may want to gauge the model on a sample of ~5k images.
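
One way to draw such a sample, sketched under the assumption that the labels are available as a list with one class id per image. The function name stratified_sample_indices is hypothetical. Sampling per class keeps all 42 classes represented in roughly their original proportions.

```python
import random
from collections import defaultdict


def stratified_sample_indices(labels, sample_size=5000, seed=0):
    """Pick roughly sample_size indices, keeping each class's share of the data."""
    if not labels:
        return []
    rng = random.Random(seed)

    # Group image indices by class.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    fraction = min(1.0, sample_size / len(labels))
    chosen = []
    for indices in by_class.values():
        # Keep at least one example per class so no class drops out.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))

    rng.shuffle(chosen)
    return chosen
```

The returned indices can be used to select the matching file paths and labels. A model that cannot learn this small subset in a few minutes is unlikely to do better on the full dataset, so the architecture can be adjusted cheaply before committing to a long run.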
