CNN implementation: low accuracy on MNIST data

Question

I'm trying to implement VGG11 (model A of Table 1 in this article, arXiv:1409.1556) on the MNIST dataset, but I'm getting ~10% train and test accuracy (no better than random guessing). I had to resize the MNIST images from 28x28 to 32x32 to fit the CNN architecture. This is what I did:
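Why 28x28 does not fit: every Conv2D layer in the model below uses padding='same' with stride 1, so only the five 2x2, stride-2 MaxPooling2D layers change the spatial size, and with Keras' default 'valid' pooling each one maps n to (n - 2) // 2 + 1. A small pure-Python check of that arithmetic (pooled_sizes is a hypothetical helper, not part of Keras):

```python
def pooled_sizes(n, num_pools=5, pool=2, stride=2):
    """Spatial size after each 'valid' max-pooling layer.

    Conv layers with padding='same' and stride 1 keep the size unchanged,
    so only the pooling layers are modelled here.
    """
    sizes = []
    for _ in range(num_pools):
        if n < pool:
            # Nothing left to pool: the layer would have no valid output.
            raise ValueError(f"input of size {n} is smaller than the {pool}x{pool} pool")
        n = (n - pool) // stride + 1
        sizes.append(n)
    return sizes


print(pooled_sizes(32))  # [16, 8, 4, 2, 1]
try:
    pooled_sizes(28)     # goes 14, 7, 3, 1, then the fifth pool fails
except ValueError as err:
    print(err)           # input of size 1 is smaller than the 2x2 pool
```

With a 32x32 input, Flatten therefore sees a 1x1x512 feature map.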

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from keras import optimizers, utils
from PIL import Image, ImageFilter
import numpy as np
import tensorflow as tf

# Preprocessing

x_size = 6000 # Changed to reduce training time 
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train_ = np.ndarray((x_size, 32, 32))
x_test_ = np.ndarray((x_test.shape[0], 32, 32))

# Resizing inputs to 32x32
for i in [0, x_size-1]:
    im = Image.fromarray(x_train[i], mode=None)
    im = im.resize((32, 32))
    x_train_[i] = np.array(im)
for i in [0,x_test.shape[0]-1]:
    im = Image.fromarray(x_test[i], mode=None)
    im = im.resize((32, 32))
    x_test_[i] = np.array(im)

x_train_ = x_train_.reshape(x_train_.shape[0], 32, 32, 1)
x_test_ = x_test_.reshape(x_test_.shape[0], 32, 32, 1)

y_train = utils.to_categorical(y_train,10)
y_test = utils.to_categorical(y_test,10)
y_train_ = y_train[:x_size]

# Model A (VGG11) of Table 1: ConvNet configurations from paper arXiv:1409.1556v6

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same', input_shape=(32, 32, 1), data_format='channels_last'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(128, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(Conv2D(512, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
model.add(Dense(1000, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Model compilation

model.compile(loss='categorical_crossentropy', optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True, clipnorm=1.), metrics=['accuracy'])

# Model fitting

model.fit(x_train_, y_train_, epochs=1, batch_size=32)

# Model evaluation

score = model.evaluate(x_train_, y_train_)
print('Train loss after 1 epoch:', score[0])
print('Train accuracy after 1 epoch:', score[1])

I've tried normalizing the input, changing the training set size, increasing the number of epochs, changing the FC layer and filter sizes, and changing optimizers (and learning rates). Train accuracy is just as low in both the evaluation report and the History object returned by model.fit. I'm expecting >95% accuracy. What am I doing wrong?
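One thing worth checking in the preprocessing above: for i in [0, x_size-1] loops over a two-element list, so only images 0 and x_size-1 are resized (the test loop has the same problem). np.ndarray(...) does not initialize its memory, so every other slot of x_train_ and x_test_ holds arbitrary values. Looping over range(x_size) and range(x_test.shape[0]) would visit every image, and allocating with np.zeros would make any unfilled slot obvious. As a numpy-only alternative to the PIL resize (a swapped-in technique, not what the question does), the sketch below zero-pads each 28x28 image by 2 pixels per side to reach 32x32 and scales pixels to [0, 1]; pad_mnist_to_32 is a hypothetical helper name.

```python
import numpy as np


def pad_mnist_to_32(images):
    """Zero-pad a batch of 28x28 images to 32x32 and scale pixels to [0, 1].

    images: uint8 array of shape (N, 28, 28), e.g. x_train from mnist.load_data().
    Returns a float32 array of shape (N, 32, 32, 1) in channels_last layout.
    """
    images = np.asarray(images)
    # 2 pixels of zeros on each side of the height and width axes; batch axis untouched.
    padded = np.pad(images, ((0, 0), (2, 2), (2, 2)), mode="constant")
    # Scale from [0, 255] to [0, 1] and append the single channel axis.
    return (padded.astype("float32") / 255.0)[..., np.newaxis]


# Synthetic stand-in for MNIST so the sketch runs without downloading data.
fake_batch = np.full((3, 28, 28), 255, dtype=np.uint8)
out = pad_mnist_to_32(fake_batch)
print(out.shape, out.dtype)  # (3, 32, 32, 1) float32
```

With the real data, x_train_ = pad_mnist_to_32(x_train[:x_size]) and x_test_ = pad_mnist_to_32(x_test) would replace both loops and both reshape calls, keeping the first x_size images aligned with y_train[:x_size].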

Rajith Thennakoon · Answer

Try adding dropout to the network to avoid overfitting. Read the docs for more information:
https://keras.io/layers/core/
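For reference, a Keras Dropout(rate) layer zeroes a random fraction rate of its inputs during training, scales the remaining inputs by 1 / (1 - rate), and does nothing at inference. The VGG paper itself applies dropout with ratio 0.5 to its first two fully-connected layers, which in the question's model would mean a Dropout(0.5) after each Dense(4096, ...). A numpy sketch of the training-time behaviour (inverted_dropout is a hypothetical helper, not the Keras implementation):

```python
import numpy as np


def inverted_dropout(x, rate, rng):
    """Training-time dropout: zero each element with probability `rate`
    and scale the surviving elements by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(np.unique(dropped))         # only 0.0 and 2.0 appear
print(round(dropped.mean(), 2))   # close to 1.0, the mean before dropout
```

The 1 / (1 - rate) scaling keeps the expected activation unchanged, which is why no rescaling is needed at inference time.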

Also try these things:

Since MNIST's raw targets are integers, it's better to skip utils.to_categorical and use sparse_categorical_crossentropy instead of categorical_crossentropy, with Adam as the optimizer:

model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizers.Adam(learning_rate=0.001), metrics=['accuracy'])
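The two losses compute the same number and differ only in the label format they accept: categorical_crossentropy expects one-hot rows, sparse_categorical_crossentropy expects integer class indices. With the one-hot y_train produced in the question, categorical_crossentropy is the matching choice; the sparse loss needs the integer labels. A numpy sketch of both (the function names mirror the Keras loss names, but these are standalone illustrations, not the Keras code):

```python
import numpy as np


def categorical_crossentropy(y_onehot, probs):
    # Mean over samples of -sum_k y_k * log(p_k), with one-hot rows y.
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))


def sparse_categorical_crossentropy(y_int, probs):
    # Same loss, but picks p[true class] directly using integer labels.
    return float(np.mean(-np.log(probs[np.arange(len(y_int)), y_int])))


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax rows
y_int = np.array([3, 0, 7, 9])
y_onehot = np.eye(10)[y_int]  # the same one-hot matrix utils.to_categorical(y_int, 10) gives
print(categorical_crossentropy(y_onehot, probs))
print(sparse_categorical_crossentropy(y_int, probs))  # same value
```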

And try using a sigmoid activation function for the output layer:

model.add(Dense(10, activation='sigmoid'))
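When trying this, note how the two output activations differ: softmax turns the 10 logits into one probability distribution over mutually exclusive classes, which is what categorical_crossentropy and sparse_categorical_crossentropy assume, while sigmoid squashes each logit independently, so the 10 outputs need not sum to 1. A minimal numpy comparison (standalone definitions, not the Keras ones):

```python
import numpy as np


def softmax(z):
    # Subtract the row max for numerical stability; each row sums to 1.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(z):
    # Element-wise; each output is independent of the others.
    return 1.0 / (1.0 + np.exp(-z))


logits = np.zeros((1, 10))     # a network with no preference for any digit
print(softmax(logits).sum())   # 1.0 up to rounding (ten outputs of 0.1)
print(sigmoid(logits).sum())   # 5.0 (ten outputs of 0.5)
```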

serali · Answer

What did you increase the epochs to? You are training VGG11 from scratch, and it has over 30 million parameters, so training is expected to take a long time. Are you trying to use transfer learning, i.e. taking the pre-trained weights and freezing all layers except the last one for your classification problem? In that case you are right to expect over 95% accuracy after a few epochs.
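The "over 30 million" figure can be checked by hand: a 3x3 conv layer has 3*3*c_in*c_out weights plus c_out biases, and a dense layer has n_in*n_out weights plus n_out biases. A pure-Python sketch that mirrors the question's layer list (conv_params, dense_params and vgg11_params are hypothetical helper names):

```python
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out  # weights + biases


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


def vgg11_params(input_size=32, channels=1, num_classes=10):
    # Conv widths of configuration A as used in the question; "M" is a 2x2, stride-2 max-pool.
    cfg = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]
    total, c_in, size = 0, channels, input_size
    for layer in cfg:
        if layer == "M":
            size //= 2  # pooling halves height and width, adds no parameters
        else:
            total += conv_params(3, c_in, layer)
            c_in = layer
    flat = size * size * c_in  # Flatten output size
    # Dense stack from the question: 4096, 4096, 1000, then the 10-way softmax.
    for n_in, n_out in [(flat, 4096), (4096, 4096), (4096, 1000), (1000, num_classes)]:
        total += dense_params(n_in, n_out)
    return total


print(f"{vgg11_params():,}")  # 32,208,898
```

For a 32x32x1 input this gives about 9.2 million parameters in the conv layers and 23.0 million in the dense layers, which should agree with the total printed by model.summary().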

I don't know where to find a pre-trained VGG11 for TensorFlow, but here is the one for PyTorch.
