Loss of NAN, Accuracy of 0 - Any idea why? Full code provided

Data Science Asked by JupyterBoi on November 28, 2020

I've been stuck on this for the past few days and can't figure it out. I posted in various groups, on StackOverflow, etc., and got suggestions from many users. I implemented those suggestions in the code shown below, but I'm still having the same issue. Sorry for the lengthy post, but I want to be as clear as possible. All relevant code snippets are shown below:

Setting up image paths:

import os

imagepaths = []

# Walk the working directory and collect every .jpg path
for root, dirs, files in os.walk(".", topdown=False):
  for name in files:
    path = os.path.join(root, name)
    if path.endswith("jpg"): # We want only the images
      imagepaths.append(path)

Loading into arrays, preprocessing:

import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

X = [] # Image data
y = [] # Labels

datagen = ImageDataGenerator(rescale=1./255, samplewise_center=True)

# Loops through imagepaths to load images and labels into arrays
for path in imagepaths:
  img = cv2.imread(path) # Reads image and returns np.array
  img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Converts into the correct colorspace (GRAY)
  img = cv2.resize(img, (75, 75)) # Reduce image size so training can be faster
  img = image.img_to_array(img)
  img = datagen.standardize(img)
  X.append(img)

  # Processing the label from the image path: the directory name encodes the class
  category = path.split(os.path.sep)[1]
  split = category.split("_")
  if int(split[0]) == 0:
    label = int(split[1])
  else:
    label = int(split[0])
  y.append(label)

# Turn X and y into np.array to speed up train_test_split

X = np.array(X, dtype="float32") #ORIGINAL uint8
X = X.reshape(len(imagepaths), 75, 75, 1) 
y = np.array(y)
tf.keras.utils.to_categorical(X, num_classes=None, dtype="float32")
tf.keras.utils.to_categorical(y, num_classes=None, dtype="float32")

Creating test set:

from sklearn.model_selection import train_test_split

ts = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=ts, random_state=42)

Creating the model. Yes, I know it's super small, just 1 layer, but I was advised to cut it down, start from the base, and build back up. Originally it was 5 layers, but the results are still the same.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                                     Dropout, Flatten, Dense)

model = Sequential()

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(75, 75, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(26, activation='softmax'))

Compiling and fitting the model. I was told the gradient could be exploding, so it was suggested I add the first line with clipnorm:

adam = tf.keras.optimizers.Adam(clipnorm=1.)

# Pass the Adam instance (not the string 'adam') so the clipnorm setting is actually applied
model.compile(optimizer=adam, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=1, validation_data=(X_test, y_test)) 

The final training log is shown below. The issue: loss is nan and accuracy is 0, for both training and validation.

Train on 54600 samples, validate on 23400 samples
Epoch 1/5
54600/54600 [==============================] - 14s 265us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/5
54600/54600 [==============================] - 15s 269us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/5
54600/54600 [==============================] - 15s 273us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/5
54600/54600 [==============================] - 15s 267us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/5
54600/54600 [==============================] - 14s 263us/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00

Here is a list of things I did wrong and was advised to fix by others:

I didn't standardize the data originally, so I used ImageDataGenerator to rescale and standardize it.

I was advised to turn the data categorical, which I did using the to_categorical function (I think I did that right), but I'm not sure whether anything else is required.

Reduce model complexity: I did that, bringing it down to only one layer to debug.

Possible exploding gradient: I changed the Adam optimizer to use clipnorm = 1.

BACKGROUND: This model trains to recognize the 26 letters of the alphabet. I know the dataset is fine, because when I use it to train a model on 10 letters at a time (A-J, for example) it works fine. The issue only appears when I go from 10 to 26 classes. Yes, I did try changing the final Dense layer to 26 in the original code, but that did not work.
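One sanity check worth running here (an illustrative addition, not something from the thread): sparse_categorical_crossentropy with a 26-unit softmax expects integer labels in the range 0-25, and any NaN values in X will propagate straight into the loss. A minimal check:

import numpy as np

# Hypothetical sanity checks, added for illustration:
print("any NaN in X:", np.isnan(X).any())  # NaN inputs produce NaN loss
print("label values:", np.unique(y))       # must be 0..25 for Dense(26) + sparse CE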

I’ve been staring at this and trying everything for the past two days…

ANY HELP IS APPRECIATED

2 Answers

I think there might be a number of errors here:

  1. There is no need to convert X to categorical, only your labels. Hence, this line is unnecessary: tf.keras.utils.to_categorical(X, num_classes=None, dtype="float32")
  2. to_categorical is not in-place, so you need to reassign the result to y. Change it to: y = tf.keras.utils.to_categorical(y)
  3. Once you convert the labels to categorical, your loss function should be 'categorical_crossentropy'. The three fixes are sketched together below.
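
Putting those three points together, a minimal sketch of the corrected lines (everything else in the question's code stays as-is):

# Drop the to_categorical call on X entirely; the images stay float arrays.
y = tf.keras.utils.to_categorical(y)  # reassign: to_categorical returns a new array

# One-hot labels pair with categorical_crossentropy
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])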

Answered by Vincent Yong on November 28, 2020

First of all, I would suggest using datagen.flow_from_directory to load the dataset. Also, your model has become too simple now; try adding at least 1 or 2 more Conv layers.

For your problem, it might be a case of exploding gradients. Try reducing the learning rate and adding batch normalization between layers. Maybe try SGD with momentum or the RMSprop optimizer; that sometimes helps. A rough sketch of these suggestions follows.
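
A rough sketch of those suggestions, assuming the images are sorted into one subdirectory per letter (the "data" directory layout and the 1e-4 learning rate are assumptions, not from the original post):

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD

# Assumed layout: data/<letter>/*.jpg -- one folder per class
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.3)
train_gen = datagen.flow_from_directory(
    "data",                      # hypothetical root directory
    target_size=(75, 75),
    color_mode="grayscale",
    class_mode="categorical",
    subset="training",
    batch_size=32)

# Lower learning rate plus momentum, per the suggestion above
sgd = SGD(learning_rate=1e-4, momentum=0.9)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, epochs=5)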

Answered by Pranshu Mishra on November 28, 2020
