
My weight matrix converged to zeros

Data Science Asked on April 20, 2021

So I was training a fairly shallow convnet, because my deeper network based on VGG19 wasn't working: two conv layers and two dense layers, with the second dense layer as the output.

It converged quickly to all zeros in the second conv layer. The first conv layer wasn't all zeros, and the first dense layer appears to have learned the distribution of classes.
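This is roughly how I checked the learned weights (a minimal sketch; d is the trained model, as in the inspection code further down):

import numpy as np

# kernel and bias of the second Conv2D layer (layer index 2 in the model below)
w, b = d.layers[2].get_weights()
print(np.abs(w).max())   # effectively zero after training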

So it appears that my network's strategy is to ignore the inputs and just predict the class distribution. I tried class weights; the result was the same, except that a more uniform distribution gets predicted.

I changed the learning rate, changed the optimizer, and even tried gradient clipping. I augmented the data and introduced regularization, both on the data and on the network via dropout layers. No luck; same result. (The sketch below shows the kinds of changes I tried.)
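Roughly, the training-side changes looked like this (a minimal sketch; the learning rate, clip value, loss, and the data variables x_train / y_train / class_weights are illustrative placeholders, and discriminator_model is defined in the edit below):

from keras.optimizers import Adam

model = discriminator_model()

# lower learning rate plus gradient clipping via the optimizer
opt = Adam(lr=1e-4, clipvalue=1.0)
model.compile(loss='binary_crossentropy', optimizer=opt)

# class weights passed at fit time
model.fit(x_train, y_train, class_weight=class_weights)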

Why would my network exhibit this behavior and what can be done about it?

EDIT:

Here is a code snippet that defines the network that I am using:

from keras.models import Sequential
from keras.layers import Activation, Conv2D, MaxPooling2D, Flatten, Dense

def discriminator_model():
    model = Sequential()
    # first conv layer; the ReLU below is commented out,
    # so this layer currently has a linear activation
    model.add(Conv2D(32,
                     (10, 10),
                     strides=(2, 2),
                     input_shape=(256, 256, 3),
                     kernel_initializer='random_uniform',
                     bias_initializer='zeros'))
    #model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (5, 5), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(228, activation='sigmoid'))
    return model
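For reference, instantiating the model and calling summary() gives the following feature-map shapes (worked out from the layer parameters; this also defines the d used in the inspection code below):

d = discriminator_model()
d.summary()
# Conv2D  -> (None, 124, 124, 32)   # (256 - 10)/2 + 1 = 124
# MaxPool -> (None, 62, 62, 32)
# Conv2D  -> (None, 58, 58, 64)
# MaxPool -> (None, 29, 29, 64)     # the 29x29 maps visualized below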

After training, I tried to see what the layers had actually learned with the following code:

from keras import backend as K
import numpy as np
import matplotlib.pyplot as plt

inp = d.input                          # input placeholder
outputs = d.layers[3].output           # second pooling layer (after the 2nd conv)
functors = K.function([inp], [outputs])
acts = functors([x])[0]                # x: a batch of one image -> shape (1, 29, 29, 64)
print(acts.shape)
for i in range(acts.shape[-1]):
    fmap = acts[0, :, :, i]            # i-th feature map
    #p = (255./(np.max(fmap)-np.min(fmap)))*(fmap-np.min(fmap))
    p = 255.0 * fmap
    plt.imshow(p)
    plt.show()

Here is an example of what I am getting as output from the 2nd conv layer:

[image: output from 2nd conv layer]

And here is example output from the first conv layer for comparison, which I generated with similar code:

[image: output from the first conv layer]

I also tried changing loss functions, removing pooling layers, etc. I know it is weird; I've never come across anything quite like this before.
