Convolutional autoencoders not learning

Question

I'm trying to implement convolutional autoencoders in tensorflow, on the mnist dataset.

The problem is that the autoencoder does not seem to learn properly: it will always learn to reproduce the 0 shape, but no other shapes, in fact I usually get an average loss of about 0.09, which is 1/10 of the classes that it should learn.

I am using 2x2 kernels with stride 2 for the input and output convolutions, but the filters seems to be learned properly. When I visualize the data, input image is passed thru 16 (1st conv) and 32 filters (2nd conv), and by image inspection it seems running fine (i.e. apparently features like curves, crosses, etc are detected).

The problem seems to arise in the fully connected part of the network: no matter what is the input image, its encoding will be always the same.

My first thought is "I'm probably just feeding it with zeroes while training", but I don't think I made this mistake (see code below).

Edit I realized the dataset was not shuffled, which introduced a bias and could be the cause of the problem. After introducing it, the average loss is lower (0.06 instead of 0.09), and in fact the output image looks like a blurry 8, but conclusions are the same: the encoded input will be the same no matter what is the input image.

Here a sample input with the relative output

Here are the activation for the image above, with the two fully connected layers at the bottom (encoding is the bottommost).

Finally, here there are the activation for the fully connected layers for different inputs. Each input image corresponds to a line in the activation images.

As you can see, they always yield the same output. If I use transposed weights instead of initializing different ones, the first FC layer (image in the middle) looks a bit more randomized, but the underlying pattern is still evident. In the encoding layer (image at the bottom), the output will be always the same no matter what is the input (of course, the pattern varies from one training and the next).

Here's the relevant code

# A placeholder for the input data
x = tf.placeholder('float', shape=(None, mnist.data.shape[1]))

# conv2d_transpose cannot use -1 in output size so we read the value
# directly in the graph
batch_size = tf.shape(x)[0]

# Variables for weights and biases
with tf.variable_scope('encoding'):
    # After converting the input to a square image, we apply the first convolution, using 2x2 kernels
    with tf.variable_scope('conv1'):
        wec1 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.truncated_normal_initializer())
        bec1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    # Second convolution
    with tf.variable_scope('conv2'):
        wec2 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.truncated_normal_initializer())
        bec2 = tf.get_variable('b', shape=(m_c2,), initializer=tf.constant_initializer(0))
    # First fully connected layer
    with tf.variable_scope('fc1'):
        wef1 = tf.get_variable('w', shape=(7*7*m_c2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bef1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    # Second fully connected layer
    with tf.variable_scope('fc2'):
        wef2 = tf.get_variable('w', shape=(n_h1, n_h2), initializer=tf.contrib.layers.xavier_initializer())
        bef2 = tf.get_variable('b', shape=(n_h2,), initializer=tf.constant_initializer(0))

reshaped_x = tf.reshape(x, (-1, 28, 28, 1))
y1 = tf.nn.conv2d(reshaped_x, wec1, strides=(1, 2, 2, 1), padding='VALID')
y2 = tf.nn.sigmoid(y1 + bec1)
y3 = tf.nn.conv2d(y2, wec2, strides=(1, 2, 2, 1), padding='VALID')
y4 = tf.nn.sigmoid(y3 + bec2)
y5 = tf.reshape(y4, (-1, 7*7*m_c2))
y6 = tf.nn.sigmoid(tf.matmul(y5, wef1) + bef1)
encode = tf.nn.sigmoid(tf.matmul(y6, wef2) + bef2)

with tf.variable_scope('decoding'):
    # for the transposed convolutions, we use the same weights defined above
    with tf.variable_scope('fc1'):
        #wdf1 = tf.transpose(wef2)
        wdf1 = tf.get_variable('w', shape=(n_h2, n_h1), initializer=tf.contrib.layers.xavier_initializer())
        bdf1 = tf.get_variable('b', shape=(n_h1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('fc2'):
        #wdf2 = tf.transpose(wef1)
        wdf2 = tf.get_variable('w', shape=(n_h1, 7*7*m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdf2 = tf.get_variable('b', shape=(7*7*m_c2,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv1'):
        wdd1 = tf.get_variable('w', shape=(2, 2, m_c1, m_c2), initializer=tf.contrib.layers.xavier_initializer())
        bdd1 = tf.get_variable('b', shape=(m_c1,), initializer=tf.constant_initializer(0))
    with tf.variable_scope('deconv2'):
        wdd2 = tf.get_variable('w', shape=(2, 2, 1, m_c1), initializer=tf.contrib.layers.xavier_initializer())
        bdd2 = tf.get_variable('b', shape=(1,), initializer=tf.constant_initializer(0))

u1 = tf.nn.sigmoid(tf.matmul(encode, wdf1) + bdf1)
u2 = tf.nn.sigmoid(tf.matmul(u1, wdf2) + bdf2)
u3 = tf.reshape(u2, (-1, 7, 7, m_c2))
u4 = tf.nn.conv2d_transpose(u3, wdd1, output_shape=(batch_size, 14, 14, m_c1), strides=(1, 2, 2, 1), padding='VALID')
u5 = tf.nn.sigmoid(u4 + bdd1)
u6 = tf.nn.conv2d_transpose(u5, wdd2, output_shape=(batch_size, 28, 28, 1), strides=(1, 2, 2, 1), padding='VALID')
u7 = tf.nn.sigmoid(u6 + bdd2)
decode = tf.reshape(u7, (-1, 784))

loss = tf.reduce_mean(tf.square(x - decode))
opt = tf.train.AdamOptimizer(0.0001).minimize(loss)

try:
    tf.global_variables_initializer().run()
except AttributeError:
    tf.initialize_all_variables().run() # Deprecated after r0.11

print('Starting training...')
bs = 1000 # Batch size
for i in range(501): # Reasonable results around this epoch 
    # Apply permutation of data at each epoch, should improve convergence time
    train_data = np.random.permutation(mnist.data)
    if i % 100 == 0:
        print('Iteration:', i, 'Loss:', loss.eval(feed_dict={x: train_data}))
    for j in range(0, train_data.shape[0], bs):
        batch = train_data[j*bs:(j+1)*bs]
        sess.run(opt, feed_dict={x: batch})
        # TODO introduce noise
print('Training done')

AkiRoss · Accepted Answer

Well, the problem was mainly related to the kernel size. Using 2x2 convolution with stride of (2,2) turned to be a bad idea. Using 5x5 and 3x3 sizes yielded decent results.

Convolutional autoencoders not learning

One Answer

Add your own answers!

Ask a Question