How to calculate the output shape of conv2d_transpose?

Question

Currently I'm coding a GAN to generate MNIST digits, but the generator doesn't work. First I sample z with shape 100 per batch and put it through a fully connected layer to get the shape (7, 7, 256). Then a conv2d_transpose layer should turn that into (28, 28, 1), which is basically an MNIST picture.

I have two questions:
1.) This code doesn't work, and the reason isn't obvious to me. Do you have any idea why?
2.) I am well aware of how transposed convolution works, but I can't find any resource that explains how to calculate the output size from the input size, strides and kernel size specifically in TensorFlow. The most useful information I found is https://arxiv.org/pdf/1603.07285v1.pdf, but padding, for example, works very differently in TensorFlow. Can you help me?

mb_size = 32 #Size of image batch to apply at each iteration.
X_dim = 784
z_dim = 100
h_dim = 7*7*256
dropoutRate = 0.7
alplr = 0.2 #leaky Relu

def generator(z, G_W1, G_b1, keepProb, first_shape):

    G_W1 = tf.Variable(xavier_init([z_dim, h_dim]))
    G_b1 = tf.Variable(tf.zeros(shape=[h_dim]))

    G_h1 = lrelu(tf.matmul(z, G_W1) + G_b1, alplr)
    G_h1Drop = tf.nn.dropout(G_h1, keepProb)  # drop out

    X = tf.reshape(G_h1Drop, shape=first_shape)
    out = create_new_trans_conv_layer(X, 256, INPUT_CHANNEL, [3, 3], [2,2], "transconv1", [-1, 28, 28, 1])
    return out

# new transposed cnn
def create_new_trans_conv_layer(input_data, num_input_channels, num_output_channels, filter_shape, stripe, name, output_shape):
    # setup the filter input shape for tf.nn.conv_2d
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_output_channels, num_input_channels]

    # initialise weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03),
                          name=name + '_W')
    bias = tf.Variable(tf.truncated_normal([num_input_channels]), name=name + '_b')

    # setup the convolutional layer operation
    conv1 = tf.nn.conv2d_transpose(input_data, weights, output_shape, [1, stripe[0], stripe[1], 1], padding='SAME')

    # add the bias
    conv1 += bias

    # apply a leaky ReLU non-linear activation
    conv1 = lrelu(conv1, alplr)

    return conv1

...

_, G_loss_curr = sess.run(
        [G_solver, G_loss],
        feed_dict={z: sample_z(mb_size, z_dim), keepProb: 1.0})  # training generator

Manish P · Accepted Answer

Here is the correct formula for computing the size of the output with tf.layers.conv2d_transpose():

# Padding==Same:
H = H1 * stride

# Padding==Valid
H = (H1-1) * stride + HF

where H = output size, H1 = input size, HF = height of the filter

e.g., if `H1` = 7, Stride = 3, and Kernel size = 4,

With padding=="same", output size = 21, 
with padding=="valid", output size = 22
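The two rules above can be written as a small plain-Python helper to sanity-check sizes before building the graph. This is a sketch that mirrors the formulas in this answer, not TensorFlow's own code, and the name `transpose_out_size` is hypothetical. By the same SAME-padding rule, a single stride-2 layer takes a 7x7 input to 14x14, so reaching 28x28 from 7x7 needs either stride 4 or two stride-2 layers.

```python
def transpose_out_size(h_in, stride, kernel, padding):
    """Output height of a transposed convolution, per the two rules above.

    Hypothetical helper: it mirrors this answer's formulas and is not
    part of the TensorFlow API.
    """
    padding = padding.lower()
    if padding == "same":
        # SAME: output = input * stride (the kernel size does not matter)
        return h_in * stride
    if padding == "valid":
        # VALID: output = (input - 1) * stride + kernel
        return (h_in - 1) * stride + kernel
    raise ValueError("padding must be 'same' or 'valid'")


print(transpose_out_size(7, 3, 4, "same"))   # 21
print(transpose_out_size(7, 3, 4, "valid"))  # 22
print(transpose_out_size(7, 2, 3, "same"))   # 14 (one stride-2 layer)
print(transpose_out_size(14, 2, 3, "same"))  # 28 (a second stride-2 layer)
```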

To test this out (verified in tf 1.4.0):

import tensorflow as tf
import numpy as np

x = tf.placeholder(dtype=tf.float32, shape=(None, 7, 7, 32))
dcout = tf.layers.conv2d_transpose(x, 64, 4, 3, padding="valid")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xin = np.random.rand(1,7,7,32)
    out = sess.run(dcout, feed_dict={x:xin})
    print(out.shape)

Yash Khare · Answer

Instead of using tf.nn.conv2d_transpose, you can use tf.layers.conv2d_transpose.
It is a wrapper layer, so there is no need to pass in an output shape. If you want to calculate the output shape yourself, you can use the formula:

H = (H1 - 1)*stride + HF - 2*padding
H - height of output image, e.g. H = 28
H1 - height of input image, e.g. H1 = 7
HF - height of filter
padding - number of zero-padded pixels on each side
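Plugging the question's numbers (H1 = 7, H = 28) into this formula shows which settings work. The sketch below is plain Python with a hypothetical function name; `padding` is an integer number of zero-padded pixels per side, as in the formula, not TensorFlow's 'same'/'valid' strings.

```python
def transpose_out_size(h_in, stride, kernel, padding=0):
    """H = (H1 - 1) * stride + HF - 2 * padding (hypothetical helper)."""
    return (h_in - 1) * stride + kernel - 2 * padding


# One layer: stride 4, 4x4 kernel, no padding takes 7 -> 28.
print(transpose_out_size(7, stride=4, kernel=4, padding=0))  # 28

# Two layers: stride 2, 4x4 kernel, padding 1 doubles the size each time.
h = 7
for _ in range(2):
    h = transpose_out_size(h, stride=2, kernel=4, padding=1)
    print(h)  # prints 14, then 28
```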

ludog · Answer

The answers here give figures that work, but they don't mention that there are multiple possible output shapes for the convolution-transpose operation. Indeed, if the output shape were completely determined by the other parameters, there would be no need to specify it.

The output size of a convolution operation is

# padding=="SAME" 
conv_out = ceil(conv_in/stride)

# padding=="VALID" 
conv_out = ceil((conv_in-k+1)/stride)

where conv_in is the input size and k is the kernel size. In OP's link these padding methods are called 'half padding' and 'no padding' respectively.
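These two forward-convolution rules are easy to check in plain Python. This is a sketch with a hypothetical function name; the ceiling division is done with integers so no floating-point rounding is involved.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def conv_out_size(conv_in, k, stride, padding):
    """Output length of a forward convolution, per the SAME/VALID rules above."""
    if padding == "SAME":
        return ceil_div(conv_in, stride)
    if padding == "VALID":
        return ceil_div(conv_in - k + 1, stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_out_size(21, 4, 3, "SAME"))   # 7
print(conv_out_size(22, 4, 3, "VALID"))  # 7
print(conv_out_size(23, 4, 3, "VALID"))  # 7
```

Note that 22 and 23 both map to 7 under VALID padding, which is exactly why going backwards is ambiguous.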

When calling

tf.nn.conv2d_transpose(value, filter, output_shape, strides)

we need the output_shape parameter to be the shape of a tensor that, if convolved with filter and strides, would have produced a tensor of the same shape as value. Because of rounding, there are multiple such shapes when stride>1. Specifically, with VALID padding we need

dconv_in-1 <= (dconv_out-k)/s < dconv_in
==>
(dconv_in-1)s + k <= dconv_out <= (dconv_in)s + k - 1

and with SAME padding

(dconv_in-1)s + 1 <= dconv_out <= (dconv_in)s

If dconv_in = 7, k = 4, stride = 3

# with SAME padding
dconv_out = 19 or 20 or 21

# with VALID padding
dconv_out = 22 or 23 or 24
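A brute-force check makes the ambiguity concrete: try every candidate dconv_out, run it through the forward-convolution size rule, and keep the ones that land back on dconv_in. This is a plain-Python sketch with hypothetical helper names, not a TensorFlow call.

```python
def conv_out_size(conv_in, k, stride, padding):
    """Forward-convolution output length (SAME / VALID rules above)."""
    if padding == "SAME":
        return -(-conv_in // stride)          # ceil(conv_in / stride)
    return -(-(conv_in - k + 1) // stride)    # ceil((conv_in - k + 1) / stride)


def valid_transpose_sizes(dconv_in, k, stride, padding, search_up_to=200):
    """All output sizes that a forward conv would map back to dconv_in."""
    return [n for n in range(1, search_up_to + 1)
            if conv_out_size(n, k, stride, padding) == dconv_in]


print(valid_transpose_sizes(7, 4, 3, "SAME"))   # [19, 20, 21]
print(valid_transpose_sizes(7, 4, 3, "VALID"))  # [22, 23, 24]
print(valid_transpose_sizes(7, 4, 1, "VALID"))  # [10] (stride 1: unique)
```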

The tf.layers API automatically calculates an output_shape (which seems to be the smallest possible for VALID padding and the largest possible for SAME padding). This is often convenient, but it can also lead to shape mismatches if you are trying to recover the shape of a previously convolved tensor, e.g. in an autoencoder. For example:

import tensorflow as tf
import numpy as np


k=22
cin = tf.placeholder(tf.float32, shape=(None, k+1,k+1,64))
w1 = tf.placeholder(tf.float32, shape=[4,4,64,32])
cout = tf.nn.conv2d(cin, w1, strides=(1,3,3,1), padding="VALID")               
f_dict={cin:np.random.rand(1,k+1,k+1,64),
        w1:np.random.rand(4,4,64,32)}

dcout1 = tf.nn.conv2d_transpose(cout, w1, strides=(1,3,3,1), 
        padding="VALID", output_shape=[1,k,k,64])
dcout2 = tf.nn.conv2d_transpose(cout, w1, strides=(1,3,3,1), 
        padding="VALID", output_shape=[1,k+1,k+1,64])
dcout_layers = tf.layers.conv2d_transpose(cout, 64, 4, 3, padding="VALID")


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    inp_shape = sess.run(cin, feed_dict=f_dict).shape
    conv_shape = sess.run(cout, feed_dict=f_dict).shape
    lyrs_shape = sess.run(dcout_layers, feed_dict=f_dict).shape
    nn_shape1 = sess.run(dcout1, feed_dict=f_dict).shape
    nn_shape2 = sess.run(dcout2, feed_dict=f_dict).shape


    print("original input shape:", inp_shape)
    print("shape after convolution:", conv_shape)
    print("recovered output shape using tf.layers:", lyrs_shape)
    print("one possible recovered output shape using tf.nn:", nn_shape1)
    print("another possible recovered output shape using tf.nn:", nn_shape2)

>>> original input shape: (1, 23, 23, 64)
>>> shape after convolution: (1, 7, 7, 32)
>>> recovered output shape using tf.layers: (1, 22, 22, 64)
>>> one possible recovered output shape using tf.nn: (1, 22, 22, 64)
>>> another possible recovered output shape using tf.nn: (1, 23, 23, 64)
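The same mismatch can be reproduced without TensorFlow. The sketch below assumes the automatic VALID rule input * stride + max(kernel - stride, 0), taken from the Keras deconv_output_length source quoted in another answer in this thread; the helper names are hypothetical.

```python
def conv_out_valid(n, k, stride):
    """Forward VALID convolution: ceil((n - k + 1) / stride)."""
    return -(-(n - k + 1) // stride)


def auto_transpose_valid(n, k, stride):
    """Output size picked automatically for a VALID transpose (assumed Keras rule)."""
    return n * stride + max(k - stride, 0)


k, stride = 4, 3
original = 23
encoded = conv_out_valid(original, k, stride)         # 7
recovered = auto_transpose_valid(encoded, k, stride)  # 22

print(encoded, recovered, recovered == original)  # 7 22 False
```

Passing an explicit output_shape to tf.nn.conv2d_transpose, as dcout2 does above, is what recovers the original 23.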

Kyle · Answer

Take a look at the source code for tf.keras.layers.Conv2DTranspose, which calls the function deconv_output_length when calculating its output size. There's a subtle difference between the accepted answer and what you find here:

def deconv_output_length(input_length, filter_size, padding,
                         output_padding=None, stride=0, dilation=1):
  """Determines output length of a transposed convolution given input length.
  Arguments:
      input_length: Integer.
      filter_size: Integer.
      padding: one of `"same"`, `"valid"`, `"full"`.
      output_padding: Integer, amount of padding along the output dimension.
          Can be set to `None` in which case the output length is inferred.
      stride: Integer.
      dilation: Integer.
  Returns:
      The output length (integer).
  """
  assert padding in {'same', 'valid', 'full'}
  if input_length is None:
    return None

  # Get the dilated kernel size
  filter_size = filter_size + (filter_size - 1) * (dilation - 1)

  # Infer length if output padding is None, else compute the exact length
  if output_padding is None:
    if padding == 'valid':
      # note the call to `max` below!
      length = input_length * stride + max(filter_size - stride, 0)
    elif padding == 'full':
      length = input_length * stride - (stride + filter_size - 2)
    elif padding == 'same':
      length = input_length * stride

  else:
    if padding == 'same':
      pad = filter_size // 2
    elif padding == 'valid':
      pad = 0
    elif padding == 'full':
      pad = filter_size - 1

    length = ((input_length - 1) * stride + filter_size - 2 * pad +
              output_padding)
  return length

I added the comment above the call to max.

The formula for padding == 'valid' is H = H1 * stride + max(HF - stride, 0), which differs from @Manish P's answer only when stride > HF. This one got me into trouble, so I thought I'd post it here.
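A quick side-by-side of the two VALID rules shows where they diverge: they agree whenever stride <= HF and differ once stride > HF. This is a plain-Python sketch; the function names are hypothetical, and valid_keras only reproduces the output_padding=None, dilation=1 branch of deconv_output_length.

```python
def valid_simple(h1, stride, hf):
    """VALID rule from the accepted answer: (H1 - 1) * stride + HF."""
    return (h1 - 1) * stride + hf


def valid_keras(h1, stride, hf):
    """VALID rule from deconv_output_length: H1 * stride + max(HF - stride, 0)."""
    return h1 * stride + max(hf - stride, 0)


for stride, hf in [(3, 4), (3, 3), (5, 3)]:
    a, b = valid_simple(7, stride, hf), valid_keras(7, stride, hf)
    print(f"stride={stride} HF={hf}: {a} vs {b}")
# stride=3 HF=4: 22 vs 22
# stride=3 HF=3: 21 vs 21
# stride=5 HF=3: 33 vs 35
```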
