Data Science Asked by snowparrot on July 20, 2021
Currently I am coding a GAN to generate MNIST digits, but the generator doesn't want to work. First I choose z with shape 100 per batch, put it through a dense layer to get the shape (7, 7, 256), and then a conv2d_transpose layer to get to (28, 28, 1), which is basically an MNIST picture.
I have two questions:
1.) This code obviously doesn't work. Do you have any clue why?
2.) I am well aware of how transposed convolution works, but I can't find any resource that explains how to calculate the output size given input, strides and kernel size specifically for TensorFlow. The most useful information I found is https://arxiv.org/pdf/1603.07285v1.pdf, but e.g. padding in TensorFlow works very differently. Can you help me?
mb_size = 32  # Size of image batch to apply at each iteration.
X_dim = 784
z_dim = 100
h_dim = 7*7*256
dropoutRate = 0.7
alplr = 0.2  # leaky ReLU

def generator(z, G_W1, G_b1, keepProb, first_shape):
    G_W1 = tf.Variable(xavier_init([z_dim, h_dim]))
    G_b1 = tf.Variable(tf.zeros(shape=[h_dim]))
    G_h1 = lrelu(tf.matmul(z, G_W1) + G_b1, alplr)
    G_h1Drop = tf.nn.dropout(G_h1, keepProb)  # drop out
    X = tf.reshape(G_h1Drop, shape=first_shape)
    out = create_new_trans_conv_layer(X, 256, INPUT_CHANNEL, [3, 3], [2, 2], "transconv1", [-1, 28, 28, 1])
    return out
# new transposed cnn
def create_new_trans_conv_layer(input_data, num_input_channels, num_output_channels, filter_shape, stripe, name, output_shape):
    # setup the filter shape for tf.nn.conv2d_transpose
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_output_channels, num_input_channels]

    # initialise weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03),
                          name=name + '_W')
    bias = tf.Variable(tf.truncated_normal([num_input_channels]), name=name + '_b')

    # setup the transposed convolution operation
    conv1 = tf.nn.conv2d_transpose(input_data, weights, output_shape, [1, stripe[0], stripe[1], 1], padding='SAME')

    # add the bias
    conv1 += bias

    # apply a leaky ReLU non-linear activation
    conv1 = lrelu(conv1, alplr)

    return conv1
...
_, G_loss_curr = sess.run(
    [G_solver, G_loss],
    feed_dict={z: sample_z(mb_size, z_dim), keepProb: 1.0})  # training generator
Here is the correct formula for computing the size of the output with tf.layers.conv2d_transpose():

# Padding == "same":
H = H1 * stride

# Padding == "valid":
H = (H1 - 1) * stride + HF

where H = output size, H1 = input size, and HF = height of the filter.
E.g., if H1 = 7, stride = 3, and kernel size = 4:
with padding == "same", output size = 21;
with padding == "valid", output size = 22.
To test this out (verified in tf 1.4.0):

import tensorflow as tf
import numpy as np

x = tf.placeholder(dtype=tf.float32, shape=(None, 7, 7, 32))
dcout = tf.layers.conv2d_transpose(x, 64, 4, 3, padding="valid")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xin = np.random.rand(1, 7, 7, 32)
    out = sess.run(dcout, feed_dict={x: xin})
    print(out.shape)
Correct answer by Manish P on July 20, 2021
Instead of using tf.nn.conv2d_transpose you can use tf.layers.conv2d_transpose. It is a wrapper layer, so there is no need to pass in the output shape. If you want to calculate the output shape yourself, you can use the formula:

H = (H1 - 1) * stride + HF - 2 * padding

where H is the height of the output image (here H = 28), H1 is the height of the input image (here H1 = 7), and HF is the height of the filter.
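For illustration, here is a minimal sketch of a generator going from a 100-dimensional z to a 28x28x1 image with tf.layers.conv2d_transpose; the two-step upsampling and the layer sizes are assumptions for the sketch, not taken from the question's code:

import tensorflow as tf

# Sketch only: the wrapper infers the output shape, so no output_shape argument is needed.
z = tf.placeholder(tf.float32, shape=(None, 100))
h = tf.layers.dense(z, 7 * 7 * 256, activation=tf.nn.leaky_relu)
h = tf.reshape(h, [-1, 7, 7, 256])
# Two stride-2 layers take 7 -> 14 -> 28 (padding="same": H = H1 * stride).
h = tf.layers.conv2d_transpose(h, 128, kernel_size=3, strides=2, padding="same",
                               activation=tf.nn.leaky_relu)
img = tf.layers.conv2d_transpose(h, 1, kernel_size=3, strides=2, padding="same",
                                 activation=tf.nn.tanh)
print(img.shape)  # (?, 28, 28, 1)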
Answered by Yash Khare on July 20, 2021
The answers here give figures that work, but they don't mention that there are multiple possible output shapes for the convolution-transpose operation. Indeed, if the output shape were completely determined by the other parameters, there would be no need for it to be specified.
The output size of a convolution operation is

# padding=="SAME"
conv_out = ceil(conv_in / stride)

# padding=="VALID"
conv_out = ceil((conv_in - k + 1) / stride)

where conv_in is the input size and k is the kernel size. In the OP's link these padding methods are called 'half padding' and 'no padding' respectively.
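As a quick sanity check, both formulas can be evaluated directly in plain Python (the input size 23 is chosen here only because it reappears in the example further down):

import math

conv_in, k, stride = 23, 4, 3
print(math.ceil(conv_in / stride))            # SAME:  8
print(math.ceil((conv_in - k + 1) / stride))  # VALID: 7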
When calling tf.nn.conv2d_transpose(value, filter, output_shape, strides), we need the output_shape parameter to be the shape of a tensor that, if convolved with filter and strides, would have produced a tensor of the same shape as value. Because of rounding, there are multiple such shapes when stride > 1. Specifically, we need ceil(dconv_out / s) = dconv_in for SAME padding and ceil((dconv_out - k + 1) / s) = dconv_in for VALID padding, which gives

# padding=="SAME"
(dconv_in - 1)*s + 1 <= dconv_out <= dconv_in*s

# padding=="VALID"
(dconv_in - 1)*s + k <= dconv_out <= dconv_in*s + k - 1

If dconv_in = 7, k = 4, stride = 3:

# with SAME padding
dconv_out = 19 or 20 or 21

# with VALID padding
dconv_out = 22 or 23 or 24
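These ranges can be checked with a small brute-force sketch (kernel 4 and stride 3 as above):

import math

dconv_in, k, s = 7, 4, 3
candidates = range(15, 30)
print([d for d in candidates if math.ceil(d / s) == dconv_in])            # SAME:  [19, 20, 21]
print([d for d in candidates if math.ceil((d - k + 1) / s) == dconv_in])  # VALID: [22, 23, 24]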
The tf.layers API automatically calculates an output_shape (which seems to be the smallest possible for VALID padding and the largest possible for SAME padding). This is often convenient, but can also lead to shape mismatches if you are trying to recover the shape of a previously convolved tensor, e.g. in an autoencoder. For example:
import tensorflow as tf
import numpy as np

k = 22
cin = tf.placeholder(tf.float32, shape=(None, k+1, k+1, 64))
w1 = tf.placeholder(tf.float32, shape=[4, 4, 64, 32])
cout = tf.nn.conv2d(cin, w1, strides=(1, 3, 3, 1), padding="VALID")

f_dict = {cin: np.random.rand(1, k+1, k+1, 64),
          w1: np.random.rand(4, 4, 64, 32)}

dcout1 = tf.nn.conv2d_transpose(cout, w1, strides=(1, 3, 3, 1),
                                padding="VALID", output_shape=[1, k, k, 64])
dcout2 = tf.nn.conv2d_transpose(cout, w1, strides=(1, 3, 3, 1),
                                padding="VALID", output_shape=[1, k+1, k+1, 64])
dcout_layers = tf.layers.conv2d_transpose(cout, 64, 4, 3, padding="VALID")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    inp_shape = sess.run(cin, feed_dict=f_dict).shape
    conv_shape = sess.run(cout, feed_dict=f_dict).shape
    lyrs_shape = sess.run(dcout_layers, feed_dict=f_dict).shape
    nn_shape1 = sess.run(dcout1, feed_dict=f_dict).shape
    nn_shape2 = sess.run(dcout2, feed_dict=f_dict).shape
    print("original input shape:", inp_shape)
    print("shape after convolution:", conv_shape)
    print("recovered output shape using tf.layers:", lyrs_shape)
    print("one possible recovered output shape using tf.nn:", nn_shape1)
    print("another possible recovered output shape using tf.nn:", nn_shape2)
>>> original input shape: (1, 23, 23, 64)
>>> shape after convolution: (1, 7, 7, 32)
>>> recovered output shape using tf.layers: (1, 22, 22, 64)
>>> one possible recovered output shape using tf.nn: (1, 22, 22, 64)
>>> another possible recovered output shape using tf.nn: (1, 23, 23, 64)
Answered by ludog on July 20, 2021
Take a look at the source code for tf.keras.layers.Conv2DTranspose, which calls the function deconv_output_length when calculating its output size. There's a subtle difference between the accepted answer and what you find here:
def deconv_output_length(input_length, filter_size, padding,
                         output_padding=None, stride=0, dilation=1):
  """Determines output length of a transposed convolution given input length.

  Arguments:
      input_length: Integer.
      filter_size: Integer.
      padding: one of `"same"`, `"valid"`, `"full"`.
      output_padding: Integer, amount of padding along the output dimension.
          Can be set to `None` in which case the output length is inferred.
      stride: Integer.
      dilation: Integer.

  Returns:
      The output length (integer).
  """
  assert padding in {'same', 'valid', 'full'}
  if input_length is None:
    return None

  # Get the dilated kernel size
  filter_size = filter_size + (filter_size - 1) * (dilation - 1)

  # Infer length if output padding is None, else compute the exact length
  if output_padding is None:
    if padding == 'valid':
      # note the call to `max` below!
      length = input_length * stride + max(filter_size - stride, 0)
    elif padding == 'full':
      length = input_length * stride - (stride + filter_size - 2)
    elif padding == 'same':
      length = input_length * stride
  else:
    if padding == 'same':
      pad = filter_size // 2
    elif padding == 'valid':
      pad = 0
    elif padding == 'full':
      pad = filter_size - 1
    length = ((input_length - 1) * stride + filter_size - 2 * pad +
              output_padding)
  return length
I added the comment above the call to max.
The formula for padding == 'valid' is H = H1 * stride + max(HF - stride, 0), which differs from @Manish P's answer only when stride > HF (when HF >= stride, the max term reduces to HF - stride and the two formulas agree). This one got me into trouble, so I thought I'd post it here.
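A minimal sketch of that corner case (the 2x2 kernel and stride 3 here are chosen purely for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 7, 7, 16))
y = tf.layers.conv2d_transpose(x, 16, kernel_size=2, strides=3, padding="valid")
# Static shape is (?, 21, 21, 16): 7*3 + max(2 - 3, 0) = 21, whereas (H1-1)*stride + HF = (7-1)*3 + 2 = 20.
print(y.shape)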
Answered by Kyle on July 20, 2021