Data Science Asked on July 29, 2020
I am trying to reconstruct an image from a dense layer which is a concatenation of the outputs from:
1) a convolutional network with image inputs; and
2) a dense layer with numerical inputs.
The concatenated 1D tensor is fed to a dense layer, from which I need to reconstruct an image.
The code I am using right now is as so:
merge_output = tf.keras.layers.concatenate([convolutional_model_output, numerical_model_output])
densem1 = Dense(8092, activation='relu')(merge_output)
#densem2 = Dense(512, activation='relu')(densem1)
densem2 = Dense(self.image_width*self.image_height*3, activation='relu')(densem1)
reshapem1 = Reshape(target_shape=(self.image_height, self.image_width, 3))(densem2)
convm1 = Conv2D(filters=32, kernel_size=3, padding="same", activation='relu')(reshapem1)
convm2 = Conv2D(filters=3, kernel_size=3, padding="same", activation='relu')(convm1)
However, training fails to converge to the true output, and the reconstructed image looks like a kaleidoscope of weird colors.
Is there something wrong with my approach? I realize that concatenating convolutional and numerical features and reshaping them into an image might be a pain point. Is there a better way to represent this layer, or the problem in general?
Thanks in advance.
Concatenating the two output layers is not as much of a pain as you might think. There are several things that can go wrong with this model; let's look at them one by one:
1) If the loss curve is unstable or is not converging, try using batchnorm in the intermediate dense layers. Since there are a lot of dense layers, the model can overfit as well, so try using dropout too. Experiment with batchnorm and dropout separately as well as together to see which one helps.
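As a minimal sketch of that suggestion, a reusable dense block with batch normalization and dropout might look like this (the helper name `dense_block` and the rate of 0.3 are illustrative choices, not part of the original code):

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, BatchNormalization, Activation, Dropout

def dense_block(x, units, dropout_rate=0.3):
    """Dense layer followed by batchnorm and dropout.

    Batchnorm is applied before the activation so that the
    nonlinearity sees normalized inputs; dropout comes last.
    """
    x = Dense(units)(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Dropout(dropout_rate)(x)
    return x
```

You can then replace `densem1 = Dense(8092, activation='relu')(merge_output)` with `densem1 = dense_block(merge_output, 8092)` and compare training curves with and without the batchnorm/dropout lines.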
2) Check the range of the input and output of your decoder network. Normalise the data to lie between 0 and 1 by dividing the RGB values by 255. Use this normalised image as the true target (whether the target is the input image itself or a different target image). Similarly, to keep the network output between 0 and 1, clip the ReLU output at 1 and keep it as a float, since it represents a fraction. Use this float value for computing the loss; multiply it by 255 and cast to integer only for visualisation.
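That normalisation round-trip can be sketched with plain NumPy (the helper names `to_unit_range` and `to_uint8` are just for illustration):

```python
import numpy as np

def to_unit_range(img_uint8):
    """Normalise a uint8 RGB image into [0, 1] floats for training."""
    return img_uint8.astype(np.float32) / 255.0

def to_uint8(pred_float):
    """Clip network output to [0, 1], then scale back to uint8 for viewing."""
    clipped = np.clip(pred_float, 0.0, 1.0)
    return (clipped * 255.0).round().astype(np.uint8)
```

The loss is computed on the clipped float values; `to_uint8` is only used when you want to display or save the reconstructed image.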
If these don't work, you might want to look at other upsampling methods, such as deconvolution (transposed convolution) layers and pixel shuffle layers. You could also look at the decoders of generative models such as GANs, specifically the conditional GAN (cGAN), which does a similar thing: merging information from an image with some known prior.
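A sketch of a transposed-convolution decoder, assuming (purely for illustration) a 32x32 RGB target and a 128-dimensional merged feature vector: instead of one huge Dense layer producing the full image, it starts from a small spatial grid and upsamples.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose

def build_decoder(latent_dim=128):
    """Decoder that upsamples an 8x8 grid to a 32x32 RGB image."""
    inp = tf.keras.Input(shape=(latent_dim,))
    x = Dense(8 * 8 * 64, activation='relu')(inp)
    x = Reshape((8, 8, 64))(x)
    # Each strided transposed convolution doubles the spatial size.
    x = Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)  # 16x16
    x = Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu')(x)  # 32x32
    # Sigmoid keeps the output in [0, 1], matching the normalised targets.
    out = Conv2DTranspose(3, 3, padding='same', activation='sigmoid')(x)
    return tf.keras.Model(inp, out)
```

Note the sigmoid on the final layer: it bounds the output to [0, 1] directly, so no clipping is needed before computing the loss.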
Answered by user1825567 on July 29, 2020