TransWikia.com

Implementing U-Net segmentation model without padding

Data Science Asked by TomSelleck on April 13, 2021

I’m trying to implement the U-Net CNN as per the published paper here.

I’ve followed the paper architecture as closely as possible but I’m hitting an error when trying to carry out the first concatenation:

[Image: U-Net architecture diagram from the paper]

From the diagram, it appears the 8th Conv2D output should be merged with the result of the 1st UpSampling2D operation; however, the Concatenate() operation throws an exception because the shapes don't match:

from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D,
                                     UpSampling2D, Concatenate)

def model(image_size = (572, 572) + (1,)):

    # Input / Output layers
    input_layer = Input(shape=image_size, batch_size=32)

    """ Begin Downsampling """

    # Block 1
    conv_1 = Conv2D(64, 3, activation = 'relu')(input_layer)
    conv_2 = Conv2D(64, 3, activation = 'relu')(conv_1)

    max_pool_1 = MaxPool2D(strides=2)(conv_2)

    # Block 2
    conv_3 = Conv2D(128, 3, activation = 'relu')(max_pool_1)
    conv_4 = Conv2D(128, 3, activation = 'relu')(conv_3)

    max_pool_2 = MaxPool2D(strides=2)(conv_4)

    # Block 3
    conv_5 = Conv2D(256, 3, activation = 'relu')(max_pool_2)
    conv_6 = Conv2D(256, 3, activation = 'relu')(conv_5)

    max_pool_3 = MaxPool2D(strides=2)(conv_6)

    # Block 4
    conv_7 = Conv2D(512, 3, activation = 'relu')(max_pool_3)
    conv_8 = Conv2D(512, 3, activation = 'relu')(conv_7)

    max_pool_4 = MaxPool2D(strides=2)(conv_8)

    """ Begin Upsampling """

    # Block 5
    conv_9 = Conv2D(1024, 3, activation = 'relu')(max_pool_4)
    conv_10 = Conv2D(1024, 3, activation = 'relu')(conv_9)

    upsample_1 = UpSampling2D()(conv_10)

    # Connect layers
    merge_1 = Concatenate()([conv_8, upsample_1])

Error:

Exception has occurred: ValueError
A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(32, 64, 64, 512), (32, 56, 56, 1024)]

Note that the values 64 and 56 correctly line up with the architecture.
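The mismatch can be verified by hand: assuming unpadded ('valid') 3x3 convolutions, each conv loses 2 pixels per side, each 2x2 max-pool halves the size, and UpSampling2D doubles it. A quick sketch of that arithmetic:

```python
# Trace the feature-map size through the contracting path of the model above.
size = 572
for block in range(4):
    size = size - 2 - 2   # two 3x3 'valid' convolutions, each loses 2 pixels
    print(f"block {block + 1} output: {size}x{size}")
    size //= 2            # 2x2 max-pool halves the size
size = size - 2 - 2       # bottleneck convolutions: 32 -> 28
print(f"bottleneck output: {size}x{size}")
print(f"after upsampling: {size * 2}x{size * 2}")
```

This reproduces the shapes in the error: block 4 (conv_8) ends at 64x64, while the upsampled bottleneck is 56x56.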

[Image: architecture diagram with the 64x64 and 56x56 feature maps highlighted]

I don’t understand how to implement the model as it is in the paper. If I change my code to accept an image of shape (256, 256) and add padding='same' to the Conv2D layers, the code works as the sizes are aligned.

This seems to go against what the authors specifically state in their implementation:

[Image: excerpt from the paper stating that only unpadded convolutions are used]

Could somebody point me in the right direction on the correct implementation of this model?

One Answer

[Image: U-Net architecture diagram annotated with three coloured circles]

Let's follow the definition of each arrow:

Gray => Copy and Crop

Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”) that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. Paper

So, I believe (I have added 3 coloured circles):

  • Blue - the 28x28 feature map is upsampled to 56x56, and the 1024 channels are halved to 512
  • Red - the 64x64 feature map is cropped to 56x56, then concatenated along the feature-map (channel) axis
  • Black - two 3x3 convolutions, each followed by a ReLU
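The three steps above can be sketched in Keras (a minimal sketch, assuming TensorFlow/Keras; the layer names here are illustrative, not from the questioner's code). Conv2DTranspose doubles the spatial size while halving the channels (1024 -> 512), and Cropping2D trims the 64x64 skip connection to 56x56 before concatenating:

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2DTranspose, Cropping2D, Concatenate

conv_8 = tf.zeros((1, 64, 64, 512))    # skip connection from the contracting path
conv_10 = tf.zeros((1, 28, 28, 1024))  # bottleneck output

# Blue: 2x2 up-convolution doubles 28 -> 56 and halves 1024 -> 512 channels
up_1 = Conv2DTranspose(512, 2, strides=2)(conv_10)

# Red: crop 4 pixels from each side, 64 - 2*4 = 56, then concatenate channels
crop_1 = Cropping2D(cropping=4)(conv_8)
merge_1 = Concatenate()([crop_1, up_1])

print(merge_1.shape)  # (1, 56, 56, 1024)
```

The same pattern applies at each level of the expansive path; only the cropping amount changes, computed from the size difference between the skip connection and the upsampled map.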

Correct answer by 10xAI on April 13, 2021

