Adding layers to a trained CNN to process higher-resolution images. Tried 2 schemes: 1 works fine, 1 fails completely

Asked on Data Science, June 24, 2021

I’m working with images coming from a sensor for which 1 pixel corresponds to 2 mm in the real world. I’ve built and trained a CNN that does semantic segmentation of the image (128×128 pixels), and it works quite well. The objects that need to be recognized have a specific dimension in mm with little variability, so if the pixel size changes, the number of pixels an object occupies on average changes too. It is therefore not optimal to take a network trained on 2 mm pixels and feed it a patch of an image with 1 mm pixels.

Now we have a new instrument with higher resolution: 1 mm pixel size (256×256 image size). There are fewer of these higher-resolution images, and training a new network from scratch would be more complicated, with many more free parameters. I’d also like to keep using my CNN, which works well. So I’ve looked into transfer learning. However:

Scheme 1 (fails). Add a couple of convolutional layers at 1 mm, followed by a downsampling (average pooling) layer to 2 mm that "injects" the data directly into the pretrained network, right after what would be its input convolution/activation. Then I freeze all pretrained layers and let the network train. It seems trivial enough that it should work, but it doesn’t. I’ve even tried changing the loss so that the new branch has to reproduce the output of the first 2 mm convolution (mean absolute error between the network fed the 1 mm input and the network fed the same image downsampled to 2 mm). By definition that loss should reach 0 within a couple of epochs, but it doesn’t! The outputs get very similar… but the small remaining differences get amplified in the later layers, and the final output is nowhere near what it should be: Dice around 0 instead of 0.9, and visibly completely wrong.
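To make Scheme 1 concrete, here is a minimal PyTorch-style sketch of the wiring I have in mind (the real network is larger; `pretrained_body`, `stem_channels`, and the layer widths are placeholders, not my actual code):

```python
import torch
import torch.nn as nn

class Scheme1(nn.Module):
    """Trainable 1 mm front-end, average pooling down to the 2 mm grid, then
    injection into the frozen pretrained network right after its input
    convolution/activation. `pretrained_body` is a hypothetical handle to
    everything in the 2 mm network after its first conv block, and
    `stem_channels` is the channel count that block produces."""

    def __init__(self, pretrained_body, stem_channels, in_ch=1, feat_ch=16):
        super().__init__()
        # New trainable layers operating at 1 mm resolution (256x256).
        self.front = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.AvgPool2d(2)                          # 1 mm -> 2 mm (256 -> 128)
        self.project = nn.Conv2d(feat_ch, stem_channels, 1)  # match the stem's output width
        self.body = pretrained_body
        for p in self.body.parameters():                     # freeze all pretrained layers
            p.requires_grad = False

    def forward(self, x_1mm):
        x = self.project(self.pool(self.front(x_1mm)))
        return self.body(x)  # the frozen 2 mm layers finish the segmentation
```

The idea is that the new front-end only has to learn to produce, from the 1 mm image, whatever the pretrained stem used to produce from the 2 mm image.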

Scheme 2 (works well). The 1 mm input goes directly into an average pooling layer and is then fed into the 2 mm network. In parallel, it goes through a couple of convolutional layers at 1 mm. The last layer of the 2 mm network (before the final 1×1 convolution) is then upsampled back to 1 mm and concatenated with the 1 mm branch. This works perfectly without resorting to any special trick for training.
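Again, just as a rough sketch of the wiring, assuming the pretrained trunk (everything before its final 1×1 convolution) is available as `pretrained_features` with `pretrained_channels` output channels (both names are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Scheme2(nn.Module):
    """Parallel design: the downsampled input goes through the frozen 2 mm
    trunk, while the raw 1 mm input goes through a small trainable branch.
    The 2 mm features are upsampled back to 1 mm, concatenated with the
    1 mm branch, and a new 1x1 conv produces the segmentation."""

    def __init__(self, pretrained_features, pretrained_channels,
                 in_ch=1, branch_ch=16, n_classes=2):
        super().__init__()
        self.pool = nn.AvgPool2d(2)                 # 1 mm -> 2 mm
        self.features_2mm = pretrained_features     # frozen pretrained trunk
        for p in self.features_2mm.parameters():
            p.requires_grad = False
        self.branch_1mm = nn.Sequential(            # trainable 1 mm path
            nn.Conv2d(in_ch, branch_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(branch_ch, branch_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(pretrained_channels + branch_ch, n_classes, 1)

    def forward(self, x_1mm):
        f2 = self.features_2mm(self.pool(x_1mm))    # features on the 2 mm grid
        f2 = F.interpolate(f2, scale_factor=2,      # upsample back to 1 mm
                           mode='bilinear', align_corners=False)
        f1 = self.branch_1mm(x_1mm)
        return self.head(torch.cat([f2, f1], dim=1))
```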

So… why does my Scheme 1 fail? Is there a theoretical issue I’m missing?
