Data Science Asked by MJimitater on March 15, 2021
I’ve created spectrograms of different classes of low-hertz signals. They all have a plain blue foreground with hardly any other coloured pixels; even for me it’s not easy to distinguish the classes by eye. Now I’d like to train a CNN to do binary classification on these spectrogram images. No matter how I build the network, and no matter which configurations and parameters I try, it doesn’t learn: the loss doesn’t decrease.
I figured out the reason: the convolutional filters of a CNN are good at distinguishing forms and shapes, but not really plain colours.
How can CNNs learn colours? Am I missing something? Perhaps there are more suitable models than CNNs?
Here is an example of the two classes, which hardly differ, only in colour hue:
2 more images:
EDIT1: Not all images within each class look the same, with the same prevailing colour; some have yellow/greenish stripes, some are lighter or darker, etc. So the classes do not consist of the colour feature alone. My goal is to classify these images with a CNN, but CNNs somehow fail to learn from images that consist almost only of colour, whereas they learn well when there are sharp edges, boundaries, or object-like elements in the image.
CNNs can learn colors; there are CNNs developed just to recognize car colors. But there are some points worth analyzing when dealing with classification where color is a more relevant feature than shape.
Digital color images are usually represented as a 2D grid of pixels, where each pixel is a vector with 3 or 4 elements (the latter case to include opacity). Usually we deal with images in the RGB color space, since it mirrors how the color-sensitive cells in our retinas respond to light.
But there are multiple color spaces that can be used to represent images, most of them 3-dimensional like RGB, yet each with a different approach, and the transformations between them are not always linear. Try HSV and HSL to see which works best for your problem. (The previously mentioned paper has other examples.)
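To make the point concrete, here is a minimal sketch (using the standard-library `colorsys` module, with made-up pixel values) of why a color-space change can help: two “plain blue” pixels that differ slightly in all three RGB channels have their difference concentrated in a single channel, hue, after conversion to HSV.

```python
import colorsys

# Two hypothetical "plain blue" pixels, one from each class
# (RGB values are invented for illustration, normalized to [0, 1]).
pixel_a = (30 / 255, 60 / 255, 200 / 255)
pixel_b = (30 / 255, 90 / 255, 200 / 255)

hsv_a = colorsys.rgb_to_hsv(*pixel_a)
hsv_b = colorsys.rgb_to_hsv(*pixel_b)

# In RGB the class difference is spread across channels; in HSV it
# shows up directly in the hue channel (index 0), which a network
# can pick up as a single explicit feature.
print("hue A:", round(hsv_a[0], 3), "hue B:", round(hsv_b[0], 3))
```

The same conversion can be applied image-wide as a preprocessing step before feeding the spectrograms to the network.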
There are a number of CNN architectures, developed with different intuitions and objectives. For example, U-Net was designed with image segmentation in mind, reducing information loss by keeping shallow paths alongside the deep ones.
Most architectures are designed with shape in mind, trying to capture local nuances in the image, and usually use small kernels such as 3x3, 5x5, etc.
When color is the main feature, these local differences may not be the most relevant thing for your kernels to learn to extract; more global patches (i.e. larger kernels) might work better.
These plots can be seen as surfaces with known relationships between the vertical and horizontal axes, so non-symmetric kernels might be more effective, while also avoiding the speed penalty of overly large square kernels.
I would go with layers of alternating sizes, for example 30x3 followed by 3x30 kernels, with no padding.
Answered by Pedro Henrique Monforte on March 15, 2021