
Preventing fitting Regression CNN to the mean when dataset has only few outliers

Data Science · Asked by beeb on December 7, 2020

I am trying to train a CNN for regression on a dataset where most of the points lie around a similar output value. There are, however, a few outliers that are very important but under-represented, so the trained network tends to predict all output values close to the mean of the whole dataset (underfitting). This leads to a somewhat small overall error (and good precision) because the vast majority of points lie in that range, but the error is much higher for points even slightly outside the “normal” case.

But since this regressor would be most useful for predicting the output of outliers (a quality-control use case), it is currently pretty much useless.

Is there a way to prevent this kind of behavior and train a CNN that gives greater weight to outliers and extrema, in order to avoid underfitting?
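
For concreteness, here is a minimal sketch of what I mean by weighting, assuming a PyTorch setup; the inverse-frequency binning of the targets is just one possible scheme, not something I have settled on:

import torch

def weighted_mse(pred, target, weights):
    # Mean squared error in which each sample contributes in proportion to its weight.
    return (weights * (pred - target) ** 2).mean()

def inverse_frequency_weights(targets, n_bins=10):
    # Weight each sample by the inverse frequency of its target-value bin,
    # so under-represented extreme values count more during training.
    edges = torch.linspace(targets.min().item(), targets.max().item(), n_bins + 1)
    idx = torch.bucketize(targets, edges[1:-1])    # bin index per sample, 0 .. n_bins-1
    counts = torch.bincount(idx, minlength=n_bins).float()
    w = 1.0 / counts[idx]
    return w * len(w) / w.sum()                    # normalize so the mean weight is 1

The weights would be computed once over the full set of training targets, and the training loop would then use weighted_mse(model(x), y, w) for the batch instead of a plain MSE.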

A Random Forest, although much better at predicting the output for outliers, still shows the same behavior to some extent: the error for points at the extrema is higher, while the error around the mean is very small. The “low” points are predicted too high and the “high” points are predicted too low (in both cases pulled toward the mean). So any idea for that case would be great too!
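
Again only as an illustration of the kind of trick I am after: scikit-learn's RandomForestRegressor accepts per-sample weights at fit time, so the same idea could be applied there. The weighting scheme and the placeholder data below are assumptions, not my real setup:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data: most targets cluster near 0.5, a few extremes are spread out.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = np.clip(rng.normal(0.5, 0.1, size=500), 0.0, 1.0)

# Upweight samples whose target is far from the dataset mean (one possible scheme).
weights = 1.0 + 10.0 * np.abs(y - y.mean()) / (y.std() + 1e-9)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y, sample_weight=weights)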

Thanks a lot

One Answer

I'm not sure what you are trying to do. CNNs are good for image-related tasks, as they attempt to extract spatially local features from the input data. They can be used for regression problems, but only as long as the input resembles an image.

Random Forests on the other hand are bad at image-related tasks, unless some sort of feature extraction has been performed beforehand.

Does your dataset consist of images? If not, don't use CNNs!

Answered by user50384 on December 7, 2020
