Data Science Asked by Victor Luu on September 5, 2021
Apart from the fact that they are neural networks, which is often cited as a reason for outperforming other algorithms, are there other reasons that help autoencoders perform well in outlier detection?
I know that autoencoders work by encoding a sample into a lower-dimensional representation and then decoding that representation to reconstruct the sample. As outliers usually have higher reconstruction errors, they can be detected. However, this does not convince me why autoencoders can beat other methods. Is it because the outliers' reconstruction errors are very high, so that they can be spotted easily? If so, then what makes the reconstruction errors so high?
Both intuitive and theoretical explanations are welcome.
It helps to first understand why outlier detection is generally a difficult problem and why other methods struggle with it.
By their very nature, outliers are rare, and most data we have is heavily imbalanced. Quite likely you will not have enough "positive cases" (outliers) to train a supervised model at all.
Autoencoders sidestep this problem because they do not try to identify outliers per se. As you have described, they basically learn to downsample and then upsample the input with high fidelity. However, when an input is very different from the usual inputs, the upsampling produces more error than usual, which then helps us identify outliers.
The reconstruction errors are higher because the autoencoder has been trained mostly or exclusively on non-outlier data, so when it encounters an outlier it cannot reconstruct it as well. Intuitively, the low-dimensional bottleneck forces the network to keep only the patterns that are common in the training data, so whatever makes an outlier unusual is discarded during encoding and cannot be recovered during decoding.
Imagine an autoencoder trained to downsample pictures of oranges and then upsample them. If we feed a picture of an apple into this autoencoder, it will not produce a very accurate reconstruction, and that helps us identify that the input is actually an outlier.
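For concreteness, here is a minimal sketch of this workflow in PyTorch. Everything in it is made up for illustration: the 20-dimensional synthetic data, the 4-dimensional bottleneck, and the 99th-percentile threshold are arbitrary choices, not a recommended recipe. The point is just to show the mechanism: train on (mostly) normal data, then score new points by per-sample reconstruction error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up data: 1000 "normal" 20-dimensional points, plus 10 points
# drawn far away from the training distribution (the "apples").
inliers = torch.randn(1000, 20)
outliers = torch.randn(10, 20) * 3 + 8

# A small autoencoder with a 4-dimensional bottleneck; the compression
# forces the network to keep only the regularities of the training data.
model = nn.Sequential(
    nn.Linear(20, 4),
    nn.ReLU(),
    nn.Linear(4, 20),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train on inliers only -- no outlier labels are needed.
for _ in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(inliers), inliers)
    loss.backward()
    optimizer.step()

# Score each point by its per-sample mean squared reconstruction error.
with torch.no_grad():
    inlier_err = ((model(inliers) - inliers) ** 2).mean(dim=1)
    outlier_err = ((model(outliers) - outliers) ** 2).mean(dim=1)

# Flag points whose error exceeds the 99th percentile of training errors.
threshold = torch.quantile(inlier_err, 0.99)
print(f"mean inlier error:  {inlier_err.mean():.4f}")
print(f"mean outlier error: {outlier_err.mean():.4f}")
print(f"flagged outliers:   {(outlier_err > threshold).sum()} of {len(outliers)}")
```

The percentile threshold here is an arbitrary cutoff; in practice it would be tuned on validation data or set from the expected contamination rate.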
Answered by Fnguyen on September 5, 2021