TransWikia.com

Which Outlier Detection Method? Why?

Data Science Asked by Arkan on February 6, 2021

To detect outliers in a vector, I tested several well-known outlier detection methods. In the end, I used a combination of different methods and relied on the agreement between them. Now someone asks why I chose this particular combination and these algorithms, arguing that other combinations or algorithms might yield better results. What should I answer? I cannot just say "based on tests", since there are many other algorithms that I haven't tested (I cannot test them all). That does not seem like a logical response to me.

I'm looking for tests that would justify my selected methods and their combination, and explain why I selected them.

Please let me know your suggestions.

2 Answers

You can justify your choices by using data.

Treat anomaly detection as a supervised learning problem in which the target concept is "being an anomaly". You will then be able to present a confusion matrix for each method. Not only will this be a good justification, it will also help you understand what results to expect.
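As a rough illustration of the idea, here is a minimal sketch that scores two detectors against known labels and prints one confusion matrix per method. The data, the labels, the two detectors (a z-score rule and Tukey's IQR fences) and their thresholds are all made up for the example; they are not the methods from the question, and the sketch assumes you have at least a small labeled sample.

```python
from statistics import mean, pstdev, quantiles

def zscore_flags(values, threshold=2.0):
    """Flag values whose absolute z-score exceeds `threshold` (assumes non-constant data)."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma > threshold for v in values]

def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]

def confusion_matrix(y_true, y_pred):
    """Count true/false positives/negatives, where 'positive' means 'anomaly'."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Hypothetical vector; we pretend an expert labeled only the value 30 as anomalous.
values = [10, 11, 9, 10, 14, 11, 10, 9, 11, 10, 30, 10]
is_anomaly = [v == 30 for v in values]

detectors = {"z-score": zscore_flags, "iqr": iqr_flags}
results = {name: confusion_matrix(is_anomaly, detect(values))
           for name, detect in detectors.items()}

for name, cm in results.items():
    print(name, cm)
```

On this toy vector both rules catch the labeled anomaly, but the IQR rule also flags the value 14, which shows up as one false positive. Differences of this kind, measured on your own labeled data, are what you can report when asked why one method or combination was preferred.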

We often have a model and wonder which confidence threshold to use for alerting. In the supervised learning framework, you will be able to state trade-offs such as "increasing the confidence threshold to X will improve the precision to Y but decrease the recall to Z".
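The threshold trade-off can be sketched in the same way. The snippet below assumes each point has an anomaly score (higher means more suspicious) and a known label; the scores, labels and candidate thresholds are invented for illustration only.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold raises an alert."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    # Return None when a ratio is undefined (no alerts, or no true anomalies).
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

# Hypothetical anomaly scores and ground-truth labels (1 = anomaly).
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

tradeoff = {t: precision_recall(scores, labels, t) for t in (0.3, 0.5, 0.7)}

for t, (p, r) in tradeoff.items():
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

With these made-up numbers, raising the threshold from 0.3 to 0.7 moves precision from about 0.67 to 1.0 while recall drops from 1.0 to 0.5, which is exactly the kind of statement quoted above.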

Answered by DaL on February 6, 2021

I would add to Dan Levin's answer that when you want to justify a method, the "scientific/engineering way" is to first produce a bibliographical study, in which you show that your approach covers an important part of what is commonly regarded as the state-of-the-art methods. I would summarize this as follows:

  1. Look for commonly used methods that are known to be efficient for outlier detection.
  2. Summarize their applicability domains (medicine, biology, network security, etc.) and try to link their strong points to your application in order to select some promising methods.
  3. Evaluate the selected methods with the usual validation procedures used in machine learning problems.

Defining what the state of the art consists of is a lot of work, but it is absolutely necessary and very specific to your application.

Answered by Robin on February 6, 2021
