Data Science Asked by sandyp on September 4, 2021
Can anyone provide an intuitive explanation of the choice of contamination parameter used in sklearn’s LocalOutlierFactor implementation when contamination="auto"?
The sklearn guide suggests “as described in the paper” but I couldn’t find anything obvious. Thanks.
(this answer assumes you were asking about how the offset_ attribute was chosen when contamination="auto")
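With contamination="auto", sklearn sets offset_ to -1.5, so points whose negative_outlier_factor_ is below -1.5 (that is, LOF > 1.5) are labelled as outliers. The only place in the paper that I can conceive of that value coming from is Section 7.3, where the original authors explore soccer data and say: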
Below we discuss all the local outliers with LOF > 1.5 (see table 3), and explain why they are exceptional.
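To see that threshold for yourself, here is a minimal sketch. The toy data (a Gaussian blob plus one far-away point) and the names X and rng are made up for illustration; only LocalOutlierFactor and its offset_ and negative_outlier_factor_ attributes come from sklearn.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical toy data: 100 points around the origin plus one distant point.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

# With contamination="auto" the offset is the fixed value -1.5.
print(lof.offset_)

# A point is flagged when its negative LOF score falls below the offset,
# which is the same as saying LOF > 1.5.
flagged = lof.negative_outlier_factor_ < lof.offset_
print(np.array_equal(flagged, labels == -1))
```

Since negative_outlier_factor_ is just the LOF score with the sign flipped, an offset of -1.5 matches the "LOF > 1.5" cut-off quoted from the paper.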
Answered by Tom M. on September 4, 2021
With a floating-point value you are specifying what proportion of the data you are fitting on is expected to be outliers. In older versions of sklearn the default was 0.1; the current documentation carries a "changed" note saying that the default changed from 0.1 to "auto" in version 0.22. With "auto", the threshold is determined as in the original paper rather than from a fixed proportion.
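For comparison, a rough sketch of what a float contamination does. The data and the variable names (X, rng, fraction_flagged) are again hypothetical; the point is only that the offset is then chosen from the training scores so that roughly that fraction of points is flagged.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical toy data: 200 points drawn from a single Gaussian blob.
rng = np.random.RandomState(0)
X = rng.normal(0.0, 1.0, size=(200, 2))

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels = lof.fit_predict(X)  # 1 = inlier, -1 = outlier

# The offset is now derived from the data instead of being fixed,
# so about 10% of the training points end up labelled -1.
fraction_flagged = np.mean(labels == -1)
print(lof.offset_, fraction_flagged)
```

So a float forces a share of the points to be called outliers even if nothing unusual is in the data, whereas "auto" only flags points whose score crosses the fixed threshold.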
Answered by StevenTheDataGuy on September 4, 2021