Sklearn LocalOutlierFactor contamination parameter

Question

Can anyone give an intuitive explanation of how the threshold is chosen in sklearn's LocalOutlierFactor implementation when contamination="auto"?

The sklearn guide says "as described in the paper", but I couldn't find anything obvious there. Thanks.

Tom M. · Answer

(This answer assumes you are asking how the offset_ attribute is chosen when contamination="auto".)

With contamination="auto", sklearn sets offset_ to -1.5. The only place in the paper I can see that value coming from is Section 7.3, where the original authors explore soccer data and say:

"Below we discuss all the local outliers with LOF > 1.5 (see table 3), and explain why they are exceptional."
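To check the link between that LOF > 1.5 cutoff and the library, here is a minimal sketch on made-up data (the array X and the two far-away points are my own illustration, not taken from the paper or the question). It relies on the documented attributes negative_outlier_factor_ (the negated LOF of the training samples) and offset_, and compares the labels from fit_predict with a plain LOF > 1.5 test.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (an assumption for illustration): one Gaussian blob
# plus two points placed far away from it.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0], [-9.0, 7.0]]])

lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
labels = lof.fit_predict(X)  # -1 for outliers, 1 for inliers

# sklearn stores the *negated* LOF, so flip the sign to get LOF itself.
lof_scores = -lof.negative_outlier_factor_

# With contamination="auto" the offset is the fixed value -1.5,
# i.e. a sample is flagged when its LOF exceeds 1.5.
print("offset_:", lof.offset_)
flagged_by_threshold = lof_scores > 1.5
print("labels match LOF > 1.5:", np.array_equal(labels == -1, flagged_by_threshold))
```

So under "auto" the number of flagged points is not fixed in advance; it depends on how many samples happen to have LOF above 1.5.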

StevenTheDataGuy · Answer

You are specifying with a floating point number what proportion of the data you are fitting on is expected to be outliers. Passing 'auto' does not mean 0.1: 0.1 was the default in older versions, and the changed note in the current documentation says that the default changed from 0.1 to 'auto' in version 0.22. With 'auto', the threshold is determined as in the original paper.
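For comparison, a small sketch of what a float value does next to "auto", on synthetic data of my own choosing (X, the seed, and the 0.1 value are only for illustration). The documentation describes offset_ for a float contamination as being set so that the expected number of outliers is obtained in training, so roughly 10% of the training points should come back labelled -1 here, whereas "auto" uses the fixed offset of -1.5.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (an assumption for illustration): a single Gaussian blob.
rng = np.random.RandomState(42)
X = rng.normal(size=(500, 2))

# Float contamination: the threshold is derived from the training scores
# so that about this proportion of samples is labelled as outliers.
lof_float = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
labels_float = lof_float.fit_predict(X)

# "auto": the threshold is the fixed offset -1.5, whatever the data look like.
lof_auto = LocalOutlierFactor(n_neighbors=20, contamination="auto")
labels_auto = lof_auto.fit_predict(X)

frac_float = np.mean(labels_float == -1)
frac_auto = np.mean(labels_auto == -1)

print("float 0.1 -> offset_:", lof_float.offset_, "fraction flagged:", frac_float)
print("auto      -> offset_:", lof_auto.offset_, "fraction flagged:", frac_auto)
```

The two settings will generally flag different numbers of points, which is why "auto" should not be read as shorthand for 0.1.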

Answered by StevenTheDataGuy on September 4, 2021
