TransWikia.com

Why won't my SVM learn a sequence of repeated elements

Data Science Asked by Eduardo Wada on May 24, 2021

I recently started playing with SVMs for one-class classification. I was able to get some reasonable classifications from real data, but while trying to optimize the nu and gamma parameters I came across this example:

In the code below, I train an SVM on an array of ones, then present the same array of ones for classification, and it classifies all of them as outliers.

import pandas as pd
from sklearn import svm
import numpy as np

nu = 0.01
gamma = 1
ones = pd.DataFrame(np.ones(100))  # 100 identical samples with a single feature
clf = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=gamma)
clf.fit(ones)
ones["predicted"] = clf.predict(ones)  # predict on the training data itself
# Returns -1 (outlier) for all entries

My question is: why does this happen? I thought this data would be trivial for any parameter configuration.

One Answer

What you are running into is a small but crucial difference in definitions:

novelty detection:

The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.

outlier detection:

The training data contains outliers, and we need to fit the central mode of the training data, ignoring the deviant observations.
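To make the two definitions concrete, here is a minimal sketch (not from the original answer) of the two workflows with scikit-learn. In novelty detection the model is fit on clean data and then asked about new observations with predict; in outlier detection the training set itself is contaminated and its own samples are labeled, here via fit_predict. The synthetic data, nu=0.05 and gamma=1.0 are arbitrary illustrative choices, and OneClassSVM is used for both only to keep the example small.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Novelty detection: fit on clean data only, then score *new* observations.
clean = rng.normal(loc=0.0, scale=1.0, size=(200, 1))
novelty_model = OneClassSVM(nu=0.05, kernel="rbf", gamma=1.0).fit(clean)
new_points = np.array([[0.0], [10.0]])  # one typical point, one far away
novelty_labels = novelty_model.predict(new_points)

# Outlier detection: the training data itself contains a few deviant samples,
# and we label the training samples (fit and predict on the same data).
polluted = np.vstack([clean, [[8.0], [9.0], [10.0]]])
outlier_model = OneClassSVM(nu=0.05, kernel="rbf", gamma=1.0)
outlier_labels = outlier_model.fit_predict(polluted)

print("novelty labels for [0, 10]:", novelty_labels)
print("fraction of polluted training set flagged:", np.mean(outlier_labels == -1))
```

The far-away point at 10 gets label -1 because the RBF kernel values to every training sample are essentially zero there, so its decision value falls below the learned offset.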

OneClassSVM is an unsupervised outlier detection method. Therefore your data needs to contain outliers for the algorithm to detect them. My best guess as to why it predicts every input as an outlier is that, if there are no real outliers, everything must be an outlier.
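One way to probe this guess on the question's own example (an added diagnostic, not part of the original answer) is to look at decision_function, the signed distance to the learned boundary, alongside predict. Because all 100 inputs are identical, they all receive the same score and therefore the same label. In libsvm, which OneClassSVM wraps, a one-class sample is labeled +1 only when its score is strictly positive, so a score of exactly 0 (a point sitting on the boundary) also comes out as -1.

```python
import numpy as np
import pandas as pd
from sklearn import svm

# Same setup as in the question: 100 identical one-feature samples.
ones = pd.DataFrame(np.ones(100))
clf = svm.OneClassSVM(nu=0.01, kernel="rbf", gamma=1).fit(ones)

# Signed distance to the learned boundary (positive means inlier side).
scores = clf.decision_function(ones)
labels = clf.predict(ones)

# Identical inputs get identical scores, hence a single shared label.
print("distinct scores:", np.unique(scores))
print("distinct labels:", np.unique(labels))

# predict should agree with the rule "+1 only if score > 0, else -1".
consistent = np.array_equal(labels, np.where(scores > 0, 1, -1))
print("labels match (score > 0):", consistent)
```

If the printed score is zero or slightly negative, the training points sit on the boundary itself, which would account for every one of them being labeled -1.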

Let me demonstrate this quickly. I adjusted the kernel to linear:

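import pandas as pd
from sklearn import svm
import numpy as np

nu = 0.5
gamma = 1.0  # ignored by the linear kernel; only used by "rbf", "poly" and "sigmoid"
ones = pd.DataFrame(np.ones(100))

clf = svm.OneClassSVM(nu=nu, kernel="linear", gamma=gamma)
clf.fit(ones)
# predict expects a 2-D input of shape (n_samples, n_features)
clf.predict([[-1]])       # -1
clf.predict([[1]])        # -1
clf.predict([[1.00001]])  # 1
clf.predict([[2]])        # 1
clf.predict([[10]])       # 1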
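With a linear kernel the decision function is just an affine function of the input, f(x) = coef_ · x + intercept_, and predict returns +1 only where f(x) > 0. The sketch below (an illustration added here, not from the answer) recomputes f(x) by hand for the same query points and checks it against decision_function. Assuming the fitted weight is positive and the boundary passes through x = 1, as the outputs above suggest, everything at or below 1 lands on the -1 side and everything above it on the +1 side.

```python
import numpy as np
import pandas as pd
from sklearn import svm

ones = pd.DataFrame(np.ones(100))
clf = svm.OneClassSVM(nu=0.5, kernel="linear").fit(ones)

# Linear kernel: f(x) = coef_ . x + intercept_  (coef_ only exists for "linear").
w = clf.coef_.ravel()[0]
b = clf.intercept_[0]

queries = np.array([[-1.0], [1.0], [1.00001], [2.0], [10.0]])
manual = queries @ clf.coef_.ravel() + b  # hand-computed decision values

print("w =", w, " b =", b)
print("manual f(x):       ", manual)
print("decision_function: ", clf.decision_function(queries))
print("predict:           ", clf.predict(queries))
```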

Correct answer by RyanMcFlames on May 24, 2021
