Data Science Asked by Eduardo Wada on May 24, 2021
I recently started playing with SVMs for one-class classification. I was able to get some reasonable classifications from real data, but when I tried to optimize the nu and gamma parameters I came across this example:
In the code below, I train an SVM on an array of ones, then present the same array of ones for classification, and it classifies all of them as outliers.
import pandas as pd
from sklearn import svm
import numpy as np

nu = 0.01
gamma = 1

# A single feature whose value is always 1
ones = pd.DataFrame(np.ones(100))

clf = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=gamma)
clf.fit(ones)

# Predict on exactly the data the model was trained on
ones["predicted"] = clf.predict(ones)
# Returns -1 for all entries
My question is: why does this happen? I thought this data would be trivial for any parameter configuration.
What you are facing is a small but crucial difference in definitions:
novelty detection:
The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.
outlier detection:
The training data contains outliers, and we need to fit the central mode of the training data, ignoring the deviant observations.
OneClassSVM is an unsupervised outlier detection method. Therefore your data needs to contain outliers in order for the algorithm to detect them. My best guess as to why it predicts every input as an outlier: with every training point identical, the RBF kernel sees all pairwise distances as zero, so the decision function degenerates, and if there are no real outliers, everything must be an outlier.
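To make that concrete, here is a minimal sketch where the training data actually contains outliers (the specific nu, gamma, and data below are arbitrary illustrative choices on my part). nu acts as an upper bound on the fraction of training errors, so it should roughly match the outlier fraction you expect:

import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

# 95 inliers clustered around 0 plus 5 obvious outliers:
# the training set is "polluted" in the sense of the definition above
inliers = rng.normal(size=(95, 1))
outliers = rng.uniform(low=8, high=10, size=(5, 1))
X = np.vstack([inliers, outliers])

clf = svm.OneClassSVM(nu=0.05, kernel="rbf", gamma=0.5)
clf.fit(X)
pred = clf.predict(X)

print(pred[:95].mean())   # close to 1: the central mode is kept
print(pred[-5:])          # mostly -1: the deviant points are flagged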
Let me demonstrate this quickly. I adjusted the kernel to linear:
import pandas as pd
from sklearn import svm
import numpy as np

nu = 0.5
gamma = 1.0  # note: gamma is ignored by the linear kernel

ones = pd.DataFrame(np.ones(100))

clf = svm.OneClassSVM(nu=nu, kernel="linear", gamma=gamma)
clf.fit(ones)

# predict expects a 2D array with one row per sample
clf.predict([[-1]])       # -1
clf.predict([[1]])        # -1
clf.predict([[1.00001]])  # 1
clf.predict([[2]])        # 1
clf.predict([[10]])       # 1
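Coming back to the RBF setup from the question: once the training data has any spread at all, the same model behaves as you would expect. A quick sketch (the noise scale of 0.1 is an arbitrary choice of mine):

import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

# Same setup as the question, but the feature is no longer perfectly
# constant: a little noise gives the RBF kernel some geometry to work with
X = 1.0 + rng.normal(scale=0.1, size=(100, 1))

clf = svm.OneClassSVM(nu=0.01, kernel="rbf", gamma=1.0)
clf.fit(X)

print((clf.predict(X) == 1).mean())  # close to 1: almost all inliers now
print(clf.predict([[10.0]]))         # [-1]: a far-away point is an outlier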
Correct answer by RyanMcFlames on May 24, 2021