Data Science Asked by Eduardo Wada on May 24, 2021
I recently started playing with SVMs for one-class classification. I was able to get some reasonable classifications from real data, but when I tried to optimize the nu and gamma parameters I came across this example:
In the code below, I train an SVM on an array of ones, then present the same array of ones for classification, and it classifies all of them as outliers.
import pandas as pd
from sklearn import svm
import numpy as np

nu = 0.01
gamma = 1

# A single feature whose value is always 1
ones = pd.DataFrame(np.ones(100))

clf = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=gamma)
clf.fit(ones)

# Predict on exactly the data the model was trained on
ones["predicted"] = clf.predict(ones)
# Returns -1 for all entries
My question is: why does this happen? I thought this data would be trivial for any parameter configuration.
What you are facing is a small but crucial difference in definitions:
novelty detection:
The training data is not polluted by outliers, and we are interested in detecting anomalies in new observations.
outlier detection:
The training data contains outliers, and we need to fit the central mode of the training data, ignoring the deviant observations.
OneClassSVM is an unsupervised outlier detection method. Therefore your data needs to contain outliers in order for the algorithm to detect them. My best guess as to why it predicts every input as an outlier: with every training point identical, the RBF kernel sees all pairwise distances as zero, so the decision function degenerates, and if there are no real outliers, everything must be an outlier.
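To make that concrete, here is a minimal sketch where the training data actually contains outliers (the specific nu, gamma, and data below are arbitrary illustrative choices on my part). nu acts as an upper bound on the fraction of training errors, so it should roughly match the outlier fraction you expect:

import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

# 95 inliers clustered around 0 plus 5 obvious outliers:
# the training set is "polluted" in the sense of the definition above
inliers = rng.normal(size=(95, 1))
outliers = rng.uniform(low=8, high=10, size=(5, 1))
X = np.vstack([inliers, outliers])

clf = svm.OneClassSVM(nu=0.05, kernel="rbf", gamma=0.5)
clf.fit(X)
pred = clf.predict(X)

print(pred[:95].mean())   # close to 1: the central mode is kept
print(pred[-5:])          # mostly -1: the deviant points are flagged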
Let me demonstrate this quickly. I adjusted the kernel to linear:
import pandas as pd
from sklearn import svm
import numpy as np

nu = 0.5
gamma = 1.0  # note: gamma is ignored by the linear kernel

ones = pd.DataFrame(np.ones(100))

clf = svm.OneClassSVM(nu=nu, kernel="linear", gamma=gamma)
clf.fit(ones)

# predict expects a 2D array with one row per sample
clf.predict([[-1]])       # -1
clf.predict([[1]])        # -1
clf.predict([[1.00001]])  # 1
clf.predict([[2]])        # 1
clf.predict([[10]])       # 1
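Coming back to the RBF setup from the question: once the training data has any spread at all, the same model behaves as you would expect. A quick sketch (the noise scale of 0.1 is an arbitrary choice of mine):

import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

# Same setup as the question, but the feature is no longer perfectly
# constant: a little noise gives the RBF kernel some geometry to work with
X = 1.0 + rng.normal(scale=0.1, size=(100, 1))

clf = svm.OneClassSVM(nu=0.01, kernel="rbf", gamma=1.0)
clf.fit(X)

print((clf.predict(X) == 1).mean())  # close to 1: almost all inliers now
print(clf.predict([[10.0]]))         # [-1]: a far-away point is an outlier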
Correct answer by RyanMcFlames on May 24, 2021