
Ways to increase recall in SVM

Data Science: Asked by Tanmey Rawal on December 9, 2020

I am training an SVM on UCI's Bank Marketing Data Set (bank-additional-full.csv). Because the data is skewed, I am also interested in recall. I am getting an accuracy of about 87.95%, but my recall is only around 51%. I want to know how to increase recall without sacrificing much accuracy, using SVM only.

My code:

from sklearn.svm import SVC

# Weight the positive (minority) class more heavily so the decision
# boundary favors recall on the rare class.
svm_clf = SVC(gamma="auto", class_weight={1: 2.6})
svm_clf.fit(X_transformed, y_train_binary.ravel())

Additional info:

I have not created any new features (e.g. by combining existing ones) and treated "unknown" as its own category.
I have also removed the duration attribute, as suggested by the dataset's attribute information.

I have tried different class_weight values; I can increase recall up to 75.32%, but then my accuracy drops to 68%.
How can I increase recall in SVM models without decreasing accuracy so much?
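For reference, a minimal sketch of the kind of class_weight / C sweep I mean (the grid values here are illustrative, not the exact ones I tried), scoring on both recall and accuracy:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid; class_weight shifts the penalty toward the minority class.
param_grid = {
    "C": [0.1, 1, 10],
    "class_weight": [{1: w} for w in (2, 2.6, 4, 6)],
}
grid = GridSearchCV(
    SVC(gamma="auto"),
    param_grid,
    scoring=["recall", "accuracy"],  # track both metrics
    refit="recall",                  # refit on the recall-best setting
    cv=5,
    n_jobs=-1,
)
grid.fit(X_transformed, y_train_binary.ravel())
print(grid.best_params_)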

One Answer

RandomOverSampling just duplicates minority samples, so plain oversampling does not help much here.

I quickly tried RandomUnderSampling instead. The score looks like a good baseline to improve on; I have not done anything yet to improve the model itself.


Code as-is from my Google Colab:

# Download and extract the UCI Bank Marketing data.
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip'

from urllib.request import urlretrieve
urlretrieve(url, "data.zip")

from zipfile import ZipFile
file_name = "/content/data.zip"
with ZipFile(file_name, 'r') as zf:  # 'zf' avoids shadowing the zip() builtin
    zf.extractall()


import numpy as np, pandas as pd

data = pd.read_csv("/content/bank-additional/bank-additional-full.csv", delimiter=";")
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
y.value_counts()  # inspect the class imbalance

# Integer-encode the categorical columns.
# NOTE: LabelEncoder is really meant for targets; OrdinalEncoder or
# OneHotEncoder is the usual choice for features.
X_cat = X.select_dtypes(include='object')
from sklearn.preprocessing import LabelEncoder
lbe = LabelEncoder()
for colname in X_cat.columns:
    X[colname] = lbe.fit_transform(X_cat[colname])
y = lbe.fit_transform(y)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=201, stratify=y)

# Undersample the majority class on the training set only,
# down to a 0.6 minority/majority ratio.
from imblearn.under_sampling import RandomUnderSampler
rand = RandomUnderSampler(sampling_strategy=.6)
x_train, y_train = rand.fit_resample(x_train, y_train)


from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=200, max_samples=0.05)
model.fit(x_train, y_train)

from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

#### Metrics on train
# fp/(tn+fp) is the false-positive rate; fn/(fn+tp) is the miss rate.
y_pred_train = model.predict(x_train)
tn, fp, fn, tp = confusion_matrix(y_train, y_pred_train).ravel()
print("Training", fp/(tn+fp), fn/(fn+tp),
      accuracy_score(y_train, y_pred_train), tn, fp, fn, tp)

#### Metrics on test
y_pred = model.predict(x_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Test", fp/(tn+fp), fn/(fn+tp),
      accuracy_score(y_test, y_pred), tn, fp, fn, tp)

print("Test-recall", recall_score(y_test, y_pred))

Next steps:

- Try SMOTE and a combination of over- and under-sampling (a sketch follows this list)
- Work on feature engineering and dimensionality reduction
- Try other models
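A minimal sketch of that combination (the sampling_strategy values are illustrative guesses), chaining SMOTE with random undersampling in an imblearn Pipeline:

from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier

# Oversample the minority class to a 0.3 ratio with SMOTE, then
# undersample the majority class down to a 0.6 ratio. Resampling
# happens only during fit, never at predict time. Use the original
# (not already undersampled) training split here.
combo = Pipeline([
    ("smote", SMOTE(sampling_strategy=0.3, random_state=201)),
    ("under", RandomUnderSampler(sampling_strategy=0.6, random_state=201)),
    ("model", RandomForestClassifier(n_estimators=200, max_samples=0.05)),
])
combo.fit(x_train, y_train)
print("Test-recall", recall_score(y_test, combo.predict(x_test)))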

Answered by 10xAI on December 9, 2020
