
What is the best performance metric when balancing a dataset using the SMOTE technique?

Data Science Asked by Rawia Sammout on November 30, 2020

I used the SMOTE technique to oversample my dataset, and now I have a balanced dataset. The problem I face is that the performance metrics (precision, recall, F1 measure, accuracy) are better on the imbalanced dataset than on the balanced one.

Which measurement can I use to show that balancing the dataset may improve the performance of the model?

NB: roc_auc_score is better on the balanced dataset than on the imbalanced dataset.
Can it be considered a good performance measurement?
After the explanation (in the answer below), I implemented the code and got these results:

import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt 
plt.rc("font", size=14)
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit, cross_val_score
import seaborn as sns
from scipy import interp
from time import time
from sklearn import metrics
X=dataCAD.iloc[:,0:71]
y= dataCAD['Cardio1']
# Split the dataset into a 70/30 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(y_test.value_counts())
model=SVC(C=0.001, kernel="rbf",gamma=0.01, probability=True)
t0 = time()
clf = model.fit(X_train,y_train)
y_pred = clf.predict(X_test)
t = time() - t0
print("=" * 52)
print("time cost: {}".format(t))
print()
print("confusion matrixn", metrics.confusion_matrix( y_test, y_pred))
cf=metrics.confusion_matrix(y_test, y_pred)
accuracy=(cf.item((0,0))/50)+(cf.item((1,1))/14)
print("model accuracy n",accuracy/2)
print()
print("ttprecision_score: {}".format(metrics.precision_score( y_test, y_pred, average='macro')))
print()
print("ttrecall_score: {}".format(metrics.recall_score(y_test, y_pred, average='macro')))
print()
print("ttf1_score: {}".format(metrics.f1_score(y_test, y_pred, average='macro')))
print()
print("ttroc_auc_score: {}".format(metrics.roc_auc_score( y_test, y_pred, average='macro')))

Results:

0    50
1    14
Name: Cardio1, dtype: int64
====================================================
time cost: 0.012008905410766602

confusion matrix
 [[50  0]
 [14  0]]
model accuracy 
 0.5

        precision_score: 0.390625

        recall_score: 0.5

        f1_score: 0.43859649122807015

        roc_auc_score: 0.5

For the balanced dataset:

from imblearn.over_sampling import SMOTE  # oversampler from the imbalanced-learn package
sm = SMOTE()
X_train1, y_train1 = sm.fit_resample(X_train, y_train.ravel())
df= pd.DataFrame({'Cardio1': y_train1})
df.groupby('Cardio1').Cardio1.count().plot.bar(ylim=0)
plt.show()
print(X_train1.shape)
print(y_train1.shape)
#model=SVC(C=0.001, kernel="rbf",gamma=0.01, probability=True)
model=SVC(C=10, kernel="sigmoid",gamma=0.001, probability=True)
t0 = time()
clf = model.fit(X_train1,y_train1)
y_pred = clf.predict(X_test)
t = time() - t0
print("=" * 52)
print("time cost: {}".format(t))
print()
print("confusion matrixn", metrics.confusion_matrix(y_test, y_pred))
cf=metrics.confusion_matrix(y_test, y_pred)
accuracy=(cf.item((0,0))/50)+(cf.item((1,1))/14)
print("model accuracy n",accuracy/2)
print()
#print("ttaccuracy: {}".format(metrics.accuracy_score( y_test, y_pred)))
print()
print("ttprecision_score: {}".format(metrics.precision_score( y_test, y_pred, average='macro')))
print()
print("ttrecall_score: {}".format(metrics.recall_score(y_test, y_pred, average='macro')))
print()
print("ttf1_score: {}".format(metrics.f1_score(y_test, y_pred, average='macro')))
print()
print("ttroc_auc_score: {}".format(metrics.roc_auc_score( y_test, y_pred, average='macro')))

Results:

(246, 71)
(246,)
====================================================
time cost: 0.05353999137878418

confusion matrix
 [[ 0 50]
 [ 0 14]]
model accuracy 
 0.5


        precision_score: 0.109375

        recall_score: 0.5

        f1_score: 0.1794871794871795

        roc_auc_score: 0.5

I did not get satisfactory results. Should I implement the model using cross-validation?

One Answer

First of all, just to be clear, you shouldn't evaluate the performance of your models on the balanced dataset. What you should do is split your dataset into a train and a test set, ideally with the same degree of imbalance. The evaluation should be performed exclusively on the test set, while the balancing should be applied only to the training set; a minimal sketch of this workflow follows.
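
As an illustrative sketch (assuming generic feature/label arrays X and y, the imbalanced-learn package for SMOTE, and placeholder SVC hyperparameters):

from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE

# 1. Split first, keeping the class ratio the same in both sets (stratify=y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# 2. Oversample the training set only; the test set keeps its natural imbalance
X_train_bal, y_train_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

# 3. Fit on the balanced training data, evaluate on the untouched, imbalanced test set
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train_bal, y_train_bal)
print(recall_score(y_test, clf.predict(X_test), average="macro"))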

As for your question, any macro averaged metric should do just fine for proving that your balancing technique is effective. To calculate such a metric (let's say accuracy for simplicity), you just need to compute the accuracies of each class individually and then average them.

Example:
We trained two models m1 and m2, the first without balancing the dataset and the second after using SMOTE to balance the dataset.

Actual values : 0, 0, 0, 0, 0, 0, 0, 0, 1, 1
Predicted m1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 <-- only predicts majority class
Predicted m2: 1, 0, 0, 1, 0, 1, 0, 0, 1, 1

How would we normally calculate accuracy?

$acc = \frac{correct \, predictions}{total \, predictions}$

How do our two models perform on this metric?

$acc_1 = \frac{8}{10} = 80\%$
$acc_2 = \frac{7}{10} = 70\%$

According to this performance metric, m1 is better than m2. However, this isn't necessarily the case, as m1 just predicts the majority class! To show that m2 is in fact better than m1, we need a metric that treats the two classes as equals.

We'll now try to calculate the macro-averaged accuracy. How? First, we'll calculate the accuracy for each class separately and then average them:

  • For m1:
    $acc_1^0 = \frac{8}{8} = 100\%$ <-- m1's accuracy on class 0
    $acc_1^1 = \frac{0}{2} = 0\%$ <-- m1's accuracy on class 1
    $macro\_acc_1 = \frac{acc_1^0 + acc_1^1}{2} = \frac{100\% + 0\%}{2} = 50\%$

  • For m2:
    $acc_2^0 = \frac{5}{8} = 62.5\%$ <-- m2's accuracy on class 0
    $acc_2^1 = \frac{2}{2} = 100\%$ <-- m2's accuracy on class 1
    $macro\_acc_2 = \frac{acc_2^0 + acc_2^1}{2} = \frac{62.5\% + 100\%}{2} = 81.25\%$

Notes:

  • Macro averaging can be applied to any metric you want; however, it is most common in confusion-matrix-based metrics (e.g. precision, recall, F1).

  • You don't need to implement this yourself; many libraries already have it (e.g. sklearn's f1_score has a parameter called average, which can be set to "macro"), as sketched below.
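
For instance, the macro-averaged accuracy from the worked example above can be reproduced with sklearn; since the per-class accuracy used here equals the per-class recall, recall_score with average="macro" (or balanced_accuracy_score) yields the same numbers:

from sklearn.metrics import recall_score, balanced_accuracy_score

y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
m1_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # always predicts the majority class
m2_pred = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]

# macro-averaged recall = per-class accuracy averaged over the two classes
print(recall_score(y_true, m1_pred, average="macro"))  # 0.5
print(recall_score(y_true, m2_pred, average="macro"))  # 0.8125
# for binary problems, balanced_accuracy_score returns the same value
print(balanced_accuracy_score(y_true, m2_pred))        # 0.8125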

Correct answer by Djib2011 on November 30, 2020
