Data Science Asked on May 1, 2021
I wanted a metric where I could weigh each class as I wish while measuring "total accuracy". sklearn seems to have this with balanced_accuracy_score. Irrespective of the sample_weight, I am getting the same "balanced accuracy". Why? What was the point of sample_weight?
from sklearn.metrics import balanced_accuracy_score
import numpy as np

# y: true labels, m: a fitted model, xs: the features (all defined earlier)
sample_weight = np.array([1 if i == 0 else 1000 for i in y])
balanced_accuracy_score(y, m.predict(xs), sample_weight=sample_weight)
Here are the docs.
The point of sample_weight is to give weights to specific samples (e.g. by their importance or certainty), not to specific classes.
Apparently, the "balanced accuracy" is (from the user guide):
the macro-average of recall scores per class
So, since the score is averaged across classes, only the weights within each class matter, not the weights between classes... and your weights are the same within each class and change only across classes.
Explicitly (from the user guide again):
$$\hat{w}_i = \frac{w_i}{\sum_j{1(y_j = y_i) w_j}}$$
i.e. the i-th sample is re-weighted by dividing its weight by the total weight of all samples with the same label.
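Here is a minimal sketch of that normalization (the toy labels and variable names are mine, not from the question): with weights that are constant within each class, every sample of a class ends up with the same normalized weight, so the macro-averaged recall cannot change.
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Toy data: class 0 and class 1, each with one misclassified sample
y_true = np.array([0, 1, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])

# Class-constant weights: every class-0 sample gets 1000, every class-1 sample gets 1
w = np.array([1000.0 if y == 0 else 1.0 for y in y_true])

# The user-guide normalization: divide each weight by the total weight of its class
w_hat = np.array([wi / w[y_true == yi].sum() for wi, yi in zip(w, y_true)])
print(w_hat)  # every sample ends up with the same normalized weight (1/3 each)

# Hence class-constant weights cannot move the balanced accuracy
print(balanced_accuracy_score(y_true, y_pred))                   # 0.666...
print(balanced_accuracy_score(y_true, y_pred, sample_weight=w))  # 0.666... as well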
Now, if you want, you can just use the simple accuracy score, and plug in weights as you see fit.
In the following example:
from sklearn.metrics import balanced_accuracy_score, accuracy_score
y_true = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]
some_sample_weights = [10, 1, 1, 1, 10, 1, 0.5, 0.5, 0.5, 0.5]
weights_by_class = [1 if y == 1 else 1000 for y in y_true]
print('with some weights: {:.2f}'.format(balanced_accuracy_score(y_true, y_pred, sample_weight=some_sample_weights)))
print('without weights: {:.2f}'.format(balanced_accuracy_score(y_true, y_pred)))
print('with class weights in balanced accuracy score: {:.2f}'.format(balanced_accuracy_score(y_true, y_pred, sample_weight=weights_by_class)))
print('with class weights in accuracy score: {:.5f}'.format(accuracy_score(y_true, y_pred, sample_weight=weights_by_class)))
class_sizes = [sum(1 for y in y_true if y == x) / len(y_true) for x in (0, 1)]
weights_by_class_manually_balanced = [w/class_sizes[y] for w, y in zip(weights_by_class, y_true)]
print('with class weights in accuracy score (manually balanced): {:.5f}'.format(accuracy_score(y_true, y_pred, sample_weight=weights_by_class_manually_balanced)))
you get:
with some weights: 0.58
without weights: 0.79
with class weights in balanced accuracy score: 0.79
with class weights in accuracy score: 0.75012
with class weights in accuracy score (manually balanced): 0.75008
As you can see: class-constant weights do not change the balanced accuracy score at all. Plugged into the plain accuracy score, they pull the result towards the recall of class 0, about 0.75, because the score is now dominated by how well the 0 labels are correctly classified. Re-adjusting the weights according to class sizes doesn't matter much; the accuracy is a tiny bit lower, since dividing by the class size gives the smaller 0 class (the one with the lower recall) slightly more relative weight.
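And if what you were after in the first place is a macro-style score where you choose each class's weight yourself, one way (a sketch of mine, not a built-in option of balanced_accuracy_score) is to take the per-class recalls and average them with whatever class weights you like, reusing the example above:
import numpy as np
from sklearn.metrics import recall_score

y_true = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]

# Recall of each class separately: array([0.75, 0.8333...])
per_class_recall = recall_score(y_true, y_pred, average=None)

# Average with your own class weights (here class 0 counts 1000 times more than class 1)
class_weights = np.array([1000.0, 1.0])
print(np.average(per_class_recall, weights=class_weights))  # ~0.75, dominated by class 0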
Answered by Itamar Mushkin on May 1, 2021